Embeddings, Vector Stores & RAG: Lesson 4 of 7

Vector Databases

The Embeddings lesson left the float lists in a Python variable. A vector store writes them to a folder on disk and returns the closest-matching lines when you call similarity_search. The demo uses a local Chroma store.

Before you run

Activate the venv from Project Setup. Install Chroma's database package:

pip install chromadb

Keep OPENAI_API_KEY in .env, as set up in OpenAI Account Setup. Chroma.from_texts calls OpenAI to build the vectors.

Demo flow:

3 HTML lines (same strings as Embeddings demo)
↓ Chroma.from_texts()
chroma_demo_db/ folder on disk
↓ similarity_search_with_score()
print lines + distance score
The last lesson scored lists in Python. Here the numbers sit in a folder on disk instead.

Vector stores in LangChain

A vector store pairs each text line with its float list. You load the lines once, then pass a query string to pull the best matches. Chroma, FAISS, Pinecone, and Postgres all expose the same LangChain methods.

Store        Runs on
Chroma       Local folder
FAISS        Local file
Pinecone     Cloud
PostgreSQL   Your DB server
Each one uses from_texts to load data and similarity_search to read it back.
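
To make the idea of "text paired with a float list" concrete, here is a minimal in-memory sketch. It is not how Chroma or LangChain is implemented: ToyVectorStore, toy_embed, and cosine_distance are hypothetical names, and word counts stand in for a real embedding model so the example runs without an API key.

```python
import math
import re
from collections import Counter


def toy_embed(text: str) -> dict[str, int]:
    """Stand-in for a real embedding model: lower-cased word counts."""
    return dict(Counter(re.findall(r"[a-z]+", text.lower())))


def cosine_distance(a: dict[str, int], b: dict[str, int]) -> float:
    """Return 1 - cosine similarity, so a lower value means a closer match."""
    dot = sum(count * b.get(word, 0) for word, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


class ToyVectorStore:
    """Hypothetical store: keeps (text, vector) pairs in a Python list."""

    def __init__(self, texts, vectors):
        self._rows = list(zip(texts, vectors))

    @classmethod
    def from_texts(cls, texts):
        # Load step: embed every line once and keep it next to its text.
        return cls(texts, [toy_embed(t) for t in texts])

    def similarity_search_with_score(self, query, k=2):
        # Read step: embed the query, score every row, return the k closest.
        query_vec = toy_embed(query)
        scored = [(text, cosine_distance(query_vec, vec)) for text, vec in self._rows]
        scored.sort(key=lambda pair: pair[1])
        return scored[:k]


store = ToyVectorStore.from_texts([
    "The <a> tag creates a hyperlink.",
    "The <title> tag sets the browser tab title.",
])
results = store.similarity_search_with_score("How do I make a link on a page?", k=2)
for text, score in results:
    print(f"[{score:.4f}] {text}")
```

Word overlap is a crude proxy for meaning, so this toy ranks by shared words only. A real embedding model places texts with similar meaning close together even when they share no words.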

Store chunks with Chroma

Chroma.from_texts embeds each string and writes a chroma_demo_db folder in the directory you run the script from. The stored vectors stay on your PC; only the embedding request goes to OpenAI.

from langchain_community.vectorstores import Chroma
from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings(model="text-embedding-3-small")

texts = [
    "The <a> tag creates a hyperlink.",
    "The <title> tag sets the browser tab title.",
]

vectorstore = Chroma.from_texts(
    texts=texts,
    embedding=embeddings,
    persist_directory="./chroma_demo_db",
)

After first run

langchain-course/
  vector_databases_demo.py
  chroma_demo_db/
    chroma.sqlite3 …
The demo deletes chroma_demo_db at the start so each run starts clean. Drop that line when you want data to persist between runs.
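
The demo script's own cleanup line is not shown here, so the following is only a sketch of what such a reset step might look like. reset_store_dir is a hypothetical helper name; it uses shutil.rmtree from the standard library.

```python
import shutil
from pathlib import Path


def reset_store_dir(path: str) -> Path:
    """Delete the folder at `path` if it exists and return it as a Path."""
    store_dir = Path(path)
    if store_dir.exists():
        # Removes the folder and everything inside it.
        shutil.rmtree(store_dir)
    return store_dir
```

Calling reset_store_dir("./chroma_demo_db") before Chroma.from_texts would give each run an empty folder. Skip the call when you want the stored vectors to survive between runs.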

Read matching lines

similarity_search returns documents only. similarity_search_with_score also returns a distance float for each document; in Chroma, a lower number means a closer match.

query = "How do I make a link on a page?"

results = vectorstore.similarity_search_with_score(query, k=2)

for doc, score in results:
    print(f"[{score:.4f}] {doc.page_content}")
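
Because a lower score means a closer match, one common follow-up is to drop results whose distance is too large. The sketch below assumes plain (text, score) tuples in place of LangChain's (Document, score) pairs, reuses the two scores from this lesson's sample output, and picks 0.5 as an arbitrary cutoff. keep_close is a hypothetical helper, not a LangChain method.

```python
def keep_close(results, max_distance):
    """Keep only the rows whose distance is at or below max_distance."""
    return [(text, score) for text, score in results if score <= max_distance]


# Stand-in for the output of similarity_search_with_score (assumed shape).
sample_results = [
    ("The <a> tag creates a hyperlink.", 0.2187),
    ("The <title> tag sets the browser tab title.", 0.8904),
]

close_matches = keep_close(sample_results, max_distance=0.5)
for text, score in close_matches:
    print(f"[{score:.4f}] {text}")
```

A sensible cutoff depends on the embedding model and the distance metric, so treat 0.5 as a placeholder to tune against your own data.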

FAISS, Pinecone, PostgreSQL

FAISS: install faiss-cpu, build with FAISS.from_texts, save with save_local. Stays on one machine.

Pinecone: runs on Pinecone's servers. Set PINECONE_API_KEY and your index name on PineconeVectorStore.

PostgreSQL: turn on the pgvector extension, then use PGVector from langchain-postgres (installed in PostgreSQL Chat Message History). A good fit if you already keep app data in Postgres.

Chroma docs: docs.trychroma.com.

Run the demo

Download the script, unzip if needed, then run:

vector_databases_demo.py

Creates chroma_demo_db/ on first run

Needs OPENAI_API_KEY in .env. from_texts calls OpenAI before anything is written to Chroma.
python vector_databases_demo.py
PowerShell — (.venv) active
(.venv) PS C:\projects\langchain-course> python vector_databases_demo.py
Storing chunks in Chroma...
Saved to C:\projects\langchain-course\chroma_demo_db
Query: How do I make a link on a page?
--- result 0 (score 0.2187) ---
The <a> tag creates a hyperlink.
--- result 1 (score 0.8904) ---
The <title> tag sets the browser tab title.
Lower score means closer match in Chroma's distance metric. The <a> line should still rank first.

If it fails

  • ModuleNotFoundError: chromadb — run pip install chromadb.
  • AuthenticationError — check OPENAI_API_KEY in .env.
  • Permission error on chroma_demo_db — close other programs using the folder, or delete chroma_demo_db by hand.
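
The first two checks above can be run before the demo with a small preflight script. This is a sketch under assumptions: preflight is a hypothetical helper, and it only sees OPENAI_API_KEY if the .env file has already been loaded into the process environment (for example by a dotenv loader). It does not cover the folder permission case.

```python
import importlib.util
import os


def preflight(env=None, required_module="chromadb"):
    """Return a list of human-readable problems; an empty list means ready."""
    env = os.environ if env is None else env
    problems = []
    # find_spec returns None when a top-level module cannot be found.
    if importlib.util.find_spec(required_module) is None:
        problems.append(
            f"{required_module} is not installed: run pip install {required_module}"
        )
    if not env.get("OPENAI_API_KEY"):
        problems.append("OPENAI_API_KEY is not set: check .env")
    return problems


if __name__ == "__main__":
    for problem in preflight():
        print(problem)
```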

More detail: LangChain vector stores docs.

What's Next

Your Chroma store is working. Next up: Retrieval-Augmented Generation (RAG).