Embeddings, Vector Stores & RAGLesson 6 of 11

PostgreSQL Hybrid Search

After Hybrid Search, swap Chroma.from_texts for PGVector. The BM25Retriever and EnsembleRetriever code stays as-is.

Before you run

Venv from Project Setup. Reuse POSTGRES_URL from PostgreSQL Chat Message History. You need pgvector on the server — the image below includes it:

docker run -e POSTGRES_PASSWORD=postgres -p 5432:5432 pgvector/pgvector:pg16

Then install:

pip install langchain-postgres psycopg[binary] rank_bm25

What changes from Chroma:

POSTGRES_URL → postgresql+psycopg://…

↓

PGVector.add_documents(…)

↓

same BM25 + EnsembleRetriever as before

BM25 still runs on the in-memory text list. Only the vector half moves into PostgreSQL.

Swap Chroma for PGVector

PGVector wants a postgresql+psycopg:// connection string. The demo rewrites POSTGRES_URL with a one-line .replace(). Load chunks with add_documents, not from_texts.

import os
from langchain.retrievers import EnsembleRetriever
from langchain_community.retrievers import BM25Retriever
from langchain_core.documents import Document
from langchain_openai import OpenAIEmbeddings
from langchain_postgres import PGVector

postgres_url = os.getenv(
    "POSTGRES_URL",
    "postgresql://postgres:postgres@localhost:5432/chatdb",
)
connection = postgres_url.replace("postgresql://", "postgresql+psycopg://", 1)

texts = [
    "The <a> tag creates a hyperlink. Set the href attribute to the URL.",
    "The <title> tag sets the browser tab title.",
    "The <h1> tag marks the main heading on a page.",
]

embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
vectorstore = PGVector(
    embeddings=embeddings,
    collection_name="hybrid_search_demo",
    connection=connection,
    use_jsonb=True,
)
vectorstore.add_documents([Document(page_content=t) for t in texts])

vector_retriever = vectorstore.as_retriever(search_kwargs={"k": 2})

bm25_retriever = BM25Retriever.from_texts(texts)
bm25_retriever.k = 2

hybrid_retriever = EnsembleRetriever(
    retrievers=[bm25_retriever, vector_retriever],
    weights=[0.5, 0.5],
)

docs = hybrid_retriever.invoke("href attribute URL")
for doc in docs:
    print(doc.page_content)

pre_delete_collection=True in the demo script clears the collection on each run so you start fresh.

Run the demo

postgresql_hybrid_search_demo.py

Same query as hybrid_search_demo.py

Download ZIP .py only

Needs POSTGRES_URL and OPENAI_API_KEY in .env.

◇postgresql_hybrid_search_demo.py

"""postgresql_hybrid_search_demo.py"""

from langchain_postgres import PGVector

# replace Chroma.from_texts with PGVector + add_documents

hybrid_retriever = EnsembleRetriever(…)

python postgresql_hybrid_search_demo.py

PowerShell — (.venv) active

(.venv) PS C:\projects\langchain-course> python postgresql_hybrid_search_demo.py

Query: href attribute URL

=== Vector-only (PGVector) ===

[0] The <title> tag sets the browser tab title.

[1] The <h1> tag marks the main heading on a page.

=== Hybrid (BM25 + PGVector) ===

[0] The <a> tag creates a hyperlink. Set the href attribute…

[1] The <title> tag sets the browser tab title.

Same printed output as the Chroma demo — the <a> line only wins after BM25 joins in.

If it fails

extension "vector" is not available — use pgvector/pgvector, not plain postgres.
Connection refused — check POSTGRES_URL and that the container is running.
No module named langchain_postgres — run the install line above.

BM25 errors: see Hybrid Search. PGVector docs: langchain-postgres.

What's Next

Same hybrid setup, vectors in Postgres. Next: Indexing API.

← PREVIOUS

Hybrid Search

Indexing API