Embeddings, Vector Stores & RAGLesson 8 of 11

PostgreSQL Indexing API

After Indexing API, point SQLRecordManager and index() at PostgreSQL instead of SQLite + Chroma. The three-run demo and the cleanup="incremental" call stay the same.

Before you run

Venv from Project Setup. Reuse POSTGRES_URL from PostgreSQL Chat Message History. pgvector image:

docker run -e POSTGRES_PASSWORD=postgres -p 5432:5432 pgvector/pgvector:pg16

Then install:

pip install langchain-postgres psycopg[binary] sqlalchemy

What changes from Chroma:

sqlite:///indexing_record_manager.sql

↓

postgresql+psycopg://… (same POSTGRES_URL)

↓

Chroma folder → PGVector collection

Record hashes land in upsertion_record. Vectors sit in PGVector tables — no local files.

Swap Chroma for PGVector

Pass postgresql+psycopg://… to both SQLRecordManager and PGVector. Each Document still needs metadata["source"].

import os
from langchain_community.indexes import SQLRecordManager
from langchain_core.documents import Document
from langchain_core.indexing import index
from langchain_openai import OpenAIEmbeddings
from langchain_postgres import PGVector

postgres_url = os.getenv(
    "POSTGRES_URL",
    "postgresql://postgres:postgres@localhost:5432/chatdb",
)
connection = postgres_url.replace("postgresql://", "postgresql+psycopg://", 1)

docs = [
    Document(
        page_content="The <a> tag creates a hyperlink.",
        metadata={"source": "html_notes.txt"},
    ),
    Document(
        page_content="The <title> tag sets the browser tab title.",
        metadata={"source": "html_notes.txt"},
    ),
]

record_manager = SQLRecordManager("indexing_demo", db_url=connection)
record_manager.create_schema()

embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
vectorstore = PGVector(
    embeddings=embeddings,
    collection_name="indexing_demo",
    connection=connection,
    use_jsonb=True,
)

result = index(
    docs,
    record_manager,
    vectorstore,
    cleanup="incremental",
    source_id_key="source",
)

print(result)

The demo script clears upsertion_record rows and calls pre_delete_collection=True at the top — same idea as deleting the SQLite file and Chroma folder in the Chroma demo.

Run the demo

postgresql_indexing_api_demo.py

Same three index() calls as indexing_api_demo.py

Download ZIP .py only

Needs POSTGRES_URL and OPENAI_API_KEY in .env.

◇postgresql_indexing_api_demo.py

"""postgresql_indexing_api_demo.py"""

from langchain_core.indexing import index

# DELETE upsertion_record rows + pre_delete_collection at start

result = index(docs, record_manager, vectorstore, …)

python postgresql_indexing_api_demo.py

PowerShell — (.venv) active

(.venv) PS C:\projects\langchain-course> python postgresql_indexing_api_demo.py

=== First run (2 docs) ===

added: 2

skipped: 0

=== Second run (same 2 docs) ===

added: 0

skipped: 2

=== Third run (1 doc removed) ===

skipped: 1

deleted: 1

Same counts as indexing_api_demo.py — second run skips, third run deletes the <title> row.

If it fails

extension "vector" is not available — use pgvector/pgvector, not plain postgres.
relation "upsertion_record" does not exist — call record_manager.create_schema() first.
Vectorstore has not implemented the delete method — import PGVector from langchain_postgres, not Chroma.

Other index() errors: see Indexing API. PGVector docs: langchain-postgres.

What's Next

Indexing works in Postgres. Next: RAG.

← PREVIOUS

Indexing API

Retrieval-Augmented Generation (RAG)