Embeddings, Vector Stores & RAG: Lesson 3 of 7

Embeddings

Text Splitting leaves you with short text chunks. Here each chunk goes through OpenAIEmbeddings and comes back as a list of floats. You compare those lists to see which chunk lines up with a question you type in.

Before you run

Activate the venv from Project Setup. langchain-openai is already installed. Your .env file needs OPENAI_API_KEY from OpenAI Account Setup.

One sentence → one vector

The <a> tag creates a hyperlink.
↓ embed
[0.021, -0.114, 0.038, … 1536 numbers total]
One string in, one list of numbers out. Every vector from the same model has the same length.

What is an embedding?

An embedding is a fixed-length list of floats returned by the API. You never inspect all 1536 values by hand. Instead, you compute the cosine similarity between two lists and read the single score it returns.

Steps in the demo:

3 short HTML sentences (your chunks)
↓ embed_documents()
3 vectors stored in a Python list
↓ embed_query() on one line
cosine similarity → print highest score first
The demo uses the same model for the stored lines and for the line you pass to embed_query. Vectors from different models are not comparable.
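
The steps above can be tried offline before spending any API credits. The sketch below is only a stand-in: toy_embed and VOCAB are hypothetical names, and counting vocabulary words is not how OpenAI models build embeddings. It shows the shape of the flow (embed the chunks, embed one query, score, sort) and nothing more.

```python
# Offline stand-in for the demo flow. toy_embed and VOCAB are hypothetical;
# word counting is NOT how OpenAI embedding models work.
VOCAB = ["hyperlink", "title", "tab", "heading", "tag"]


def toy_embed(text):
    """Return a fixed-length vector: one count per VOCAB word."""
    words = text.lower().replace(".", " ").replace("?", " ").split()
    return [float(words.count(term)) for term in VOCAB]


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sum(x * x for x in a) ** 0.5
    norm_b = sum(y * y for y in b) ** 0.5
    return dot / (norm_a * norm_b)


chunks = [
    "The <a> tag creates a hyperlink.",
    "The <title> tag sets the browser tab title.",
    "The <h1> tag marks the main heading.",
]

# Stands in for embed_documents(): one vector per chunk.
vectors = [toy_embed(chunk) for chunk in chunks]
# Stands in for embed_query(): one vector for one string.
query_vec = toy_embed("Which tag makes a hyperlink?")

# Score the query against every chunk, highest score first.
ranked = sorted(
    ((cosine_similarity(query_vec, vec), chunk) for vec, chunk in zip(vectors, chunks)),
    reverse=True,
)
for score, chunk in ranked:
    print(f"[{score:.4f}] {chunk}")
```

The toy version only matches exact words, which is why its query says "hyperlink". The real model in this lesson ranks the <a> line first even when the query says "link".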

Create embeddings

OpenAIEmbeddings calls the OpenAI API. embed_documents takes a list of strings (your chunks) and returns one vector per string.

from dotenv import load_dotenv
from langchain_openai import OpenAIEmbeddings

load_dotenv()

embeddings = OpenAIEmbeddings(model="text-embedding-3-small")

texts = [
    "The <a> tag creates a hyperlink.",
    "The <title> tag sets the browser tab title.",
    "The <h1> tag marks the main heading.",
]

vectors = embeddings.embed_documents(texts)
print(len(vectors), "vectors")
print(len(vectors[0]), "numbers per vector")

embed_query takes one string: the question you want to match against the stored chunks. It uses the same model; LangChain just exposes it under a separate method name.

query = "How do I make a link on a page?"

query_vec = embeddings.embed_query(query)
print(len(query_vec), "numbers in query vector")
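
Before scoring, it can help to confirm that the two calls line up: one vector per input string, and the query vector the same length as the stored ones. The helper below is a minimal sketch; check_shapes is a hypothetical name, not part of LangChain, and the fake vectors exist only so the check runs without an API call.

```python
def check_shapes(texts, vectors, query_vec):
    """Return the shared vector length, or raise ValueError on a mismatch."""
    if not vectors:
        raise ValueError("no vectors to check")
    if len(vectors) != len(texts):
        raise ValueError(f"expected {len(texts)} vectors, got {len(vectors)}")
    lengths = {len(vec) for vec in vectors}
    if len(lengths) != 1:
        raise ValueError(f"stored vectors have mixed lengths: {sorted(lengths)}")
    length = lengths.pop()
    if len(query_vec) != length:
        raise ValueError(f"query vector has {len(query_vec)} numbers, expected {length}")
    return length


# Fake 4-number vectors (placeholders, not real embeddings).
fake_texts = ["first chunk", "second chunk"]
fake_vectors = [[0.1, 0.2, 0.3, 0.4], [0.5, 0.6, 0.7, 0.8]]
fake_query_vec = [0.9, 0.1, 0.0, 0.2]

print(check_shapes(fake_texts, fake_vectors, fake_query_vec), "numbers per vector")
```

With the real results you would pass texts, vectors, and query_vec from the snippets above.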

OpenAI embedding models

Set the model in the constructor. Each row below lists how many floats you get back per string.

OpenAI model             Dimensions
text-embedding-3-small   1536
text-embedding-3-large   3072
text-embedding-ada-002   1536
Pass model= when you build OpenAIEmbeddings. Other providers (Hugging Face, Cohere, etc.) use their own class names.
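
Because the vector length depends on the model, vectors stored with one model cannot be scored against a query embedded with a model of a different length. A small lookup makes that check explicit. This is a sketch: DEFAULT_DIMENSIONS and matches_model are hypothetical names, and the numbers are copied from the table above.

```python
# Default vector lengths, copied from the table above.
DEFAULT_DIMENSIONS = {
    "text-embedding-3-small": 1536,
    "text-embedding-3-large": 3072,
    "text-embedding-ada-002": 1536,
}


def matches_model(model, vector):
    """Return True if the vector has the default length for the given model."""
    return len(vector) == DEFAULT_DIMENSIONS[model]


placeholder = [0.0] * 1536  # placeholder values, not a real embedding

print(matches_model("text-embedding-3-small", placeholder))  # True
print(matches_model("text-embedding-3-large", placeholder))  # False
```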

API pricing: OpenAI embeddings guide.

Cosine similarity

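Cosine similarity gives you one float per pair. In general it ranges from -1 to 1, but scores between text embeddings like these usually fall between 0 and 1. Embed your question, score it against each stored chunk, sort the list, and print from the top down.

Cosine similarity scores for the demo query, highest first: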

0.84
The <a> tag creates a hyperlink.
0.45
The <title> tag sets the browser tab title.
0.39
The <h1> tag marks the main heading.

Bigger number = closer match.

Your printed order should match this. The float values themselves may differ slightly between API calls.

def cosine_similarity(a, b):
    # Dot product of the two vectors
    dot = sum(x * y for x, y in zip(a, b))
    # Length (Euclidean norm) of each vector
    norm_a = sum(x * x for x in a) ** 0.5
    norm_b = sum(y * y for y in b) ** 0.5
    return dot / (norm_a * norm_b)

scores = [
    (cosine_similarity(query_vec, vec), text)
    for vec, text in zip(vectors, texts)
]
scores.sort(reverse=True)

for score, text in scores:
    print(f"[{score:.4f}] {text}")
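
To see what the score means, it helps to run the same function on tiny hand-made vectors where the answer is known. The 2-number vectors below are illustrative values, not real embeddings: same direction scores 1, a right angle scores 0, and opposite directions score -1.

```python
def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sum(x * x for x in a) ** 0.5
    norm_b = sum(y * y for y in b) ** 0.5
    return dot / (norm_a * norm_b)


# Hand-made 2-number vectors (illustrative, not real embeddings).
same_direction = cosine_similarity([3, 4], [6, 8])    # same direction, different length
right_angle = cosine_similarity([1, 0], [0, 1])       # nothing in common
opposite = cosine_similarity([3, 4], [-3, -4])        # pointing the other way

print(f"{same_direction:.1f}")  # 1.0
print(f"{right_angle:.1f}")     # 0.0
print(f"{opposite:.1f}")        # -1.0
```

Vector length does not affect the score: [3, 4] and [6, 8] point the same way, so they score 1. Note that the function divides by the norms, so an all-zero vector would raise ZeroDivisionError.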

Run the demo

Download embeddings_demo.py, place it in your project folder, then run it. The sample texts are built into the file. It needs a valid OPENAI_API_KEY in .env from OpenAI Account Setup.

Preview of embeddings_demo.py:

"""embeddings_demo.py"""
from langchain_openai import OpenAIEmbeddings
# embed_documents → embed_query → cosine similarity
embeddings = OpenAIEmbeddings(model="text-embedding-3-small")

Run it from the project folder:

python embeddings_demo.py

PowerShell, (.venv) active:
(.venv) PS C:\projects\langchain-course> python embeddings_demo.py
embed_documents: 3 vectors, length 1536 each
first vector (first 5 values): [0.021, -0.114, 0.038, 0.052, -0.009]
embed_query: length 1536
Query: How do I make a link on a page?
[0.8421] The <a> tag creates a hyperlink.
[0.4512] The <title> tag sets the browser tab title.
[0.3890] The <h1> tag marks the main heading.
The order should stay the same, with the <a> line on top. The float values will not match exactly on every run.

If it fails

  • AuthenticationError — check OPENAI_API_KEY in .env and call load_dotenv() before creating OpenAIEmbeddings.
  • Rate limit / billing — confirm credits on your OpenAI account.
  • Unexpected order — the demo only has three short lines. Swap in chunks from your own split file to test further.
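
For the first item in the list above, a quick pre-flight check can catch a missing key before any API call is made. This is a sketch under assumptions: api_key_problem is a hypothetical helper, it only looks at the environment, and it cannot tell whether OpenAI will accept the key.

```python
import os


def api_key_problem(env=None):
    """Return a short problem description, or None if the key looks usable."""
    env = os.environ if env is None else env
    key = env.get("OPENAI_API_KEY", "")
    if not key:
        return "OPENAI_API_KEY is not set; check .env and call load_dotenv() first"
    if key != key.strip():
        return "OPENAI_API_KEY has leading or trailing whitespace"
    return None


# Dictionaries stand in for the real environment; the key is a placeholder.
print(api_key_problem({}))
print(api_key_problem({"OPENAI_API_KEY": "sk-placeholder"}))
```

In the demo script you would call api_key_problem() with no arguments, right after load_dotenv() and before creating OpenAIEmbeddings.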

Other providers: LangChain embedding models docs.

What's Next

Embeddings are working. Next up: Vector Databases.