Embeddings
Text Splitting leaves you with short text chunks. Here each chunk goes through OpenAIEmbeddings and comes back as a list of floats. You compare those lists to see which chunk lines up with a question you type in.
Before you run
Activate the venv from Project Setup. langchain-openai is already installed. Your .env file needs OPENAI_API_KEY from OpenAI Account Setup.
One sentence → one vector
What is an embedding?
An embedding is a fixed-length list of floats returned by the API. You never inspect all 1536 values — you run cosine similarity between two lists and read the single score it prints.
Steps in the demo: embed_documents for the chunks, embed_query for the question, then cosine similarity to rank the chunks.
Create embeddings
OpenAIEmbeddings calls the OpenAI API. embed_documents takes a list of strings (your chunks) and returns one vector per string.
from dotenv import load_dotenv
from langchain_openai import OpenAIEmbeddings
load_dotenv()
embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
texts = [
    "The <a> tag creates a hyperlink.",
    "The <title> tag sets the browser tab title.",
]
vectors = embeddings.embed_documents(texts)
print(len(vectors), "vectors")
print(len(vectors[0]), "numbers per vector")
embed_query takes one string: the line you want to match against. Same model, separate method name in LangChain.
query = "How do I make a link on a page?"
query_vec = embeddings.embed_query(query)
print(len(query_vec), "numbers in query vector")
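To practise the two call shapes without spending API credits, a stand-in object with the same two method names can help. The sketch below is only that: ToyEmbeddings is a hypothetical class written for this page, not part of LangChain or OpenAI, and its numbers carry no meaning, so similarity scores computed from it are useless. It only shows that embed_documents returns one vector per string and embed_query returns a single vector.

```python
import hashlib


class ToyEmbeddings:
    """Hypothetical offline stand-in; NOT part of LangChain or OpenAI.

    It mimics only the method names and return shapes used above.
    """

    def __init__(self, size=8):
        # Floats per vector. Must be 32 or less because a SHA-256 digest
        # has 32 bytes. The real models return 1536 or 3072 floats.
        self.size = size

    def _vector(self, text):
        # Hash the text so the same string always yields the same floats.
        digest = hashlib.sha256(text.encode("utf-8")).digest()
        return [byte / 255 for byte in digest[: self.size]]

    def embed_documents(self, texts):
        # List of strings in, one vector per string out.
        return [self._vector(text) for text in texts]

    def embed_query(self, text):
        # One string in, one vector out.
        return self._vector(text)


embeddings = ToyEmbeddings(size=8)
vectors = embeddings.embed_documents(["first chunk", "second chunk"])
query_vec = embeddings.embed_query("a question")

print(len(vectors), "vectors")
print(len(vectors[0]), "numbers per vector")
print(len(query_vec), "numbers in query vector")
```

Swapping ToyEmbeddings back for OpenAIEmbeddings should leave the calling code unchanged, since only the two method names are used.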
OpenAI embedding models
Set the model in the constructor. Each row below lists how many floats you get back per string.
| OpenAI model | Dimensions |
|---|---|
| text-embedding-3-small | 1536 |
| text-embedding-3-large | 3072 |
| text-embedding-ada-002 | 1536 |
Pass model= when you build OpenAIEmbeddings. Other providers (Hugging Face, Cohere, etc.) use their own class names.
API pricing: OpenAI embeddings guide.
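Vectors from models with different sizes cannot be scored against each other, and a zip-based similarity function would silently truncate the longer one instead of failing. A small length check, sketched below, can catch that early. The names EXPECTED_DIMENSIONS and check_dimensions are hypothetical, the sizes are copied from the table above (assumed to be the default output sizes), and the zero-filled vectors are placeholders so the sketch runs without an API call.

```python
# Sizes copied from the table above (assumed defaults).
EXPECTED_DIMENSIONS = {
    "text-embedding-3-small": 1536,
    "text-embedding-3-large": 3072,
    "text-embedding-ada-002": 1536,
}


def check_dimensions(model, vectors):
    """Raise ValueError if any vector has the wrong length for `model`."""
    expected = EXPECTED_DIMENSIONS[model]
    for index, vector in enumerate(vectors):
        if len(vector) != expected:
            raise ValueError(
                f"vector {index} has {len(vector)} floats, "
                f"expected {expected} for {model}"
            )
    return expected


# Placeholder data standing in for the output of embed_documents.
placeholder_vectors = [[0.0] * 1536, [0.0] * 1536]
print(check_dimensions("text-embedding-3-small", placeholder_vectors))  # prints 1536
```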
Cosine similarity
Cosine similarity gives you one float per pair. In general it ranges from -1 to 1; for text embeddings like these, the scores usually land between 0 and 1. Bigger number = closer match. Embed your question, score it against each stored chunk, sort the list, and print from the top down.
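Written out, the score is the dot product of the two vectors divided by the product of their lengths. The function that follows computes the same three sums; the symbols a, b, and the index i are just notation for the two lists and their positions.

```latex
\[
\cos\theta
  = \frac{a \cdot b}{\lVert a \rVert \, \lVert b \rVert}
  = \frac{\sum_{i} a_i b_i}{\sqrt{\sum_{i} a_i^{2}} \, \sqrt{\sum_{i} b_i^{2}}}
\]
```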
def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sum(x * x for x in a) ** 0.5
    norm_b = sum(y * y for y in b) ** 0.5
    return dot / (norm_a * norm_b)
scores = [
    (cosine_similarity(query_vec, vec), text)
    for vec, text in zip(vectors, texts)
]
scores.sort(reverse=True)
for score, text in scores:
    print(f"[{score:.4f}] {text}")
Run the demo
Download the script (embeddings_demo.py, sample texts built into the file) and place it in your project folder. It needs OPENAI_API_KEY in .env from OpenAI Account Setup. Then run:
python embeddings_demo.py
If it fails
- AuthenticationError: check OPENAI_API_KEY in .env and call load_dotenv() before creating OpenAIEmbeddings.
- Rate limit / billing: confirm credits on your OpenAI account.
- Unexpected order: the demo only has three short lines. Swap in chunks from your own split file to test further.
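For the AuthenticationError case, a quick pre-flight check can confirm whether the key is visible to Python at all. This is a minimal sketch using only the standard library; missing_api_key is a hypothetical helper name, and in a real script it would run after load_dotenv() so that values from .env are already loaded. It never prints the key itself.

```python
import os


def missing_api_key(env=os.environ):
    """Return True when OPENAI_API_KEY is absent or blank in `env`."""
    return not env.get("OPENAI_API_KEY", "").strip()


if missing_api_key():
    print("OPENAI_API_KEY is not set; check .env and call load_dotenv() first")
else:
    print("OPENAI_API_KEY found")
```

A key that is present can still be rejected by the API (wrong or revoked key), so this only rules out the most common cause.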
Other providers: LangChain embedding models docs.
What's Next
Embeddings are working. Next up: Vector Databases.