Open Source LLM Setup
The last lesson wired up OpenAI in the cloud. Here we set up a model that runs on your own PC — useful when you want zero API cost or to work offline. We will use Ollama, which we briefly met in Lesson 3.
Cloud API vs local model
With open source models like Llama, the weights are published. You download the files and run inference on your own hardware. Nothing is sent to a cloud provider's servers, and there is no per-message bill, but you need enough RAM, and responses will be slow without a capable CPU or GPU.
LangChain can talk to both setups. OpenAI uses an API key; a local model uses a URL on your machine (Ollama defaults to localhost:11434). We will connect LangChain to each path in later lessons.
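To make the "URL on your machine" idea concrete, here is a minimal sketch that checks whether anything answers at Ollama's default address. It uses only the Python standard library. The helper names ollama_base_url and is_ollama_running are made up for this illustration, and the check assumes the Ollama server replies with HTTP 200 at its root path when it is up.

```python
import urllib.request


def ollama_base_url(host="localhost", port=11434):
    """Build the local address; 11434 is the default port mentioned above."""
    return f"http://{host}:{port}"


def is_ollama_running(base_url=None, timeout=2.0):
    """Return True if a server answers with HTTP 200 at base_url.

    Hypothetical helper: it only checks reachability, not which models exist.
    """
    url = base_url or ollama_base_url()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # Connection refused, timeout, or HTTP error: treat as "not running".
        return False


# Example use once Ollama is installed:
# print(is_ollama_running())
```

No API key appears anywhere in this check, which is the practical difference from the cloud setup: the only thing the client needs to know is the address.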
Install Ollama
Ollama is a small desktop app that downloads models and exposes a local API. On Windows, open ollama.com/download, select Windows, then paste the PowerShell command or click Download for Windows.
Download Ollama (macOS, Linux, or Windows). On Windows, paste the command shown on the download page into PowerShell, or use the download button. Requires Windows 10 or later.
Pick a model
Open ollama.com/library to browse what is available. You will see llama3.1, deepseek-r1, mistral, gemma3, and others. Each card lists size tags — 3b or 8b fits most laptops; 70b needs a strong GPU. For this course, pull llama3.2.
Library
llama3.1: Llama 3.1 is a new state-of-the-art model from Meta available in 8B, 70B and 405B parameter sizes.
deepseek-r1: DeepSeek-R1 is a family of open reasoning models with performance approaching that of leading models, such as O3 and Gemini 2.5 Pro.
llama3.2: Meta's Llama 3.2 goes small on purpose, with 1B and 3B sizes meant for on-device and laptop use.
mistral: The 7B model from Mistral AI, an early open-weight option that still shows up in tutorials and comparisons.
gemma3: Gemma 3 builds on Google's open Gemma line with stronger multilingual and vision variants.
qwen2.5: Qwen 2.5 from Alibaba, strong at coding and multilingual tasks across several parameter sizes.
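The size tags translate into memory needs with some simple arithmetic. The sketch below is a back-of-the-envelope estimate only: it assumes roughly 4 bits per weight, which is an assumption for illustration and not a figure from this lesson, and it counts the weights alone. Real downloads and real memory use are larger. The function name approx_weight_gb is hypothetical.

```python
def approx_weight_gb(params_billions, bits_per_weight=4):
    """Rough size of the weights alone, in gigabytes.

    Assumption: about 4 bits per weight. Runtime overhead is not included.
    """
    return params_billions * bits_per_weight / 8


# Compare the size tags mentioned in this section.
for tag, billions in [("3b", 3), ("8b", 8), ("70b", 70)]:
    print(f"{tag}: roughly {approx_weight_gb(billions):.1f} GB of weights")
```

Under that assumption the gap between the small tags and 70b is more than tenfold, which is why the larger model calls for much stronger hardware.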
An example
Open PowerShell or Terminal in your project folder. Pull the model once with ollama pull llama3.2, then start a chat session with ollama run llama3.2. Ask the same kind of HTML question we used in the LangChain overview. If you get a sensible answer, the local stack is working.
Type /bye to exit the chat.
When to use which
Stick with OpenAI for the main course exercises: most examples online assume it, and its responses arrive faster than a local model can produce them on a modest PC. Keep Ollama installed as a fallback for offline practice or for when you hit API rate limits.
Leave Ollama running in the background while you code. The next lesson sets up the Python project folder, virtual environment, and LangChain packages.
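With Ollama running in the background, the chat check from the example can also be made from a script. The following is a minimal sketch using only the standard library, assuming the server is at the default address and that llama3.2 has already been pulled. It posts to Ollama's /api/generate endpoint with streaming turned off. The function names are made up for this illustration, and this is not how LangChain connects; that comes in later lessons.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # default local address from this lesson


def build_generate_request(prompt, model="llama3.2", base_url=OLLAMA_URL):
    """Prepare a POST request for a single, non-streamed completion."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        f"{base_url}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask_local_model(prompt, model="llama3.2", timeout=120.0):
    """Send the prompt to the local server and return the generated text."""
    request = build_generate_request(prompt, model)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return body["response"]


# Example use (needs Ollama running and the model pulled):
# print(ask_local_model("What is HTML? Answer in one sentence."))
```

The generous timeout is deliberate: on a modest PC the first answer can take a while because the model has to load into memory.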
What's Next
Local model is ready. Next: create the project folder, venv, and requirements.txt.