There’s a steady stream of news about new LLM releases, with Gemma4 being the latest.
A lot of people get stuck on the very first step of running one locally, which is actually quite straightforward thanks to all the tooling available these days.
1. Install WSL
Since you’re likely on Windows 11, you’ll need to install WSL to work with many AI libraries and tools, like vLLM. It isn’t necessary for Ollama (which we’ll use in this post), but it’s still a good habit to run this kind of tooling on Linux.
https://learn.microsoft.com/en-us/windows/wsl/install
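If you don’t have it yet, the install is a single command followed by a reboot. In an elevated PowerShell prompt –
wsl --install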
2. Install Python
Inside WSL –
# Build dependencies pyenv needs to compile Python from source
sudo apt update && sudo apt install -y build-essential libssl-dev zlib1g-dev libbz2-dev libreadline-dev libsqlite3-dev libffi-dev liblzma-dev curl git
curl https://pyenv.run | bash
# Wire pyenv into the shell (the installer prints these same instructions)
echo 'export PYENV_ROOT="$HOME/.pyenv"' >> ~/.bashrc
echo 'export PATH="$PYENV_ROOT/bin:$PATH"' >> ~/.bashrc
echo 'eval "$(pyenv init -)"' >> ~/.bashrc
exec "$SHELL"
pyenv install 3.12.7
pyenv global 3.12.7
mkdir -p ~/aiupskilling && cd ~/aiupskilling
python -m venv .venv
source .venv/bin/activate
pip install requests
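A quick sanity check that the right interpreter and environment are active –
python --version   # should print Python 3.12.7
which python       # should point into ~/aiupskilling/.venv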
3. Install Ollama
Inside WSL –
curl -fsSL https://ollama.com/install.sh | sh
ollama pull gemma4
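To confirm the model was downloaded and is available locally –
ollama list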
4. Run the Model
Inside WSL –
ollama run gemma4
Or, if you want to integrate with it programmatically, launch the Python REPL via the “python” command and run the below –
import requests
# Ollama exposes a local HTTP API on port 11434 by default
r = requests.post(
    "http://localhost:11434/api/generate",
    json={"model": "gemma4", "prompt": "What is Kafka in 2 sentences?", "stream": False},
)
print(r.json()["response"])
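If you’d rather see tokens as they’re generated instead of waiting for the full reply, set "stream": True and read the response line by line. Ollama streams newline-delimited JSON chunks; a minimal sketch –
import json
import requests
# With "stream": True, Ollama sends one JSON object per line as tokens arrive
with requests.post(
    "http://localhost:11434/api/generate",
    json={"model": "gemma4", "prompt": "What is Kafka in 2 sentences?", "stream": True},
    stream=True,
) as r:
    for line in r.iter_lines():
        if not line:
            continue
        chunk = json.loads(line)
        print(chunk.get("response", ""), end="", flush=True)
        if chunk.get("done"):
            print()
            break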
That’s it. As simple as that.