There’s a steady stream of news about new LLM releases, with Gemma4 being the latest.
A lot of people get stuck on the very first step of running one locally, which is actually quite straightforward thanks to all the tooling available these days.
1. Install WSL
Since you’re likely on Windows 11, you’ll need to install WSL to work with many AI libraries and tools, like vLLM. It isn’t necessary for Ollama (which we’ll use in this post), but it’s still a good habit to run this kind of tooling on Linux.
https://learn.microsoft.com/en-us/windows/wsl/install
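If you don’t have it yet, the install is a single command followed by a reboot. In an elevated PowerShell prompt –
wsl --install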
2. Install Python
Inside WSL –
# Build dependencies pyenv needs to compile Python from source
sudo apt update && sudo apt install -y build-essential libssl-dev zlib1g-dev libbz2-dev libreadline-dev libsqlite3-dev libffi-dev liblzma-dev curl git
curl https://pyenv.run | bash
# Wire pyenv into the shell (the installer prints these same instructions)
echo 'export PYENV_ROOT="$HOME/.pyenv"' >> ~/.bashrc
echo 'export PATH="$PYENV_ROOT/bin:$PATH"' >> ~/.bashrc
echo 'eval "$(pyenv init -)"' >> ~/.bashrc
exec "$SHELL"
pyenv install 3.12.7
pyenv global 3.12.7
mkdir -p ~/aiupskilling && cd ~/aiupskilling
python -m venv .venv
source .venv/bin/activate
pip install requests
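A quick sanity check that the right interpreter and environment are active –
python --version   # should print Python 3.12.7
which python       # should point into ~/aiupskilling/.venv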
3. Install Ollama
Inside WSL –
curl -fsSL https://ollama.com/install.sh | sh
ollama pull gemma4
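To confirm the model was downloaded and is available locally –
ollama list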
4. Run the Model
Inside WSL –
ollama run gemma4
Or, if you want to integrate with it programmatically, launch the Python REPL via the “python” command and run the below –
import requests
# Ollama exposes a local HTTP API on port 11434 by default
r = requests.post(
    "http://localhost:11434/api/generate",
    json={"model": "gemma4", "prompt": "What is Kafka in 2 sentences?", "stream": False},
)
print(r.json()["response"])
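If you’d rather see tokens as they’re generated instead of waiting for the full reply, set "stream": True and read the response line by line. Ollama streams newline-delimited JSON chunks; a minimal sketch –
import json
import requests
# With "stream": True, Ollama sends one JSON object per line as tokens arrive
with requests.post(
    "http://localhost:11434/api/generate",
    json={"model": "gemma4", "prompt": "What is Kafka in 2 sentences?", "stream": True},
    stream=True,
) as r:
    for line in r.iter_lines():
        if not line:
            continue
        chunk = json.loads(line)
        print(chunk.get("response", ""), end="", flush=True)
        if chunk.get("done"):
            print()
            break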
That’s it. As simple as that.