
I got pi running fully local on a 4B model — with web search and no API keys
For a while now I've been running pi entirely on my laptop -- unsloth's Gemma E4B on llama.cpp, no cloud, no API keys, nothing leaving the machine. Thinking level, image parsing, KV cache retention -- all working.
What surprised me is how genuinely useful it gets the moment it can search the web.
The tiny extension I published, pi-smart-web-search, adds a web_search tool that needs no API key. It fetches DuckDuckGo's (DDG) HTML results page, runs it through wreq-js -> linkedom -> Defuddle (inspired by pi-smart-fetch), and then parses the links out of the result.
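For a rough idea of what the final link-parsing step involves, here is a minimal sketch. It is not the extension's actual code: it swaps the wreq-js -> linkedom -> Defuddle pipeline for a plain regex so it runs with no dependencies, and it assumes DDG's HTML results wrap each hit in an anchor with class result__a whose href is a redirect carrying the real target in a uddg query parameter. That markup is an assumption and can change without notice; the names extractResults and unwrapRedirect are made up for illustration.

```typescript
// Sketch only. Assumed markup for one result (may not match what DDG serves today):
//   <a class="result__a" href="//duckduckgo.com/l/?uddg=<encoded target>&rut=...">Title</a>

type SearchResult = { title: string; url: string };

// Pull the real target out of a DDG redirect link; fall back to the link itself.
function unwrapRedirect(href: string): string {
  const absolute = href.startsWith("//") ? `https:${href}` : href;
  try {
    // searchParams.get() percent-decodes the value for us.
    const target = new URL(absolute).searchParams.get("uddg");
    return target ?? absolute;
  } catch {
    return href; // not a parseable URL, hand it back untouched
  }
}

// Collect title/url pairs from every anchor that looks like a result link.
// Assumes the class attribute appears before href inside the tag.
function extractResults(html: string): SearchResult[] {
  const anchor =
    /<a[^>]*class="[^"]*result__a[^"]*"[^>]*href="([^"]+)"[^>]*>([\s\S]*?)<\/a>/g;
  const results: SearchResult[] = [];
  for (const match of html.matchAll(anchor)) {
    const url = unwrapRedirect(match[1].replace(/&amp;/g, "&"));
    const title = match[2].replace(/<[^>]+>/g, "").trim(); // drop inline tags like <b>
    results.push({ title, url });
  }
  return results;
}

// Hand-written sample in the assumed shape, not a captured DDG response.
const sample = `
<div class="result">
  <a rel="nofollow" class="result__a" href="//duckduckgo.com/l/?uddg=https%3A%2F%2Fexample.com%2Fdocs&amp;rut=abc">Example <b>Docs</b></a>
</div>`;

const results = extractResults(sample);
console.log(results);
```

A regex is fragile against real-world HTML, which is presumably why the extension goes through a proper DOM (linkedom) instead; the sketch is only meant to show the shape of the data a small model gets back from the tool.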
I wrote up the whole setup end to end as a gist (llama.cpp, the model, a chat-template fix, pi, and the search/fetch tools) so you can reproduce the fully local flow yourself. Links are below.
I've run Gemma E4B on an M4 MacBook Air (16GB) and on an M2 MacBook Pro (32GB), but I haven't tested Linux yet (I don't have a machine with a dedicated GPU), so if you run it there I'd love a report.
I'm not overselling it, just genuinely after feedback on the approach: the DDG scrape, small-model agents, anything you'd do differently.
I call the project 'Humble Pi' :shrug: and I look forward to the next 4B model.
Links
- Full local setup (gist): https://gist.github.com/joematthews/d02639bcbe0e0c1c404f5d3a64c3c06f
- Extension: https://github.com/joematthews/pi-smart-web-search
- Install:
pi install npm:pi-smart-fetch && pi install npm:pi-smart-web-search
CORRECTION: When I made this post, I forgot to include the instructions for the custom jinja template for the E4B model in the gist. The template that ships with E4B drops prior thinking blocks from history, which forces the KV cache to recompute the entire conversation on every turn.
If you've already followed the gist, please switch to the updated llamagemma4b alias and follow the new "Install the E4B reasoning template (one time)" section to download the template. Then re-run source ~/.zshrc and restart the server.
TLDR: Using the custom jinja template should improve the speed substantially.
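To see why dropping thinking blocks hurts cache reuse, here is a toy sketch. It assumes the server can only reuse the part of the new prompt that matches, from the start, what it processed on the previous turn. Everything here is hypothetical: the render function and its tags are invented and are not the real Gemma chat template, and real servers compare tokens rather than characters.

```typescript
// Toy model of prefix-based cache reuse. Tags and names are made up for illustration.

type Turn = { user: string; thinking: string; answer: string };

// Hypothetical rendering of the chat history into one prompt string.
function render(history: Turn[], keepThinking: boolean): string {
  return history
    .map((t) => {
      const thought = keepThinking ? `<think>${t.thinking}</think>` : "";
      return `<user>${t.user}</user><model>${thought}${t.answer}</model>`;
    })
    .join("");
}

// Number of leading characters two strings have in common.
function sharedPrefixLength(a: string, b: string): number {
  const limit = Math.min(a.length, b.length);
  let i = 0;
  while (i < limit && a[i] === b[i]) i++;
  return i;
}

const turn1: Turn = { user: "hi", thinking: "greet back", answer: "hello" };

// What the server holds after turn 1: the prompt plus everything it generated,
// including the thinking block.
const cached = render([turn1], true);

const nextUser = "<user>search the web</user>";
const promptStripped = render([turn1], false) + nextUser; // template drops thinking
const promptRetained = render([turn1], true) + nextUser; // template keeps thinking

const reusedStripped = sharedPrefixLength(cached, promptStripped);
const reusedRetained = sharedPrefixLength(cached, promptRetained);

console.log({ cachedLength: cached.length, reusedStripped, reusedRetained });
```

In this toy, the stripped prompt stops matching the cache right where the thinking block used to be, while the retained prompt matches all the way through. How much the server then has to recompute past the mismatch depends on the server and the model; the post reports that with the stock template the whole conversation was being recomputed.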