Considering the switch from CC Max to Ollama Cloud Max + GLM 5.2 + Pi -- What should I expect?
▲ 27 r/PiCodingAgent+1 crossposts

What I can't figure out from the docs — hoping someone running this can tell me:

  1. Real throughput at volume. At my level (peaks ~120M tokens/day, heavy cache reuse), does Max actually hold, or am I hitting the 5h/weekly wall on big days? Anyone pushing agentic coding this hard on Max?
  2. Level-4 drain. How much faster does a heavy model like GLM 5.2 burn quota vs the lighter cloud models? My current cloud usage is on gemma, so I've got no baseline that transfers.
  3. Caching. A huge chunk of my CC efficiency is prompt caching. Does Ollama Cloud do anything equivalent, or am I eating full prefill every turn? This is the one I'm most unsure about.
  4. Agentic reliability. Ollama Cloud GLM 5.2 + pi.dev -- do tool calls, thinking levels, /resume, etc. all work as expected?
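
To make the stakes of question 3 concrete, here is a rough sketch of how prefill volume grows over an agentic session with and without prefix caching. It is a toy model only: it assumes every turn appends the same number of tokens, the function name and the numbers are hypothetical, and it says nothing about how Ollama Cloud actually meters or caches requests.

```typescript
// Toy model (assumption): each turn appends `tokensPerTurn` new tokens to the context.
function prefillTokens(turns: number, tokensPerTurn: number, prefixCached: boolean): number {
  let total = 0;
  for (let turn = 1; turn <= turns; turn++) {
    const contextSize = turn * tokensPerTurn;
    // With prefix caching only the new suffix is prefilled;
    // without it, the whole context is prefilled again on every turn.
    total += prefixCached ? tokensPerTurn : contextSize;
  }
  return total;
}

// Hypothetical session: 50 turns, 2000 new tokens per turn.
const uncachedPrefill = prefillTokens(50, 2000, false); // grows quadratically with turns
const cachedPrefill = prefillTokens(50, 2000, true); // grows linearly with turns
console.log({ uncachedPrefill, cachedPrefill });
```

Under this model the uncached total is tokensPerTurn * n(n+1)/2 and the cached total is tokensPerTurn * n, so the gap widens with every additional turn in a session.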

Links:

u/joematthewsdev — 3 days ago
▲ 28 r/angular

Extreme Angular has been updated to 22 -- Is this still relevant?

After a hiatus, I've updated my starter template, extreme-angular, to v22.0.3.

I started this project out of frustration. After working on several Angular projects (including ones I created), I kept thinking, "I wish strict linting and formatting best practices for Angular were more obvious, or the default."

Despite the name, extreme-angular is fairly tame. Most of the ESLint plugins use their 'recommended' presets; the main exception is typescript-eslint, where both strict-type-checked and stylistic-type-checked are enabled. The idea is that a team starts from a strict, opinionated set of TypeScript rules and then configures or removes rules as needed.
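
For readers who haven't used these presets, a minimal sketch of what enabling both looks like in an ESLint flat config. This is illustrative only and not the actual extreme-angular config: the file name, the parser options, and the disabled rule are assumptions chosen to show the "start strict, then relax" idea.

```typescript
// eslint.config.ts (illustrative sketch, not the extreme-angular config)
import eslint from "@eslint/js";
import tseslint from "typescript-eslint";

export default tseslint.config(
  eslint.configs.recommended,
  // The two typescript-eslint presets mentioned above.
  ...tseslint.configs.strictTypeChecked,
  ...tseslint.configs.stylisticTypeChecked,
  {
    languageOptions: {
      parserOptions: {
        // Type-checked rules need type information from the project's tsconfig.
        projectService: true,
        tsconfigRootDir: import.meta.dirname,
      },
    },
  },
  {
    // Example of a team relaxing one strict rule after discussion (hypothetical choice).
    rules: {
      "@typescript-eslint/no-non-null-assertion": "off",
    },
  },
);
```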

In addition to ESLint, extreme-angular also sets up:

  • Prettier -- fixes Angular template formatting issues, formats staged files on commit, and runs prettier --check in CI.
  • Stylelint -- linting for CSS and SCSS.
  • CSpell -- spell checking across the whole project.
  • Husky and lint-staged -- fast checks on staged files at pre-commit, plus a pre-push hook that runs the full CI suite locally.
  • Commitlint -- enforces Conventional Commit messages.
  • GitHub Actions -- CI that validates every pull request.
  • Stricter TypeScript -- noUncheckedIndexedAccess on top of Angular's default strict.
  • VS Code -- workspace settings and recommended extensions so the tooling works the moment you open the project.
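
As a small illustration of the "Stricter TypeScript" item: with noUncheckedIndexedAccess enabled in tsconfig.json, indexing into an array yields `T | undefined`, so the missing-element case has to be handled explicitly. The function below is a hypothetical example, not code from the template.

```typescript
// Assumes "noUncheckedIndexedAccess": true under compilerOptions in tsconfig.json.
function labelAt(items: string[], index: number): string {
  // Under the flag this is `string | undefined`, not `string`.
  const item = items[index];
  // Calling item.toUpperCase() without this check would be a compile error under the flag.
  return item === undefined ? "(none)" : item.toUpperCase();
}

console.log(labelAt(["alpha", "beta"], 1));
console.log(labelAt([], 0));
```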

I can't help but wonder: is a starter template like extreme-angular still relevant in the age of LLM agents?

For me, extreme-angular works mainly as a reference. When I'm using an agent, I tell it to read the project and fold the setup into a PR for the team to discuss, then configure the rules based on their feedback.

All that said, I feel like I'm in an echo chamber. The feedback I've gotten has been very useful but limited, so I'm curious:

  • Do you find extreme-angular useful as a starter template or a reference?
  • What ESLint packages, rule configurations, and custom rules do you use at work?
  • Will oxlint and oxfmt (from https://oxc.rs/) replace ESLint and Prettier, given they're Rust-based, faster, and quicker for agents to run?
  • What other tools do you use at work that help both humans and agents?
github.com
u/joematthewsdev — 6 days ago
▲ 209 r/llamacpp+2 crossposts

I got pi running fully local on a 4B model — with web search and no API keys

For a while now I've been running pi entirely on my laptop -- unsloth's Gemma E4B on llama.cpp, no cloud, no API keys, nothing leaving the machine. Thinking level, image parsing, KV cache retention -- all working.

What surprised me is how genuinely useful it gets the moment it can search the web.

The tiny extension I published, pi-smart-web-search, adds a web_search tool that needs no API key. It fetches DDG's HTML results page, runs it through wreq-js -> linkedom -> Defuddle (inspired by pi-smart-fetch), and then parses the links out of the output.
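
A rough sketch of the link-parsing step, for anyone curious what "scrape the results page" involves. This is not the extension's code: it swaps the wreq-js -> linkedom -> Defuddle pipeline for a plain regex so it stays dependency-free, and it assumes result anchors carry a `result__a` class with the target URL percent-encoded in a `uddg` query parameter. Both the markup shape and the names below are assumptions.

```typescript
interface SearchResult {
  title: string;
  url: string;
}

// Assumed markup (hypothetical):
// <a class="result__a" href="//duckduckgo.com/l/?uddg=ENCODED_URL&amp;rut=...">Title</a>
function extractResults(html: string): SearchResult[] {
  const anchor = /<a\b[^>]*class="[^"]*\bresult__a\b[^"]*"[^>]*>([\s\S]*?)<\/a>/g;
  const results: SearchResult[] = [];
  let match: RegExpExecArray | null;
  while ((match = anchor.exec(html)) !== null) {
    const hrefMatch = /href="([^"]*)"/.exec(match[0]);
    const rawHref = hrefMatch?.[1];
    if (rawHref === undefined) continue;
    // Undo HTML escaping of "&" inside the attribute value.
    const href = rawHref.replace(/&amp;/g, "&");
    // Unwrap the redirect: the real target is in the uddg parameter when present.
    const target = /[?&]uddg=([^&]+)/.exec(href)?.[1];
    const url = target === undefined ? href : decodeURIComponent(target);
    // Strip any inline tags from the link text.
    const title = (match[1] ?? "").replace(/<[^>]*>/g, "").trim();
    results.push({ title, url });
  }
  return results;
}
```

A real HTML parser such as linkedom is the more robust choice; the regex here only keeps the sketch self-contained.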

I wrote up the whole setup end to end as a gist -- llama.cpp, the model, a chat-template fix, pi, and the search/fetch tools -- so you can reproduce the fully-local flow yourself. (links below)

I've run Gemma E4B on an M4 MacBook Air (16GB) and on an M2 MacBook Pro (32GB), but haven't tested Linux yet (I don't have a machine with a dedicated GPU), so if you run it there I'd love a report.

I'm not overselling it, just genuinely after feedback on the approach: the DDG scrape, small-model agents, anything you'd do differently.


I call the project 'Humble Pi' :shrug: -- and I look forward to the next 4B model.

Links


CORRECTION: When I made this post, I forgot to include the instructions for the custom Jinja template for the E4B model in the gist. The template that ships with E4B drops prior thinking blocks from history, so the rendered prompt no longer matches what is cached and the server has to recompute the KV cache for the conversation on every turn.
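
A toy sketch of why dropping thinking blocks hurts cache reuse. It assumes the server reuses the KV cache only for the longest prefix shared between what it already processed and the next rendered prompt. The renderer below is hypothetical (one "token" per segment) and is not the real chat template.

```typescript
type Message = { role: "user" | "assistant"; content: string; thinking?: string };

// Hypothetical renderer: emits one pseudo-token per segment.
function render(history: Message[], keepThinking: boolean): string[] {
  const tokens: string[] = [];
  for (const message of history) {
    tokens.push(`<${message.role}>`);
    if (keepThinking && message.thinking !== undefined) {
      tokens.push(`<think>${message.thinking}</think>`);
    }
    tokens.push(message.content);
  }
  return tokens;
}

// Number of leading tokens that two sequences share, i.e. the reusable cache prefix.
function commonPrefixLength(a: string[], b: string[]): number {
  let i = 0;
  while (i < a.length && i < b.length && a[i] === b[i]) i++;
  return i;
}

const turn1: Message[] = [
  { role: "user", content: "q1" },
  { role: "assistant", thinking: "plan", content: "a1" },
];
const turn2: Message[] = [...turn1, { role: "user", content: "q2" }];

// What the server holds after turn 1: the prompt plus the generated reply, thinking included.
const cachedTokens = render(turn1, true);
const reusedWhenKept = commonPrefixLength(cachedTokens, render(turn2, true));
const reusedWhenDropped = commonPrefixLength(cachedTokens, render(turn2, false));
console.log({ cached: cachedTokens.length, reusedWhenKept, reusedWhenDropped });
```

In this toy model, keeping the thinking block lets the whole cached sequence be reused, while dropping it breaks the match at the first thinking block, so everything after that point is processed again.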

If you've already followed the gist, please switch to the updated llamagemma4b alias and follow the new "Install the E4B reasoning template (one time)" section to download the template — then re-run source ~/.zshrc and restart the server.

TLDR: Using the custom Jinja template should improve speed substantially.

u/joematthewsdev — 8 days ago