u/ExtremeAdventurous63

For those of you hosting LLMs locally, how do you monitor usage and performance?
▲ 3 r/LocalLLM+1 crossposts

I’m hosting a couple of local models on a not-so-powerful machine. To make that workable, I use llama.cpp in router mode so switching models is seamless: the old model gets unloaded and the new one gets loaded automatically.

Previously I was using llama-swap, but I moved to llama.cpp. The first thing I missed was proper monitoring for each invocation (prompt processing time, token generation speed, overall response latency, etc.).

After messing around for a couple of hours, I ended up setting up Prometheus to scrape metrics from all loaded models and built a Grafana dashboard on top of it (I'll leave an image if you are curious).
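For anyone who wants to sanity-check what Prometheus is scraping before building a dashboard, here is a minimal Python sketch that parses the Prometheus text exposition format and derives an average generation speed. The metric names in `SAMPLE` (`llamacpp:tokens_predicted_total`, `llamacpp:tokens_predicted_seconds_total`, and so on) and all of the values are assumptions for illustration; check them against what your own `/metrics` endpoint actually returns.

```python
from typing import Dict


def parse_prometheus_text(text: str) -> Dict[str, float]:
    """Parse samples from Prometheus text exposition format.

    Simplified sketch: the first whitespace-separated token is used as the
    key (labels, if any, stay part of the key) and the second as the value.
    Label values containing spaces are not handled.
    """
    samples: Dict[str, float] = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        parts = line.split()
        if len(parts) < 2:
            continue
        try:
            samples[parts[0]] = float(parts[1])
        except ValueError:
            continue  # ignore lines whose value is not numeric
    return samples


def mean_generation_speed(samples: Dict[str, float]) -> float:
    """Average tokens/second over the lifetime of the counters."""
    tokens = samples.get("llamacpp:tokens_predicted_total", 0.0)
    seconds = samples.get("llamacpp:tokens_predicted_seconds_total", 0.0)
    return tokens / seconds if seconds > 0 else 0.0


# Made-up sample payload; metric names are assumed, values are invented.
SAMPLE = """\
# HELP llamacpp:prompt_tokens_total Number of prompt tokens processed.
# TYPE llamacpp:prompt_tokens_total counter
llamacpp:prompt_tokens_total 1200
# TYPE llamacpp:tokens_predicted_total counter
llamacpp:tokens_predicted_total 300
# TYPE llamacpp:tokens_predicted_seconds_total counter
llamacpp:tokens_predicted_seconds_total 15
"""

if __name__ == "__main__":
    parsed = parse_prometheus_text(SAMPLE)
    print(f"{mean_generation_speed(parsed):.1f} tokens/s")
```

In Grafana the same ratio is usually better computed over a time window with `rate()` on both counters, since lifetime averages hide recent slowdowns.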

Unfortunately, I discovered that the /metrics endpoint in llama.cpp seems to be broken in this setup: querying it keeps the models awake, which prevents them from being swapped out and keeps the server from entering an idle state.
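One possible workaround that avoids polling the server is to record per-request numbers on the client side instead. The Python sketch below assumes each completion response carries a `timings` object with `prompt_n`, `prompt_ms`, `predicted_n`, and `predicted_ms` fields; that shape is an assumption to verify against your llama.cpp build, and `summarize_timings` is a hypothetical helper name, not part of any library.

```python
from typing import Any, Dict


def _rate(count: float, millis: float) -> float:
    """Tokens per second, guarding against a zero or missing duration."""
    return count / (millis / 1000.0) if millis > 0 else 0.0


def summarize_timings(response: Dict[str, Any], wall_seconds: float) -> Dict[str, float]:
    """Reduce one completion response to the numbers worth logging.

    `response` is the decoded JSON body of a single request (assumed to
    contain a "timings" object); `wall_seconds` is the end-to-end latency
    measured by the caller, e.g. with time.perf_counter().
    """
    timings = response.get("timings") or {}
    prompt_n = float(timings.get("prompt_n", 0))
    prompt_ms = float(timings.get("prompt_ms", 0.0))
    predicted_n = float(timings.get("predicted_n", 0))
    predicted_ms = float(timings.get("predicted_ms", 0.0))
    return {
        "prompt_tokens": prompt_n,
        "prompt_tokens_per_s": _rate(prompt_n, prompt_ms),
        "generated_tokens": predicted_n,
        "generated_tokens_per_s": _rate(predicted_n, predicted_ms),
        "latency_s": wall_seconds,
    }


if __name__ == "__main__":
    # Invented response body, used only to show the shape of the output.
    fake_response = {
        "timings": {
            "prompt_n": 100, "prompt_ms": 500.0,
            "predicted_n": 50, "predicted_ms": 2500.0,
        }
    }
    print(summarize_timings(fake_response, wall_seconds=3.2))
```

Because the numbers come from requests that were going to happen anyway, nothing extra touches the server while it is idle, so this should not interfere with model unloading.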

Issue here if anyone is interested:
https://github.com/ggml-org/llama.cpp/issues/20227

So now I’m curious: how are you all monitoring local LLM performance and usage?

https://preview.redd.it/hyj702dg4n2h1.png?width=2785&format=png&auto=webp&s=3e9394190eb17ee6cadbb362a221eb24f3ff81fc

Surface Laptop Go Fingerprint Reader on Pop!_OS 22.04

I finally got the fingerprint reader working on my **Surface Laptop Go (1st gen)** running **Pop!_OS 22.04**, so I’m sharing the short version here in case it helps someone else.

My fingerprint device is:

`04f3:0c5a Elan Microelectronics Corp. ELAN:ARM-M4`

The issue was that the **stock `fprintd` / `libfprint` stack on Pop!_OS 22.04 didn’t detect the reader at all** (`No devices available`), so I had to use an **experimental `libfprint` fork** with the `elanmoc2` driver.

What worked for me:

- confirmed the device with `lsusb`

- verified stock `fprintd-enroll` could not see it

- cloned the experimental `libfprint` fork from `xerootg`

- built it with Meson/Ninja

- tested the reader first with:

  - `sudo ./build/examples/enroll`

  - `sudo ./build/examples/verify`

- installed the custom library into `/usr/local`

- restarted `fprintd`

- then `fprintd-enroll` / `fprintd-verify` started working
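The first step in the list above (confirming the device with `lsusb`) can be scripted if you want a quick check before building anything. This is a minimal Python sketch; `find_usb_device` is a hypothetical helper name, and the sample output line is constructed from the device string quoted earlier, with the bus and device numbers invented.

```python
import re
from typing import Optional

# USB vendor:product ID of the reader quoted in the post.
TARGET_ID = "04f3:0c5a"


def find_usb_device(lsusb_output: str, usb_id: str = TARGET_ID) -> Optional[str]:
    """Return the first lsusb line that lists `usb_id`, or None if absent."""
    pattern = re.compile(r"\bID\s+" + re.escape(usb_id) + r"\b", re.IGNORECASE)
    for line in lsusb_output.splitlines():
        if pattern.search(line):
            return line.strip()
    return None


# Illustrative lsusb output; bus and device numbers are made up.
SAMPLE_LSUSB = """\
Bus 002 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
Bus 001 Device 003: ID 04f3:0c5a Elan Microelectronics Corp. ELAN:ARM-M4
"""

if __name__ == "__main__":
    match = find_usb_device(SAMPLE_LSUSB)
    print(match if match else "reader not found")
```

To run it against the real machine, pass in the output of `subprocess.run(["lsusb"], capture_output=True, text=True).stdout` instead of `SAMPLE_LSUSB`.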

Important note: this is still **experimental**, so I kept **password login enabled** and I would not recommend relying on fingerprint auth as the only login method.

If you have the same hardware and want the full step-by-step guide, troubleshooting notes, and rollback instructions, here’s my gist:

**https://gist.github.com/Cirius1792/0142ac4d8ecd4dee3af7f5575c285ab2**

u/ExtremeAdventurous63 — 11 days ago
▲ 1 r/PiCodingAgent+1 crossposts

Built a local-first pi extension for Ollama web search/fetch — looking for feedback and contributors

I wanted to share a small project that I think may be interesting for people here using local models with pi:

@cltec/pi-ollama-web-search

A pi extension that adds Ollama web search, web fetch, and selective full-content retrieval as tools.

GitHub: https://github.com/Cirius1792/pi-ollama-web-search

What I think makes it a bit different from many “web search for agents” integrations is that this one was designed local-first from the start.

This repo tries to follow this approach:

- keep search output compact by default

- avoid dumping large payloads into model context

- support selective follow-up retrieval instead of “return everything”

- let larger fetched content be read one field at a time or exported to file

- make the workflow friendlier for smaller local models where context budget matters much more
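To make the approach above concrete, here is a small Python sketch of the general idea: return short snippets by default and let the caller pull a slice of one field only when needed. It is not the extension's actual code (which is TypeScript); `SearchResult`, `compact`, `read_field`, and the default sizes are all hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class SearchResult:
    title: str
    url: str
    content: str


def compact(results: List[SearchResult], snippet_chars: int = 160) -> List[Dict[str, str]]:
    """Build the small default payload: id, title, url and a short snippet."""
    out: List[Dict[str, str]] = []
    for index, result in enumerate(results):
        # Collapse runs of whitespace so the snippet budget is not wasted.
        snippet = " ".join(result.content.split())
        if len(snippet) > snippet_chars:
            snippet = snippet[:snippet_chars].rstrip() + "..."
        out.append({
            "id": str(index),
            "title": result.title,
            "url": result.url,
            "snippet": snippet,
        })
    return out


def read_field(results: List[SearchResult], index: int, field: str,
               offset: int = 0, limit: int = 500) -> str:
    """Selective follow-up: return one bounded slice of a single field."""
    value = getattr(results[index], field)
    return value[offset:offset + limit]


if __name__ == "__main__":
    demo = [SearchResult("Example", "https://example.com", "word " * 400)]
    print(compact(demo)[0]["snippet"])
    print(len(read_field(demo, 0, "content", offset=0, limit=50)))
```

The point of the split is that the model sees a few hundred characters per result up front and spends context on full content only for the results it decides to follow up on.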

So the goal wasn’t just “add web search to pi”, but to make something that feels more natural for local-model constraints and local-first usage.

A quick transparency note: this extension was developed mostly by pi itself, with a lot of input from me on the ideas, requirements, testing direction, and specs. I should also say clearly that I’m not a TypeScript/JavaScript programmer, so if anyone here looks through the code, please keep that in mind 🙂

Because of that, I’d genuinely welcome:

- code review

- architectural feedback

- testing

- bug reports

- contributions / PRs to improve the implementation

If you think the idea is useful, I’d also really appreciate a GitHub star — it honestly matters a lot to me.

u/ExtremeAdventurous63 — 11 days ago