
90% of LLM classification calls are unnecessary - we measured it and built a drop-in fix (open source)
I kept running into the same pattern in production:
LLMs being used for things like:
- intent detection
- tagging
- moderation
…but most of the inputs behind those calls are simple enough that they don't need an LLM.
So I tested it.
On a standard benchmark (Banking77):
→ 90%+ of inputs can be handled by a lightweight ML model
→ while keeping ~95% agreement with the LLM's labels
Built a small library around that idea:
→ learns from your LLM outputs
→ routes “easy” cases to a cheap model
→ keeps hard ones on the LLM
→ with a guarantee on quality (you set the threshold)
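A minimal sketch of that routing idea, for illustration only. This is not tracer's actual API: `CheapClassifier`, `pick_threshold`, `route`, and `fake_llm` are hypothetical names, the cheap model here is a toy Naive Bayes, and the threshold is picked empirically on held-out LLM-labeled data, which is an estimate rather than a formal guarantee.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    return text.lower().split()


class CheapClassifier:
    """Toy multinomial Naive Bayes; stands in for any lightweight model."""

    def fit(self, texts, llm_labels):
        self.word_counts = defaultdict(Counter)
        self.label_counts = Counter(llm_labels)
        self.vocab = set()
        for text, label in zip(texts, llm_labels):
            for tok in tokenize(text):
                self.word_counts[label][tok] += 1
                self.vocab.add(tok)
        return self

    def predict_proba(self, text):
        total = sum(self.label_counts.values())
        log_scores = {}
        for label, n in self.label_counts.items():
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            score = math.log(n / total)
            for tok in tokenize(text):
                # Laplace smoothing so unseen words do not zero out a class.
                score += math.log((self.word_counts[label][tok] + 1) / denom)
            log_scores[label] = score
        top = max(log_scores.values())
        exp = {l: math.exp(s - top) for l, s in log_scores.items()}
        z = sum(exp.values())
        return {l: v / z for l, v in exp.items()}


def pick_threshold(model, texts, llm_labels, target_agreement):
    """Lowest confidence cutoff whose routed cases agree with the LLM
    at >= target_agreement on this held-out set (None if no cutoff works)."""
    scored = []
    for text, llm_label in zip(texts, llm_labels):
        proba = model.predict_proba(text)
        label = max(proba, key=proba.get)
        scored.append((proba[label], label == llm_label))
    scored.sort(reverse=True)
    best, hits = None, 0
    for i, (conf, ok) in enumerate(scored, start=1):
        hits += ok
        if i < len(scored) and scored[i][0] == conf:
            continue  # only evaluate a cutoff at the end of a tie group
        if hits / i >= target_agreement:
            best = conf
    return best


def route(model, threshold, text, call_llm):
    """Return (label, source): cheap model if confident, else the LLM."""
    proba = model.predict_proba(text)
    label = max(proba, key=proba.get)
    if threshold is not None and proba[label] >= threshold:
        return label, "cheap"
    return call_llm(text), "llm"


# Tiny made-up example; in practice the labels come from logged LLM outputs.
train_texts = ["card lost", "lost my card", "card stolen",
               "transfer money", "send money abroad", "transfer failed"]
train_labels = ["card", "card", "card", "transfer", "transfer", "transfer"]
model = CheapClassifier().fit(train_texts, train_labels)

threshold = pick_threshold(
    model, ["my card lost", "money transfer"], ["card", "transfer"],
    target_agreement=0.95,
)


def fake_llm(text):  # placeholder for a real LLM call
    return "other"


easy = route(model, threshold, "lost card", fake_llm)
hard = route(model, threshold, "hello there", fake_llm)
```

Here `easy` is answered by the cheap model, while `hard` contains only unseen words, so its confidence falls below the cutoff and it goes to the LLM.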
Result:
roughly 90% fewer LLM calls on that benchmark, so a large cost reduction, without noticeable degradation
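To make the cost claim concrete, here is a back-of-the-envelope calculation. The per-call prices are made-up placeholders, not measurements, and `blended_cost` and `savings_fraction` are hypothetical helper names; only the 90% coverage figure comes from the Banking77 result above.

```python
def blended_cost(coverage, llm_cost_per_call, cheap_cost_per_call=0.0):
    """Average cost per input when `coverage` of traffic skips the LLM."""
    if not 0.0 <= coverage <= 1.0:
        raise ValueError("coverage must be between 0 and 1")
    return (1.0 - coverage) * llm_cost_per_call + coverage * cheap_cost_per_call


def savings_fraction(coverage, llm_cost_per_call, cheap_cost_per_call=0.0):
    """Fraction of the all-LLM bill that routing removes."""
    blended = blended_cost(coverage, llm_cost_per_call, cheap_cost_per_call)
    return 1.0 - blended / llm_cost_per_call


# Assumed prices: $0.002 per LLM call, $0.00001 per cheap-model call.
saved = savings_fraction(coverage=0.90,
                         llm_cost_per_call=0.002,
                         cheap_cost_per_call=0.00001)
print(f"{saved:.2%} of LLM spend avoided")  # about 89.55% under these assumptions
```

If the cheap model is effectively free, the savings fraction simply equals the coverage.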
Fully open-sourced here:
https://github.com/adrida/tracer
Would love feedback from people running high-volume LLM pipelines - curious if you’re seeing the same pattern.
u/Adr-740 — 2 days ago