u/Deannaoliver

Split my agent into a cheap router model and a premium synthesis model, bill dropped about 75%

I've been building an internal enrichment agent for our team (5 people, B2B sales context) that takes a list of company names and enriches them with public info before our outreach folks touch them. Around 8 tools wired in. The usual stuff: web search, scrape, internal vector DB lookup, dedupe against our CRM, classify by ICP fit, draft a short outreach paragraph, plus a couple of glue tools for handling edge cases.

When I first got it working everything was gpt-5.4 because that's what I had set up. Worked fine, bill was scary. Roughly $290 the first week processing about 1,200 companies. Wouldn't scale to the volume our sales person actually wants (closer to 5k/week).

Looked at the logs more carefully and the bill breakdown surprised me. About 75% of LLM calls were what I'd call "router" calls. Given the current state, the available tools, and the last tool result, pick the next action. These calls have a tiny output (one tool name plus a JSON arg blob) and don't really need 5.4-level reasoning. They just need to be cheap, fast, and barely smart enough to not pick stupid tools.

The remaining 25% were "synthesis" calls. Summarize this scraped page. Draft this paragraph. Reason about whether the evidence actually matches our ICP. Those benefit from a real model.

Swapped the architecture so routing uses GPT-OSS 120B on an OpenAI-compatible endpoint (I'm on GMI Cloud; a couple of other hosts price it similarly) and synthesis stays on gpt-5.4. The SDK doesn't care: you just pass a different base_url and model string depending on the call site.
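A minimal sketch of what "different base_url and model string per call site" can look like. The routing URL and the `gpt-oss-120b` model id below are placeholders, not my host's real values, and `Endpoint` / `endpoint_for` are made-up names for illustration:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Endpoint:
    base_url: str  # OpenAI-compatible API root
    model: str     # model string sent with each request


# Assumed values: the routing URL is a placeholder, and the exact model id
# depends on how your host names GPT-OSS 120B.
ENDPOINTS = {
    "routing": Endpoint("https://router-host.example/v1", "gpt-oss-120b"),
    "synthesis": Endpoint("https://api.openai.com/v1", "gpt-5.4"),
}


def endpoint_for(call_site: str) -> Endpoint:
    """Return the endpoint config for a call site ("routing" or "synthesis")."""
    try:
        return ENDPOINTS[call_site]
    except KeyError:
        raise ValueError(f"unknown call site: {call_site!r}") from None


if __name__ == "__main__":
    for site in ("routing", "synthesis"):
        ep = endpoint_for(site)
        print(f"{site}: {ep.model} via {ep.base_url}")
```

With the OpenAI Python SDK you would then build one client per endpoint, something like `OpenAI(base_url=ep.base_url, api_key=...)`, and pass `model=ep.model` on each call.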

Numbers from this week, processing about 1,400 companies: total around $65. That's roughly a 78% drop in the total bill at slightly higher volume, or about 81% per company (roughly $0.24 down to $0.05 each). Quality on the final outputs feels the same to our sales person. Before fully switching, we ran 50 companies through both stacks side by side to validate.
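For anyone who wants to check the arithmetic, here is a quick sketch using the rounded figures from the post. The 5k/week projection assumes cost scales linearly with company count, which I haven't verified:

```python
def cost_per_company(total_usd: float, companies: int) -> float:
    """Average LLM spend per enriched company."""
    return total_usd / companies


# Rounded figures from the two weeks described above.
before = cost_per_company(290, 1200)   # everything on gpt-5.4
after = cost_per_company(65, 1400)     # router/synthesis split

total_reduction = 1 - 65 / 290
per_company_reduction = 1 - after / before

# Assumption: cost grows linearly with volume.
target_volume = 5000
projected_before = before * target_volume
projected_after = after * target_volume

if __name__ == "__main__":
    print(f"per company: ${before:.3f} -> ${after:.3f}")
    print(f"total bill reduction: {total_reduction:.1%}")
    print(f"per-company reduction: {per_company_reduction:.1%}")
    print(f"projected at {target_volume}/week: "
          f"${projected_before:.0f} -> ${projected_after:.0f}")
```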

A few things I had to fix:

  1. GPT-OSS 120B's tool calling JSON is mostly clean but occasionally leaves a trailing comma. Wrapped the parse in a sanitizer.

  2. Default max_tokens was 4096 and the model was happy to fill the reasoning channel even when I just wanted a tool pick. Dropped routing calls to 256 and tightened the prompt.

  3. Per-call latency on routing is maybe 100-200ms slower than 5.4 on average, but throughput is fine because routing isn't on the user-facing critical path.
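The sanitizer from fix 1 can be sketched roughly like this. It's not my exact code, and `strip_trailing_commas` / `parse_tool_args` are illustrative names. It drops a comma only when the next non-whitespace character is `}` or `]`, and it skips over string contents so commas inside values survive:

```python
import json


def strip_trailing_commas(text: str) -> str:
    """Remove commas that directly precede } or ], ignoring string contents."""
    out = []
    in_string = False
    escaped = False
    n = len(text)
    for i, ch in enumerate(text):
        if in_string:
            out.append(ch)
            if escaped:
                escaped = False          # this char was escaped, keep going
            elif ch == "\\":
                escaped = True           # next char is escaped
            elif ch == '"':
                in_string = False        # closing quote
        elif ch == '"':
            in_string = True
            out.append(ch)
        elif ch == ",":
            # Look ahead past whitespace to see what follows the comma.
            j = i + 1
            while j < n and text[j] in " \t\r\n":
                j += 1
            if j < n and text[j] in "}]":
                continue                 # trailing comma: drop it
            out.append(ch)
        else:
            out.append(ch)
    return "".join(out)


def parse_tool_args(raw: str):
    """Parse tool-call arguments, retrying once with trailing commas removed."""
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        return json.loads(strip_trailing_commas(raw))


if __name__ == "__main__":
    print(parse_tool_args('{"query": "acme, inc", "limit": 5,}'))
```

Trying the strict parse first means clean output never goes through the sanitizer, and anything still broken after sanitizing raises as usual.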

If most of your agent calls are tool-pick decisions rather than synthesis, this split is probably the biggest single win available. Pulling them apart took us from "we can't scale this" to "it scales fine" without changing anything else.

The thing I'm still figuring out is whether GPT-OSS 120B is actually the right size for the routing job or whether I could push down to a 30-something B model and save more. Quality might tank with more tools registered, haven't actually tested yet.
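One way I could test that, sketched under assumptions: replay logged routing states through the smaller model and measure how often its tool pick matches the current router's. The tool names and the sample lists below are made up for illustration, not real results:

```python
from collections import Counter


def agreement_rate(reference: list[str], candidate: list[str]) -> float:
    """Fraction of routing decisions where both models picked the same tool."""
    if len(reference) != len(candidate):
        raise ValueError("both lists must cover the same routing states")
    if not reference:
        raise ValueError("need at least one routing decision")
    matches = sum(r == c for r, c in zip(reference, candidate))
    return matches / len(reference)


def disagreements(reference: list[str], candidate: list[str]) -> Counter:
    """Count (reference_pick, candidate_pick) pairs where the models differ."""
    return Counter((r, c) for r, c in zip(reference, candidate) if r != c)


if __name__ == "__main__":
    # Illustrative picks only; real input would come from replayed logs.
    current_router = ["web_search", "scrape", "crm_dedupe", "classify_icp"]
    smaller_model = ["web_search", "scrape", "web_search", "classify_icp"]
    print(agreement_rate(current_router, smaller_model))
    print(disagreements(current_router, smaller_model).most_common())
```

Agreement with the current router is only a proxy, since the current router can be wrong too, so the disagreement pairs are worth eyeballing by hand.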

u/Deannaoliver — 4 days ago

Routine's solid but my skin feels stuck

My routine has been pretty stable for like 8 months now and on paper everything is fine. Gentle cleanser, niacinamide serum in the morning, sunscreen, retinoid 2x a week at night, moisturizer. Nothing aggressive, nothing crazy.

But like. My skin isn't really getting any better either? It's not bad. There's no breakouts, no big issues. It just feels stuck at this baseline where I look kind of tired even when I'm not, and there's this dullness that no exfoliant or vit C is touching.

I went through a rough patch in February where I was sleeping 5 hrs a night for a few weeks and I think that's when it shifted. Routine never changed but my skin kind of did.

Mostly trying to figure out if this is a routine thing I'm just not seeing or something else entirely. Not really looking for product recs as much as just like, what shifted things for you.

u/Deannaoliver — 10 days ago
▲ 6 r/defi

Been thinking about this on and off for a while. Most of the "real yield" discussion still feels weirdly disconnected from how anyone actually uses money week to week.

Earning on USDC/USDT in Aave or Morpho makes sense on paper. The math works. But the moment I actually need to pay for something normal, it always turns into the same loop: withdraw, bridge if I'm on the wrong chain, send to an exchange, sell, wait, move to bank, then spend. By the time I've done all that I've usually given back a chunk of the yield in fees and time, and the experience honestly feels worse than just leaving the money in a regular savings account.

So I keep going back and forth on whether DeFi yield is actually replacing anything for me, or if it's just one more layer stacked before the same TradFi exit. The version I keep wanting is something like: stables stay self-custodied, earn some boring background yield, and I only move what I actually need to spend. Not chasing 30% APY on some new fork, just treating stables like a checking account that happens to earn something. But every time I try to set it up cleanly, bridge fees and chain fragmentation eat the simplicity, and then I'm reminded smart contract risk is still sitting underneath all of it.

Maybe I'm overthinking it.

u/Deannaoliver — 26 days ago