u/messedup1122

We made 2,000 API calls to nine small closed-weight models (Gemini Flash variants, GPT-4o-mini, GPT-4.1-nano, GPT-5.4-mini, Claude Haiku 4.5) across prompt sizes spanning four orders of magnitude.

Key findings:

Every model's prefill scales linearly or better. Fitting power laws to minimum time to first token (TTFT) gives exponents ranging from 0.15 (Gemini 3.1 Flash Lite) to 1.02 (GPT-4.1-nano at the top end), so most models are sub-linear and the worst case is roughly linear. No model exhibits the O(n²) prefill you'd expect from dense attention, even at 100K+ contexts where provider overhead becomes negligible.

Decode behavior varies wildly across providers. Gemini Flash Lite's decode cost actually decreases at large context (from 4.6 ms/token to 3.3 ms/token). GPT-5.4-mini goes the opposite direction: from 7 ms/token at small context to 108 ms/token at 1M tokens. Different inference architectures, different tradeoffs.

Model rankings invert across context sizes. GPT-4.1-nano is fastest at <1KB; Gemini Flash Lite is fastest at >600KB. Quoting a single latency number for a model is meaningless without specifying the context size.

Gemini Flash Lite exhibits reproducible negative scaling around 100K tokens: a 144K input is faster than a 62K input. Both prefill and decode improve, suggesting a routing transition to different hardware.

Cross-provider tokenizer efficiency differs by ~14% between Anthropic and OpenAI for the same English text content.
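
For anyone who wants to reproduce the prefill exponents from the first finding, here's a minimal sketch of one way to fit a power law TTFT ≈ a · n^b: ordinary least squares on log(n) vs log(TTFT). This is not necessarily the exact method used for the numbers above; the function name `fit_power_law` and the sample data are made up for illustration.

```python
import math

def fit_power_law(sizes, latencies):
    """Fit latency ~= a * size**b by least squares in log-log space.

    Returns (a, b). An exponent b < 1 means sub-linear scaling,
    b close to 1 means linear, b close to 2 means quadratic.
    """
    xs = [math.log(n) for n in sizes]
    ys = [math.log(t) for t in latencies]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Slope of the regression line in log-log space is the exponent b.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    # Intercept gives log(a).
    a = math.exp(mean_y - b * mean_x)
    return a, b

# Synthetic data (NOT from the benchmark): latency grows as sqrt(n).
sizes = [100, 1_000, 10_000, 100_000, 1_000_000]
min_ttft_ms = [200.0 * (n / 100) ** 0.5 for n in sizes]

a, b = fit_power_law(sizes, min_ttft_ms)
print(f"a = {a:.2f}, exponent b = {b:.2f}")
```

Using the minimum TTFT per prompt size (rather than the mean) keeps queueing and network noise out of the fit. At small prompts a fixed provider overhead still flattens the curve, so the exponent is most meaningful at the large-context end.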

Interactive viewer, code, and raw dataset: https://blog.0xmmo.co/forensics/post.html

We're trying to do Hanoi, Halong Bay, Hoi An, Saigon, then fly to Siem Reap and Phnom Penh in 14 days. I thought I could plan this myself but I'm drowning. Internal flights, airport transfers, visa timing between countries, figuring out which hotels are actually good vs just having paid reviews, and trying to book guides for Angkor Wat and the Mekong Delta separately.

Every time I book one piece something else conflicts. I booked a Halong Bay cruise and then realized the timing doesn't work with our flight to Hoi An. I've been at this for 3 weeks and I still don't have a working itinerary.

My wife wants to just book one of those big group tours but I really don't want to be on a bus with 40 strangers. Is there a middle ground? Has anyone done a multi-country Southeast Asia trip without either losing their mind planning it or joining a massive group tour? How did you actually make it work?

Small Model Forensics, benchmarking prefill and decode scaling across 9 models, 3 providers, 100–1M tokens

Planning Vietnam and Cambodia in 2 weeks and the logistics are making me want to cancel the whole trip
