
Turns out the fastest AI model depends entirely on how much text you send it
Someone just published a study where they made 2,000 API calls to 9 small AI models from Google, OpenAI, and Anthropic, at prompt sizes ranging from tiny up to 1 million tokens.
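If you want to try something similar on your own models, here's a rough sketch of a timing loop. This is not the study's actual harness. `time_calls` and `fake_send` are made-up names, and `fake_send` is just a stand-in you'd swap for a real API call from whatever client library you use. The filler prompt is also crude, since real token counts depend on each provider's tokenizer.

```python
import statistics
import time
from typing import Callable, Dict, List


def time_calls(send: Callable[[str], object],
               prompt_sizes: List[int],
               repeats: int = 3) -> Dict[int, float]:
    """Return the median wall-clock seconds per prompt size."""
    results: Dict[int, float] = {}
    for size in prompt_sizes:
        # Crude filler text: `size` short words, NOT an exact token count.
        prompt = "x " * size
        samples = []
        for _ in range(repeats):
            start = time.perf_counter()
            send(prompt)
            samples.append(time.perf_counter() - start)
        results[size] = statistics.median(samples)
    return results


def fake_send(prompt: str) -> int:
    # Hypothetical stand-in for a real API call; replace with your client.
    return len(prompt)


timings = time_calls(fake_send, [10, 1_000, 100_000])
for size, seconds in timings.items():
    print(f"{size:>7} words: {seconds:.6f}s")
```

Taking the median over a few repeats per size keeps a single slow request from skewing the comparison.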
The interesting finding is that model speed rankings completely flip depending on how much context you're sending. OpenAI's GPT-4.1-nano is the fastest for short prompts but becomes one of the slowest for large contexts. Google's Gemini Flash Lite is the opposite: slow for small stuff, but it handles 600K+ tokens faster than anything else tested.
There's also a bizarre result where Gemini Flash Lite actually gets faster when you send it more data, around the 100K token mark. The theory is that Google routes requests to different hardware at that threshold.
Other finding worth knowing: Anthropic's tokenizer uses about 14% more tokens than OpenAI's for the same text. So cost comparisons between providers are misleading if you're just looking at per-token pricing.
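To make the tokenizer point concrete, here's a quick sketch of the adjustment. The prices are made-up placeholders, not real provider pricing, and `effective_price` is a hypothetical helper. The 1.14 factor is just the rough figure from the study.

```python
def effective_price(price_per_million: float, token_ratio: float) -> float:
    """Price per million *reference* tokens of text.

    token_ratio is how many tokens this provider's tokenizer emits per
    token of the reference tokenizer for the same text (1.14 = 14% more).
    """
    return price_per_million * token_ratio


# Placeholder list prices in dollars per million input tokens (NOT real pricing).
reference_price = 1.00  # provider whose tokenizer is the reference
other_price = 1.00      # provider whose tokenizer emits ~14% more tokens

adjusted = effective_price(other_price, 1.14)
print(f"reference: ${reference_price:.2f}, other (adjusted): ${adjusted:.2f}")
```

So two providers with identical per-token prices can still differ by roughly 14% in what you pay for the same text.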
Full writeup with interactive charts: https://blog.0xmmo.co/forensics/post.html