Do open-source models like GLM 5.1, Kimi K2.6, DS4P, Qwen 3.6 (and the incoming 3.7) actually live up to the hype? And even if they don’t, do we really need the raw power of frontier models (GPT-5.5, Opus 4.7, etc.) for everything?
My experience: I’ve tried both frontier US models and Chinese models, and honestly I didn’t notice a huge gap in many real-world scenarios. Fun fact: I once used DS4P to fix an architectural flaw that GPT-5.5 had introduced. So it makes me wonder whether the extra capability of frontier models is always worth it.
- What tasks do you still think absolutely require frontier models?
- Where are open-source models already matching or beating them?
- Performance vs cost vs privacy tradeoffs: how do you balance them in your projects?
- Any memorable wins (or fails) using OSS models that changed how you choose models?
Would love to hear benchmarks, anecdotes, or what you use in production vs prototyping.