
Thinking Machines’ interaction models are more interesting than the benchmarks
The most important part here is not the benchmark numbers. It is the shift in product logic.
If this approach scales, a huge class of AI products may no longer need an external orchestrator.
Live translation, pronunciation tutors, an assistant that comments on code while you type, workout rep counting, navigation for blind users: today, a lot of this is built with awkward pipelines and noticeable latency.
Here, interactivity becomes a property of the model itself.
The limitations are real too. Long sessions fill up context fast. You need a stable connection. And the current checkpoint is not their largest model; the bigger ones are still too slow for real-time use.
But the direction looks strong.
This is not just "ChatGPT with voice." It is an attempt to build AI that does not wait for you to finish before it answers, AI that can be present in the moment.