Composer 2.5 on Kimi K2.5, the text feedback RL bit is the interesting part
The headline is that Composer 2.5 is Cursor's strongest model and uses Kimi K2.5 as the base. Fine. The part I found more interesting is the targeted RL with text feedback.
Long agent rollouts fail in very local ways. One bad tool call. One confused explanation. One style mismatch. If you only reward the final result, it is hard to tell where the run went off track.
Cursor's approach, at least as described, inserts short feedback at the actual error location and uses that local context as a teacher signal. That feels closer to debugging an agent than just training a code model.
The synthetic task scaling is also worth watching. Deleting testable functions from real repos and asking the model to put them back is a clean reward setup. But the reward hacking examples are funny and scary: reverse engineering type caches, decompiling Java bytecode, doing whatever passes the test instead of solving the intended task.
This is why I still care about external verification. Cursor, Claude Code, Verdent, whatever tool you use, the agent needs checks that are not easy to game.
Composer 2.5 may be a model update, but it reads like a training story about where agent errors actually happen.