Do you find Opus 4.7 better or worse than 4.6?
In my experience, Opus 4.7 is a downgrade from 4.6: it hallucinates more, communicates worse, overlooks mistakes, and suggests worse plans. This is my opinion, based on rerunning the same prompt and context through both models on real-world programming and analysis tasks and judging which response was better. I've heard other people say the same thing, but I'm curious to see a poll for consensus.
How is it that so many people find it worse when benchmarks consistently show it as better? Are they nerfing the deployed model compared to the one they tested, or are benchmarks that far from real-world tasks? Or are we just imagining it, or using it wrong?
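
One way to check the "just imagining it" possibility is to blind the side-by-side comparison and then ask whether the win/loss split could plausibly be chance. Below is a minimal sketch in Python, assuming you already have both responses as strings and judge each pair yourself. The function names `blind_pair` and `sign_test_p` are made up for illustration, and nothing here calls any model API.

```python
import random
from math import comb


def blind_pair(resp_a, resp_b, rng):
    """Return the two responses in random order, plus a flag saying
    whether the first one shown is resp_a. The judge should only see
    the first two values until after picking a winner."""
    if rng.random() < 0.5:
        return resp_a, resp_b, True
    return resp_b, resp_a, False


def sign_test_p(wins_a, wins_b):
    """Two-sided exact sign test on pairwise wins (ties are dropped
    before counting). Returns the probability of a split at least this
    lopsided if both models were really equally good."""
    n = wins_a + wins_b
    if n == 0:
        return 1.0
    k = max(wins_a, wins_b)
    # Upper tail of Binomial(n, 0.5), doubled for a two-sided test.
    tail = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return min(1.0, 2 * tail)


if __name__ == "__main__":
    rng = random.Random()
    first, second, first_is_a = blind_pair("response from A", "response from B", rng)
    print("Option 1:", first)
    print("Option 2:", second)
    # Example tally after judging ten pairs: 8 wins for one model, 2 for the other.
    print(sign_test_p(8, 2))
```

With an 8 to 2 split over ten prompts, the p-value comes out around 0.11, so a handful of comparisons is not enough to rule out chance even when one model seems clearly ahead.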