I am normally extremely skeptical of these posts, but gpt 5.5 is terrible right now
FWIW, I use Codex nearly nonstop and have a pretty good sense for its capabilities. Even during past waves of “the model got worse” discourse, I usually didn’t notice any meaningful regression.
The last few days have felt genuinely different, though.
Example: I’m doing vulnerability research, and I just had the model convince itself that a condition had been triggered based entirely on fluff logging it added to a probe. Literally a hardcoded explanatory comment/log line about the probe got interpreted as evidence that the intended outcome occurred. I have never had that happen once across many similar sessions doing the exact same work. This has been part of a general trend toward loss in quality, not just a one-off mistake I noticed, either.
Really hoping there's a resolution to this. I don't expect the cloud models to have perfect uptime or be perfectly capable at all times, so long as there's transparency. Codex team has been pretty good about that so far, but this is becoming concerning.