u/Eyeswideshut_91

A glimpse of Level 4? OpenAI model helps challenge an 80-year-old math assumption

The interesting part for me is that OpenAI frames this as the output of a general-purpose reasoning model, rather than a system specifically engineered around this problem.

If the proof holds up, it’s a strong signal that frontier models are starting to take a more active role in the production of new knowledge.

Still early, obviously. But this feels like the kind of result we may look back on as an early milestone.

x.com

u/Eyeswideshut_91 — 3 days ago

▲ 588 r/accelerate+1 crossposts

GPT-5.5 was used to flag fatal errors in FrontierMath problems

FrontierMath is supposed to be one of the hard benchmarks for frontier models, and now Epoch is saying an AI-assisted review found fatal errors in about a third of the problems in Tiers 1-4.

Noam Brown says the initial flags came from GPT-5.5.

Obviously we’ll have to wait for the corrected scores, but this is a pretty interesting moment: the model is already strong enough to sanity-check the benchmark.

u/Eyeswideshut_91 — 11 days ago