
AI capability forecasts deserve better models than curve fitting (ft. LPPLS)
We've been debating sigmoids here, and the thread produced a lot of good discussion.
I argued there and elsewhere that the debate was focused on the wrong question. I don't think many people addressed this:
> If they’re not treating AI as a black box, and claim to be modeling the dynamics explicitly, then what is their model?
I wrote a piece on what other models could look like, models that move us past "which curve does this fit" as the dominant frame. This post elides most of the math - if you want the full exploration of the model parameters, check it out.
The models I'm considering come from the systems-thinking, forecast-evaluation, and complex-systems literatures. Each of these literatures has spent decades building tools for exactly the question we should be asking here: what does a model look like that commits to its own failure conditions before the prediction window closes?
I focus on one in the first piece. Several others are worth digging into, but each deserves its own full exploration.
Didier Sornette, the dragon-king, and LPPLS
Sornette is a physicist at ETH Zürich who spent thirty years building tools to predict regime changes in nonlinear systems: they have been applied to financial bubbles, earthquakes, material failures, epileptic seizures, and ecosystems. His Log-Periodic Power Law Singularity (LPPLS) model fits a specific functional form to systems approaching a critical transition. The functional form has a finite-time singularity built into it, and the model commits to a date range within which the transition will occur. If the date range passes and the regime change does not occur, the model is wrong in a way that registers as wrong, not as needing a parameter refinement.
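For readers who want to see the shape of that functional form, here is a minimal sketch of the LPPLS expression as it is usually written in the literature. The parameter names (tc, m, omega, A, B, C, phi) are the conventional ones and are my labels here, not something taken from the linked piece, and this only evaluates the curve; it does not fit anything.

```python
import math

def lppls(t, tc, m, omega, A, B, C, phi):
    """Expected log-observable at time t under the usual LPPLS form:

        A + B*(tc - t)**m + C*(tc - t)**m * cos(omega*ln(tc - t) - phi)

    tc is the critical time, m the criticality exponent, omega the
    log-periodic angular frequency, phi a phase, and A, B, C are
    level and amplitude terms. Seven parameters in total.
    """
    dt = tc - t  # time remaining until the critical time
    if dt <= 0:
        # The form is only defined strictly before the critical time.
        raise ValueError("LPPLS is only defined for t < tc")
    power = dt ** m
    return A + B * power + C * power * math.cos(omega * math.log(dt) - phi)
```

With 0 < m < 1 the value itself stays finite as t approaches tc, but its rate of change diverges, which is the sense in which the singularity is built in. The oscillations in log(tc - t) are what "log-periodic" refers to.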
This is an architectural feature missing from current curve-fitting frameworks. METR’s doubling-horizon work commits to a functional form (exponential) and a parameter (the doubling rate), but it does not commit in advance to which observations would force abandoning the framework rather than adjusting the parameter. Sornette’s LPPLS commits to the functional form and to the failure condition simultaneously, because the functional form has the singularity baked in. If the singularity doesn’t arrive in the predicted window, you have a failed LPPLS model.
The dragon-king concept extends this framework. Sornette argued, against the dominant black-swan framing, that the largest events in many complex systems are not random outliers from a power-law tail. They are products of distinct mechanisms (positive feedback loops, tipping points, bifurcations, and phase transitions) that operate only in specific regimes. The largest events are statistically distinguishable from the rest of the distribution because they come from a different generative process. This is consequential for AI forecasting because it inverts a common implicit assumption: that “transformative AI” lives on the same curve as “current AI,” just further along. Sornette’s framework says: maybe not. Maybe the transformative event, if it comes, is generated by a mechanism that does not appear in the current trajectory at all. Curve-fitting against the current trajectory cannot, in principle, predict events generated by mechanisms outside the trajectory.
There is a useful asymmetry in this view. Power-law extrapolation gives you no leverage on dragon-kings, but mechanism-based monitoring sometimes does. Sornette’s Financial Crisis Observatory (now here) monitors twenty-five thousand assets daily for log-periodic precursor signals: measurable features that show up before a phase transition, even when the timing within the precursor window is uncertain. He doesn’t predict which grain will trigger the avalanche; he measures the pile’s slope.
The AI-forecasting equivalent would be to ask: what are the measurable precursors of a phase transition in AI capability? Specifically: “are the structural conditions that would enable a phase transition assembling themselves?” That is a different research program than curve-fitting.
The Substack piece walks through what LPPLS would commit you to if you applied it to METR's time-horizon dataset, what each parameter means, which ones are diagnostic versus fitted, and what specific observations would falsify the model before the prediction window closes. I'm not fitting the model because the dataset is too short for seven-parameter estimation. I'm showing what fitting it would mean, and what the discipline of specifying failure conditions in advance actually looks like.
I also commit publicly in the piece: if a competent practitioner fits LPPLS to METR's dataset over the next twelve months and the criticality exponent lands outside (0,1) or no log-periodic structure appears at conventional significance, I'll treat the phase-transition hypothesis as not on the table for this operationalization and say so in writing. If it lands inside (0,1) with significant structure and survives out-of-sample testing, I'll treat it as live and update my forecasts.
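Written out as a decision rule, the commitment above looks roughly like the sketch below. The function name, the use of a p-value to stand in for "log-periodic structure at conventional significance", and the 0.05 threshold are all my illustrative assumptions. The commitment as stated does not say what happens when the fit looks good in-sample but fails out-of-sample, so the sketch flags that case rather than inventing a verdict for it.

```python
def phase_transition_verdict(m, log_periodic_p_value,
                             survives_out_of_sample, alpha=0.05):
    """Map a hypothetical LPPLS fit summary to the pre-committed verdict.

    m: fitted criticality exponent.
    log_periodic_p_value: p-value for the log-periodic component
        (an assumed way of operationalizing "significant structure").
    survives_out_of_sample: whether the fit held up on held-out data.
    alpha: assumed "conventional" significance threshold.
    """
    exponent_in_range = 0.0 < m < 1.0
    structure_significant = log_periodic_p_value < alpha

    # Either failure alone takes the hypothesis off the table.
    if not exponent_in_range or not structure_significant:
        return "off the table"
    if survives_out_of_sample:
        return "live"
    # Not covered by the stated commitment; flagged instead of guessed.
    return "unspecified"
```

For example, `phase_transition_verdict(1.3, 0.01, True)` returns `"off the table"` because the exponent falls outside (0,1), regardless of how the other checks come out.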
I'm looking for some help extending this:
- Anyone with LPPLS finance experience: what is your honest assessment of its empirical track record, and what would have to be true for the architecture to transfer to AI capability cleanly?
- What's the strongest version of the case against phase-transition framing for AI capability?
- Is anyone familiar with other non-curve frameworks worth surfacing? I have a few candidates queued up but don't know what I don't know.