r/AlignmentResearch

AI safety evals should account for test-time compute

Many AI safety evaluations test whether a model is safe under a fixed, limited evaluation budget. Real adversaries with an economic motive may spend far more test-time compute, and spend it adaptively.

I elaborate on this in the article below, where I argue that safety claims should be “budget-labeled”: https://huggingface.co/blog/Cerru02/safety-evals-should-project-ttc
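
To make the budget point concrete, here is a rough sketch of my own (not taken from the article). It assumes each attack attempt succeeds independently with the same probability, which is a simplification: an adaptive attacker would likely do better than this. The names `attack_success_at_budget` and `budget_labeled_claim` are hypothetical, made up for illustration.

```python
def attack_success_at_budget(p_single: float, attempts: int) -> float:
    """Chance that at least one of `attempts` tries succeeds.

    Assumption: attempts are independent and each succeeds with
    probability `p_single`. Adaptive attackers violate this, so treat
    the result as a rough floor rather than a precise estimate.
    """
    if not 0.0 <= p_single <= 1.0:
        raise ValueError("p_single must be in [0, 1]")
    if attempts < 0:
        raise ValueError("attempts must be non-negative")
    return 1.0 - (1.0 - p_single) ** attempts


def budget_labeled_claim(p_single: float, budgets: list[int]) -> dict[int, float]:
    """Map each attacker budget (number of attempts) to a projected success rate."""
    return {k: attack_success_at_budget(p_single, k) for k in budgets}


if __name__ == "__main__":
    # Hypothetical per-attempt success rate of 0.1%, projected to larger budgets.
    for budget, rate in budget_labeled_claim(0.001, [1, 10, 100, 1000]).items():
        print(f"budget={budget:>5} attempts -> projected success {rate:.3f}")
```

Under these assumptions, a model that resists 99.9% of single attempts is still broken roughly 63% of the time by an attacker who can afford 1,000 attempts. That is why a safety number without the budget attached can be misleading.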

Curious to hear what you guys think.

u/Cerru905 — 10 days ago