r/AlignmentResearch


AI safety evals should account for test-time compute

Many AI safety evaluations test whether a model is safe under a fixed, limited evaluation budget, but economically motivated adversaries may spend far larger and more adaptive test-time compute budgets.

I expand on this in an article, where I argue that safety claims should be “budget-labeled”: https://huggingface.co/blog/Cerru02/safety-evals-should-project-ttc
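To make the budget point concrete, here is a rough sketch of how a per-attempt attack success rate could be projected to larger attempt budgets. This is not taken from the article: the function names are hypothetical, and it assumes attempts are independent with a constant success probability, which likely understates what an adaptive attacker can do.

```python
def attack_success_at_budget(p_single: float, attempts: int) -> float:
    """Chance that at least one of `attempts` tries succeeds.

    Assumption (illustrative only): attempts are independent and each
    succeeds with the same probability `p_single`.
    """
    if not 0.0 <= p_single <= 1.0:
        raise ValueError("p_single must be in [0, 1]")
    if attempts < 0:
        raise ValueError("attempts must be non-negative")
    return 1.0 - (1.0 - p_single) ** attempts


def budget_labeled_claim(p_single: float, budgets: list[int]) -> dict[int, float]:
    """Map each attempt budget to the projected attack success rate."""
    return {k: attack_success_at_budget(p_single, k) for k in budgets}


if __name__ == "__main__":
    # Hypothetical measured rate: 0.1% success per single attempt.
    for budget, rate in budget_labeled_claim(0.001, [1, 10, 100, 1000]).items():
        print(f"attempts={budget:>5}  projected success={rate:.3f}")
```

Under these assumptions, a 0.1% per-attempt success rate grows to roughly 63% at 1,000 attempts, so the same model can look safe or unsafe depending on the budget attached to the claim.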

Curious to hear what you guys think.

u/Cerru905 — 10 days ago