Posted 4/10/2024, 2:10:24 PM

AI Progress Outpacing Governments' Ability to Evaluate Safety

• Speed of AI development stretching traditional evaluation methods
• Increasing power of latest AI systems exposing flaws in common performance and safety benchmarks
• Public benchmarks becoming obsolete within months as new models optimize for or game them
• Governments struggling to keep up with risks of latest AI models
• Internal test sets built by individual businesses are the highest-signal way to evaluate AI models

arstechnica.com