AI Progress Outpacing Governments' Ability to Evaluate Safety
• Speed of AI development stretching traditional evaluation methods
• Increasing power of latest AI systems exposing flaws in common performance and safety benchmarks
• Public benchmarks becoming obsolete within months as new models optimize for or game them
• Governments struggling to keep up with risks of latest AI models
• Internal test sets built by individual businesses seen as the most high-signal way to evaluate AI models