White House-Backed Contest Shows Leading AI Chatbots Are Still Vulnerable to Hacking, Spilling Sensitive Data
-
Hackers competed to trick leading AI chatbots in a White House-backed contest, but the chatbots proved surprisingly hard to trick into violating their own rules.
-
Getting the AI models to generate false information was easy: contestants succeeded 76% of the time on faulty math questions.
-
The chatbots disclosed sensitive information more than half the time when simply asked, posing cybersecurity risks.
-
Prompt hacking was largely ineffective; by contrast, asking questions built on false premises often led the chatbots to generate additional falsehoods.
-
The report's authors argue that public "red teaming" exercises can help anticipate risks from AI systems.