White House-Backed Contest Shows Leading AI Chatbots Are Still Vulnerable to Hacking, Spilling Sensitive Data
-
Hackers competed to trick leading AI chatbots in a White House-backed contest, but the chatbots proved surprisingly hard to trick into violating their own rules.
-
Getting the AI models to generate false information was easy: contestants succeeded 76% of the time on faulty math questions.
-
The chatbots disclosed sensitive information more than half the time when simply asked, posing cybersecurity risks.
-
Prompt hacking was largely ineffective; by contrast, asking questions built on false premises often led the chatbots to generate additional falsehoods.
-
The report's authors argue that public "red teaming" exercises can help anticipate risks from AI systems.