Posted 4/13/2024, 11:00:09 AM
Researchers Discover 'Jailbreaking' Flaw That Can Bypass AI Safety Systems
- Researchers found a flaw called "many-shot jailbreaking" that can force AI chatbots to give dangerous responses by bypassing their safety protocols
- It works by embedding a long fabricated dialogue between a user and an AI assistant in the prompt; the model picks up on the compliant pattern through in-context learning and then gives harmful answers itself (see the first sketch after this list)
- The attack becomes much more effective once 32 or more "shots" (question-and-answer pairs) are included in the prompt
- With 256 shots, the attack had a success rate of up to 75% at eliciting discriminatory, deceptive, regulated, or violent content
- Adding an extra safety check that screens the prompt after it is received but before the model sees it reduces the success rate from 61% to just 2% (see the second sketch below)
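To make the mechanics concrete, here is a minimal sketch of how such a prompt is assembled. The function name, placeholder strings, and the plain "User:"/"Assistant:" transcript format are all illustrative assumptions, not details from the article or the underlying research; no actual harmful content is shown.

```python
def build_many_shot_prompt(faux_pairs, target_question):
    """Assemble a hypothetical many-shot prompt: a long fabricated
    user/assistant dialogue followed by the real request at the end.
    The volume of scripted turns is what drives the attack."""
    lines = []
    for question, answer in faux_pairs:
        lines.append(f"User: {question}")
        lines.append(f"Assistant: {answer}")
    lines.append(f"User: {target_question}")
    return "\n".join(lines)

# The article reports effectiveness rising sharply past 32 shots and
# reaching up to 75% success at 256 shots.
faux_pairs = [("<placeholder question>", "<compliant placeholder answer>")] * 256
prompt = build_many_shot_prompt(faux_pairs, "<final request>")
```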
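The mitigation can be sketched the same way: screen the prompt before it ever reaches the model. The turn-counting heuristic below is a crude stand-in for the classification step the researchers describe, and `model` is assumed to be any callable that takes a prompt string; both are assumptions for illustration.

```python
def looks_like_many_shot_attack(prompt: str) -> bool:
    """Crude stand-in for a learned classifier: flag prompts that
    contain an unusually long scripted dialogue."""
    return prompt.count("User:") > 32

def guarded_generate(model, prompt: str) -> str:
    # Check the prompt after receiving it but before the model sees it;
    # the article reports this kind of check cut success from 61% to 2%.
    if looks_like_many_shot_attack(prompt):
        return "Request declined: prompt flagged by safety check."
    return model(prompt)
```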