Microsoft Unveils New Defenses Against Emerging AI Attack Method
-
Researchers discovered a new type of AI attack called Crescendo that can bypass safeguards by gradually leading the AI system towards a malicious goal over multiple interactions.
-
Microsoft developed a technique called Spotlighting that greatly reduces the success rate of attacks where malicious content is fed to the AI system for processing.
-
To defend against Crescendo, Microsoft added new multilayer protections including expanded prompt filters, an AI watchdog system, and advanced research into AI vulnerabilities.
-
Microsoft released an open source AI red teaming toolkit called PyRIT to help others identify risks in their own AI systems and encourage responsible disclosure.
-
Microsoft has an AI bounty program for reporting vulnerabilities and continues collaborating across the industry to improve AI safety and security.