Microsoft Unveils New Defenses Against Emerging AI Attack Method

Researchers discovered a new type of AI attack called Crescendo that can bypass safeguards by gradually leading the AI system towards a malicious goal over multiple interactions.
Microsoft developed a technique called Spotlighting that greatly reduces the success rate of attacks where malicious content is fed to the AI system for processing.
To defend against Crescendo, Microsoft added new multilayer protections including expanded prompt filters, an AI watchdog system, and advanced research into AI vulnerabilities.
Microsoft released an open source AI red teaming toolkit called PyRIT to help others identify risks in their own AI systems and encourage responsible disclosure.
Microsoft has an AI bounty program for reporting vulnerabilities and continues collaborating across the industry to improve AI safety and security.