Researchers Demonstrate 'Viral' Method to Rapidly Disrupt AI Systems
- AI models can be made to act erratically by subtly altered inputs that appear normal to humans. Researchers injected one chatbot with such an adversarial image, and the image then spread exponentially to other chatbots in a multi-agent environment.
- The altered image spreads like a "virus" from one chatbot to another as they converse, eventually causing all of them to behave strangely.
- This "infectious jailbreak" spreads far faster than attacking chatbots one by one: growing exponentially, the infection reached all agents within just 27-31 chat rounds.
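The exponential dynamic can be illustrated with a toy simulation (a minimal sketch, not the researchers' actual setup): agents pair up at random each round for a chat, and an infected partner passes the adversarial image on with some probability. Because each infected agent can infect another every round, the infected population roughly doubles per round, so even large populations are fully infected after a logarithmic number of rounds. All names and parameters here are illustrative assumptions.

```python
import random

def rounds_to_full_infection(num_agents=4096, p_transmit=0.9, seed=0):
    """Toy epidemic model of an 'infectious jailbreak': each round,
    agents pair up at random for one chat. If exactly one partner holds
    the adversarial image, it is transmitted with probability p_transmit.
    Returns the number of rounds until every agent is infected.
    (Illustrative sketch only; parameters are hypothetical.)"""
    rng = random.Random(seed)
    infected = [False] * num_agents
    infected[0] = True              # a single seeded agent starts the outbreak
    num_infected, rounds = 1, 0
    while num_infected < num_agents:
        rounds += 1
        order = list(range(num_agents))
        rng.shuffle(order)          # random pairing for this chat round
        for i in range(0, num_agents - 1, 2):
            a, b = order[i], order[i + 1]
            if infected[a] != infected[b] and rng.random() < p_transmit:
                infected[a] = infected[b] = True
                num_infected += 1
    return rounds
```

With thousands of agents, the returned round count stays in the tens rather than the thousands, reflecting the contrast with a sequential attack, which would need one round per agent.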
- As chatbots are integrated into infrastructure, the risk of adversarial attacks spreading widely grows. Defenses are urgently needed.
- Possible defenses include recovering infected agents more efficiently or lowering the infection rate. But designing a practical defense remains an open question.
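The trade-off between recovery and infection rate can be sketched with a simple mean-field recurrence (a toy model I am assuming for illustration, not the researchers' analysis): if each infected agent causes beta new infections per round and is cured with probability gamma per round, the infected count scales by (1 + beta - gamma) each round, so the outbreak dies out only when the recovery rate exceeds the infection rate.

```python
def infected_after(rounds, beta, gamma, i0=1.0):
    """Mean-field toy model of defense effectiveness: the infected
    count multiplies by (1 + beta - gamma) each round, where beta is
    the number of new infections per infected agent per round and
    gamma is the per-round cure probability. (Hypothetical model.)"""
    i = i0
    for _ in range(rounds):
        i *= (1.0 + beta - gamma)
    return i

# The outbreak grows when beta > gamma and shrinks when gamma > beta,
# which is why both faster recovery and a lower infection rate help.
```

In this toy picture, a defense needs to push gamma above beta; merely slowing the infection without crossing that threshold only delays full compromise.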