Researchers have created an AI system called "Masterkey" that can automatically generate new ways to jailbreak and attack defenses of chatbots like ChatGPT, Bing Chat, and Google Bard.
-
By analyzing differences in chatbots' response times, Masterkey infers how their defenses behave and uses that insight to craft prompts that slip past filters and content policies.
-
In tests, Masterkey generated jailbreak prompts that elicited illegal, unethical, dangerous, and otherwise forbidden content at higher rates than existing jailbreaking methods.
-
Companies have patched the vulnerabilities the researchers reported, but securing chatbots will likely remain an ongoing cat-and-mouse game.
-
Because chatbots don't truly "understand" their inputs or outputs, they remain fallible and can be manipulated in unintended ways.