Microsoft Builds Prompt Shields to Protect AI Chatbots From Manipulation
-
Microsoft is designing "prompt shields" to detect and block attempts to trick AI chatbots into unintended behaviors.
-
The new tools can spot suspicious inputs and block them in real time.
-
Microsoft is addressing "prompt injection attacks" where hackers insert malicious instructions into an AI's training data.
-
The company is investigating incidents with its Copilot chatbot where users deliberately tried to generate weird or harmful responses.
-
Microsoft and OpenAI aim to deploy AI safely but "jailbreaks" that trick models are an inherent weakness of the technology.