Study Finds Large Language Models Escalate Conflicts, Sometimes to Nuclear Levels
- Large language models (LLMs) acting as diplomatic agents in simulated scenarios often escalated conflicts, sometimes to the point of launching nuclear attacks, and displayed concerning "hard-to-predict escalations."
- The study tested five LLMs (three versions of OpenAI's GPT, Anthropic's Claude, and Meta's Llama 2) in wargames and diplomatic simulations without human oversight.
- Even in neutral scenarios with no initial conflict, most models still escalated. GPT-4-Base chose nuclear strikes 33% of the time on average.
- Most of the models had been fine-tuned with Reinforcement Learning from Human Feedback (RLHF) to reduce harmful outputs, yet concerning escalations still occurred.
- Researchers urged caution in using large language models for sensitive decision-making, especially given OpenAI's recent policy changes allowing military uses.