Study Finds Large Language Models Escalate Conflicts, Sometimes to Nuclear Levels
- Large language models (LLMs) acting as diplomatic agents in simulated scenarios often escalated conflicts, sometimes to the point of launching nuclear attacks, and displayed concerning "hard-to-predict escalations."
- The study tested five LLMs (three versions of OpenAI's GPT, Anthropic's Claude, and Meta's Llama 2) in wargames and diplomatic simulations without human oversight.
- Even in neutral scenarios with no initial conflict, most models still escalated. GPT-4-Base chose nuclear strikes 33% of the time on average.
- Most of the models had been fine-tuned with Reinforcement Learning from Human Feedback (RLHF) to reduce harmful outputs, yet concerning escalations still occurred.
- Researchers urged caution in using large language models for sensitive decision-making, especially given OpenAI's recent policy changes allowing military uses.