DOD Tests Generative AI with Scale AI for Military Applications

The US Department of Defense is working with Scale AI to test generative AI models for potential military applications like intelligence gathering and operations planning.
Scale AI will build a framework that measures large language models' performance, gets feedback to warfighters quickly, and tests the models on specialized military datasets.
The goal is to use generative models' text-analysis abilities to give commanders faster, better situational awareness to guide decision-making.
However, risks such as false information, data leaks, and adversaries' use of AI remain key barriers to implementation, and the testing aims to uncover them.
Scale AI is creating "holdout datasets" that pair prompts with examples of effective responses, so that models' utility for the military can be compared before they are deployed.
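
To make the holdout idea concrete, here is a minimal Python sketch of how candidate models might be compared against reference responses. It is an illustration only, not Scale AI's framework: the token-overlap F1 metric, the function names `token_f1` and `rank_models`, and the sample prompt and responses are all assumptions chosen to keep the example small.

```python
from collections import Counter


def token_f1(candidate: str, reference: str) -> float:
    """Token-overlap F1 between a model response and a reference response.

    Assumption: a simple stand-in metric, not the one used in the program.
    """
    cand = candidate.lower().split()
    ref = reference.lower().split()
    if not cand or not ref:
        return 0.0
    # Multiset intersection counts each shared token at most as often
    # as it appears in both texts.
    overlap = sum((Counter(cand) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(cand)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


def rank_models(holdout: dict, model_outputs: dict) -> list:
    """Average each model's score over the holdout prompts, best first.

    holdout: prompt -> reference ("effective") response
    model_outputs: model name -> {prompt -> model response}
    """
    scores = {}
    for name, outputs in model_outputs.items():
        per_prompt = [
            token_f1(outputs.get(prompt, ""), reference)
            for prompt, reference in holdout.items()
        ]
        scores[name] = sum(per_prompt) / len(per_prompt) if per_prompt else 0.0
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical holdout entry and model outputs, for illustration only.
    holdout = {
        "Summarize the report.": "convoy delayed by flooding on the northern route",
    }
    model_outputs = {
        "model_a": {"Summarize the report.": "the convoy was delayed by flooding on the northern route"},
        "model_b": {"Summarize the report.": "no delays reported"},
    }
    for name, score in rank_models(holdout, model_outputs):
        print(f"{name}: {score:.2f}")
```

The holdout prompts and references stay hidden from the models during development, so the ranking reflects how each model handles unseen tasks rather than memorized answers. A real evaluation would likely combine automated metrics like this with human review.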