By Jared Evan
(TJV NEWS) A chilling new academic experiment has found that leading artificial intelligence models repeatedly escalated simulated Cold War–style crises to the brink of nuclear catastrophe — and often beyond it.
As Tom’s Hardware reported, Professor Kenneth Payne of King’s College London recently published a study examining how advanced large language models behave when placed in charge of a nuclear-armed state during high-stakes geopolitical confrontations.
The research pitted three prominent AI systems — GPT-5.2, Claude Sonnet 4, and Gemini 3 Flash — against one another in a series of simulated nuclear crisis scenarios. According to the paper, published via arXiv and highlighted by Tom’s Hardware, 20 out of 21 simulated matches resulted in at least one tactical nuclear weapon being detonated.
In the study, each model was instructed to assume the role of a national leader during a tense political climate modeled after the Cold War. The AI “leaders” were then placed head-to-head in six separate matchups. In a seventh configuration, each model also competed against an identical copy of itself — effectively creating showdowns such as GPT-5.2 versus GPT-5.2.
To prevent repetitive decision-making, Payne introduced a wide array of crisis conditions. These included territorial disputes, alliance credibility tests, strategic resource races, chokepoint crises, power transition confrontations, pre-ceasefire land grabs, first-strike dilemmas, regime survival threats, and prolonged strategic standoffs. Many of the scenarios were designed to mirror real-world flashpoints, some of which remain relevant in today’s geopolitical landscape.
The AI systems were given complete strategic freedom. They could choose diplomacy, conventional military action, surrender — or nuclear escalation.
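Payne's actual harness is available on GitHub (see below); as a rough illustration of the setup the article describes — two "leaders" taking turns choosing among diplomacy, conventional action, surrender, or nuclear escalation — the loop might be sketched like this. Everything here (the action names, the tension counter, the stand-in policy) is hypothetical scaffolding, not the study's code; a real harness would prompt a language model where `mock_leader_policy` stands.

```python
import random

# Hypothetical action space mirroring the options described in the article.
# This is an illustrative sketch, not Payne's published code.
ACTIONS = ["diplomacy", "conventional_strike", "surrender",
           "tactical_nuke", "strategic_nuke"]

def mock_leader_policy(state):
    """Stand-in for an LLM 'leader'. A real harness would send the crisis
    state to a language model and parse its chosen action."""
    t = state["tension"]
    if t >= 10:
        return random.choice(["tactical_nuke", "strategic_nuke"])
    if t >= 7:
        return random.choice(["tactical_nuke", "conventional_strike"])
    return random.choice(["diplomacy", "conventional_strike", "surrender"])

def run_match(scenario, max_turns=10):
    """Play one two-player crisis match and return (decision log, outcome)."""
    state = {"tension": scenario["initial_tension"], "log": []}
    for turn in range(max_turns):
        for player in ("A", "B"):
            action = mock_leader_policy(state)
            state["log"].append((turn, player, action))
            if action == "strategic_nuke":
                return state["log"], "strategic_exchange"
            if action == "surrender":
                return state["log"], f"{player}_surrenders"
            if action == "tactical_nuke":
                state["tension"] += 2   # tactical use escalates the crisis
            elif action == "diplomacy":
                state["tension"] -= 1   # diplomacy de-escalates

    return state["log"], "standoff"

log, outcome = run_match({"initial_tension": 7})
```

Tallying outcomes and per-action counts across many such matches is how aggregate figures like "95% of games saw tactical nuclear use" would be computed.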
Across 21 matches, the models collectively made 329 decisions. The findings were stark. According to the paper, 95% of the games involved at least some use of tactical nuclear weapons. While full-scale strategic nuclear exchanges were less common, they still occurred three times — specifically in scenarios involving deadline pressure.
Tom’s Hardware reported that GPT-5.2 initiated a full strategic nuclear strike twice. However, in both instances, the model’s decision stemmed from misinterpretation and “fog of war” conditions rather than deliberate first-strike intent. By contrast, Gemini 3 Flash deliberately launched a civilization-ending strike in one scenario.
Despite the gravity of nuclear escalation, the models frequently appeared to treat tactical nuclear use as a calculated and manageable risk — assuming it would not spiral into total annihilation. That assumption, the study suggests, may reveal troubling patterns in how AI systems model deterrence and escalation.
For those interested in examining the simulations firsthand, Payne has made the project publicly available, uploading the code and scenarios to GitHub for download.
As Tom’s Hardware noted in its coverage, the study raises profound questions about the strategic reasoning capabilities — and limitations — of artificial intelligence systems when placed in positions of extreme responsibility.

