AI chatbots walk experts through bioweapon plans, NYT investigation finds

TL;DR: A New York Times investigation, published 29 April, presents more than a dozen transcripts in which leading frontier chatbots described how to acquire genetic material, synthesise pathogens and disperse them in public spaces. Biosecurity experts at Stanford, MIT and Johns Hopkins assessed the output as “remarkably creative and realistic”. Britain’s Centre for Long-Term Resilience said current safeguards are “like a flimsy wooden fence”.

The reporting names ChatGPT, Google’s Gemini and Anthropic’s Claude. Stanford microbiologist David Relman, hired to red-team an unnamed model before release, said one chatbot identified a security lapse in a public transit system and outlined a plan to maximise casualties. MIT geneticist Kevin Esvelt produced transcripts in which Gemini ranked pathogens by economic damage to UK-relevant cattle and pork industries, and ChatGPT modelled airborne biological dispersal over a major city after initially refusing.

What the labs are saying

Anthropic’s safety lead Alexandra Sanderford told the NYT there is “an enormous difference between a model producing plausible-sounding text and giving someone what they’d need to act”, and said the company has set “aggressive refusal thresholds” for biological prompts. Google said its newest models would no longer answer the most serious queries; OpenAI said the cited transcripts did not “meaningfully increase someone’s ability to cause real-world harm”. Reporters were able to bypass ChatGPT’s filters with a standard jailbreak technique, and at least one model directed users to an older, less-restricted version when asked questions the current model refused.

The piece arrives alongside Anthropic’s own publication on Tuesday of BioMysteryBench, a benchmark in which Claude Mythos Preview solved 30% of bioinformatics tasks that a panel of human experts could not, evidence that the underlying capability is rising even as the safeguard debate continues.

Looking forward

For the UK, the article is being read as a test case for the AI Safety Institute’s (AISI) frontier-model evaluation remit. Cassidy Nelson of Britain’s Centre for Long-Term Resilience features prominently in the NYT’s reporting, and the UK government has previously cited dual-use biosecurity as one of the categories most clearly inside AISI’s brief. Expect renewed pressure on the government to formalise pre-deployment access for UK evaluators of US frontier models, and on UK-regulated buyers of those models (banks, NHS bodies, defence contractors) to ask vendors for evidence of biosecurity refusal benchmarks rather than self-attested safety statements.
