Anthropic donates Petri alignment toolkit to nonprofit Meridian Labs

TL;DR:

  • Anthropic has handed development of Petri, its open-source alignment-evaluation toolkit launched in October 2025, to AI evaluation nonprofit Meridian Labs, releasing version 3.0 with a new “Dish” add-on aimed at making test scenarios harder for models to detect as evaluations.
  • The UK’s AI Security Institute already uses Petri as a major part of how it evaluates frontier models for sabotage propensity in AI research.
  • Resultsense view: the donation echoes Anthropic’s 2024 handover of the Model Context Protocol to the Linux Foundation. It is a deliberate institutional move to keep alignment-evaluation infrastructure outside any single frontier lab — and a quiet boost to organisations like AISI whose credibility depends on that independence.

Anthropic announced on 7 May that Petri’s third version is shipping under new ownership at Meridian Labs, an evaluation-focused nonprofit. Petri lets researchers stress-test any large language model for behaviours such as deception, sycophancy and willingness to cooperate with harmful requests. An “auditor” model probes the target model across alignment-relevant scenarios, and a “judge” model scores the resulting transcripts.

What is changing in Petri 3.0

The new release splits the auditor and target models into separate components that can be tweaked independently, addressing a long-standing complaint that integrated test harnesses were too rigid. A new add-on called “Dish” runs scenarios using the model’s real production system prompt and scaffold rather than synthetic stand-ins, reducing the chance that a model can detect it is being evaluated and behave differently. Petri also now integrates with Bloom, Anthropic’s other open-source alignment tool, which performs deeper assessments on specific behaviours.

Why hand it over

Anthropic compared the move to its earlier donation of the Model Context Protocol to the Linux Foundation. The argument is the same: evaluation tools used to certify frontier-model behaviour need to be governed by a body that has no commercial stake in any model’s verdict. Meridian Labs already hosts Inspect and Scout, building what Anthropic describes as an open evaluation stack accessible to independent researchers, governments and other labs.

UK relevance

The AI Security Institute’s reliance on Petri is the most concrete signal of how this evaluation infrastructure already shapes UK government oversight of frontier models. Independent stewardship reduces the risk that a single lab can influence the tooling regulators use to assess its own outputs. The Anthropic Institute, also unveiled this week, will continue to publish research on safety and societal impacts using Anthropic’s internal vantage point — but the evaluations themselves will now sit further away from the company’s control.

Looking forward

Expect other frontier labs and government safety bodies to integrate Petri 3.0 over the coming months, particularly where comparability across models matters. Whether Meridian Labs can sustain the operational tempo Anthropic kept up internally — frequent releases, fast iteration on adversarial techniques — is the variable to watch.