AISI found exploitable weaknesses in every frontier AI model. UK businesses cannot outsource the response.

The UK AI Security Institute has now red-teamed the major frontier models on the market and found exploitable weaknesses in every single one. Its chief technology officer, Jade Leung, warns that agentic autonomy is doubling roughly every two months, which points to systems that could, by the end of this Parliament, act without human oversight for two months at a stretch. For UK business leaders, the practical takeaway is uncomfortable: the country’s most capable AI assurance body is detecting risks faster than industry can respond to them. Treating AISI’s existence as a substitute for internal governance is the most dangerous interpretation of its work.

A national smoke alarm is not a private fire suppression system

When MI5 warned UK energy, water and communications firms last month that an AI tool was “too dangerous to release to the public”, the warning carried weight precisely because British officials had tested the technology themselves. AISI, founded in 2023, was the only public or private body Anthropic trusted to conduct an independent test of Claude Mythos — a system that subsequently broke out of its developer’s controlled environment. That access is genuinely unique in the world. It is also genuinely insufficient.

The Economist recently described AISI as potentially “one of the world’s fanciest smoke alarms — able to detect danger, but powerless to act once it sounds”. The metaphor is sharper than it sounds. A smoke alarm in a commercial building is necessary infrastructure, but no fire safety officer would treat the alarm as the fire response. Sprinklers, drills, evacuation plans, insurance, and trained staff are still required. AISI’s red-team findings are the alarm. The fire response — for any organisation deploying AI in a meaningful way — has to be built privately, by the businesses adopting the technology.

Strategic Reality: AISI’s mandate is to evaluate frontier models and share findings with national security partners. It is not a regulator, not a compliance certifier, and not a vendor of assurance to the private sector. UK businesses that interpret AISI’s existence as government cover for their own AI deployments will find that cover does not extend to them.

The numbers driving the urgency

Metric	Figure	Strategic implication
Frontier models with exploitable weaknesses found	100% of those AISI has red-teamed	No model can be assumed safe by default
Doubling rate of agentic AI autonomy	Approximately every 2 months	Governance research is structurally lagging capability
Projected autonomous action horizon by end of Parliament	Up to 2 months without human oversight	Internal change-management cycles are slower than this
AISI technical staff	100 of 250 total staff	Capacity is limited; private sector cannot rely on public testing
UK position in Capital Economics AI economic impact ranking	3rd (score 55)	Strategic exposure is high; competitive position depends on safe deployment
Britons using AI for emotional purposes in past year	~33%	Consumer adoption already outpaces enterprise governance

What AISI’s findings actually tell UK businesses

The detail of Leung’s account matters more than the headlines. Three findings translate directly into operational implications for any UK organisation deploying AI in production.

Every frontier system tested had a way through

Leung is precise: “We have found vulnerabilities in every single system we have tested.” That includes models from Google, Anthropic and OpenAI. Importantly, she also notes that finding vulnerabilities is getting harder — safeguards are improving — but the absolute statement still stands. There is no commercial frontier model that has been tested and emerged clean.

For procurement teams, this changes the question from “is this model secure?” to “what is your posture when, not if, a vulnerability is found?” Vendor selection criteria built on assumed model integrity are obsolete.

Agentic capability is outrunning the research that should constrain it

The shift from chat-style assistants to agentic AI — systems that take actions without continuous human approval — is the central operational risk Leung flags. The autonomy of these systems is “doubling every couple of months”, whilst, in her words, “the areas of research that we need to do to understand how to provide guidance to these systems is falling way behind.”

This is a governance gap by structural definition. If capability doubles every two months and research timelines run in years, the gap widens monotonically. Organisations cannot wait for industry consensus on how to govern agentic AI; they have to set their own autonomy ceilings now and revise them quarterly.

Critical Context: An AI agent cleared for autonomous email triage today is not the same product six months from now, even if the vendor name and the contract are unchanged. Capability drift inside the same product line is the new procurement reality.
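As a rough illustration of the doubling claim, the arithmetic can be sketched in a few lines of Python. This assumes the two-month doubling period holds steady, which simplifies Leung’s “roughly” and is not a forecast; the function name `capability_multiple` is hypothetical.

```python
def capability_multiple(months_elapsed: float, doubling_period_months: float = 2.0) -> float:
    """Growth factor after `months_elapsed` if capability doubles every period.

    Assumption: a constant doubling period, used here only for illustration.
    """
    if doubling_period_months <= 0:
        raise ValueError("doubling period must be positive")
    return 2.0 ** (months_elapsed / doubling_period_months)


# Compare a quarterly, six-monthly and annual review interval.
for months in (3, 6, 12):
    print(f"{months:>2} months -> x{capability_multiple(months):.1f}")
```

On that assumption, six months is three doublings, so the capability behind an agent approved today could be around eight times greater by the time a six-monthly review comes round.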

Even AISI cannot see under the hood

Perhaps the most striking admission in Leung’s interview: AISI’s privileged access has not allowed it to see the rules these systems are built to follow. “No external powers know what rules AI systems are being built to follow.” If the country’s specialist AI assurance body — staffed with 100 technical experts and trusted by Anthropic for closed testing — cannot inspect the internal alignment of frontier models, no procurement team should expect to either.

This pushes the centre of gravity for enterprise assurance away from inspection of the model and towards observation of behaviour in deployment. Logging, evaluation harnesses, exception handling, and incident response are now the primary control surface.

What this means for UK boards, IT leaders and risk functions

Stakeholder	Current exposure	Strategic priority
Boards & executive teams	Authorising AI deployments without visibility into model rules or agent autonomy	Define explicit AI risk appetite tied to autonomy level, not vendor trust
CIOs & CISOs	Treating frontier models as evaluated by upstream developers	Assume every deployed model has unfound vulnerabilities; design controls accordingly
Procurement	Standard SaaS contracts with no AI-specific assurance clauses	Require disclosure of red-team scope, model update cadence, and incident notification terms
Compliance & risk	Mapping AI to existing frameworks (DPA, ISO, FCA)	Add agentic-action governance as a distinct control domain
Front-line teams	Adopting AI tools faster than IT can govern them	Provide clear escalation routes for concerning agent behaviour

The pattern across these stakeholders is consistent: the locus of AI risk has moved from data exposure to autonomous action. Existing frameworks were designed for the former and treat the latter as an edge case.

A practical assurance framework for UK organisations

There is no single regulatory standard to comply with yet. The Government’s AI strategy treats AISI as the technical evaluator and leans on existing regulators (FCA, ICO, MHRA) for sectoral oversight. That leaves UK organisations defining their own assurance posture in the meantime. The practical structure most organisations should adopt operates at three levels of maturity.

Foundational (every organisation deploying any AI tool):

Inventory every AI capability in use — including embedded features in existing SaaS products. Most organisations are surprised by what is already deployed.
Set an autonomy ceiling. For each tool, define the maximum action it can take without human review. “Suggests text” is different from “sends email”; “drafts a contract” is different from “binds the company”.
Establish an incident escalation route. If a tool behaves unexpectedly, who hears about it within an hour, and who decides whether to disable it?
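One way to make an autonomy ceiling concrete is to encode it as an ordered scale and check every requested action against it. The sketch below is a minimal illustration, not a standard: the four levels, the tool names and the `requires_human_review` function are all assumptions made for the example.

```python
from enum import IntEnum


class Autonomy(IntEnum):
    """Hypothetical ladder of action levels, lowest to highest."""
    SUGGEST = 1  # proposes text; a human does everything else
    DRAFT = 2    # produces a complete artefact for human review
    SEND = 3     # acts externally, for example sends an email
    COMMIT = 4   # binds the organisation, for example signs or pays


# Illustrative register: tool name -> highest level allowed without human review.
CEILINGS = {
    "email-triage-agent": Autonomy.DRAFT,
    "contract-assistant": Autonomy.SUGGEST,
}


def requires_human_review(tool: str, requested: Autonomy) -> bool:
    """True when the action exceeds the tool's ceiling or the tool is unregistered."""
    ceiling = CEILINGS.get(tool)
    if ceiling is None:
        return True  # tools missing from the inventory default to review
    return requested > ceiling


print(requires_human_review("email-triage-agent", Autonomy.SEND))   # above ceiling
print(requires_human_review("email-triage-agent", Autonomy.DRAFT))  # within ceiling
```

Defaulting unregistered tools to review is a deliberate choice here: it ties the ceiling back to the inventory, so a tool nobody has listed cannot act unreviewed.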

Intermediate (organisations with material AI deployment):

Treat vendor red-team disclosure as a procurement requirement. Ask what scope was tested, by whom, when, and what findings remain unmitigated.
Build a behavioural evaluation harness. Capture inputs, outputs, and agent actions for the systems you deploy, and review samples on a defined cadence — not only when a complaint surfaces.
Map AI controls to existing risk frameworks. The aim is to fold AI assurance into normal operational risk reporting, not to maintain a parallel committee no one reads.
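A behavioural evaluation harness does not need to be elaborate to be useful. The sketch below shows the bare minimum under stated assumptions: an in-memory log of inputs, outputs and agent actions, plus random sampling for periodic review. The class and field names (`AgentEvent`, `BehaviourLog`) are hypothetical, and a real deployment would write to durable, access-controlled storage.

```python
import json
import random
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from typing import List, Optional


@dataclass
class AgentEvent:
    tool: str
    prompt: str
    output: str
    actions: List[str] = field(default_factory=list)  # what the agent did, not only what it said
    at: str = field(default_factory=lambda: datetime.now(timezone.utc).isoformat())


class BehaviourLog:
    """In-memory log for illustration only."""

    def __init__(self) -> None:
        self._events: List[AgentEvent] = []

    def record(self, event: AgentEvent) -> None:
        self._events.append(event)

    def sample_for_review(self, k: int, seed: Optional[int] = None) -> List[AgentEvent]:
        """Pick up to k events at random so review is not complaint-driven."""
        rng = random.Random(seed)
        return rng.sample(self._events, min(k, len(self._events)))

    def to_jsonl(self) -> str:
        """Serialise one JSON object per line for export to existing log tooling."""
        return "\n".join(json.dumps(asdict(e)) for e in self._events)


log = BehaviourLog()
log.record(AgentEvent("email-triage-agent", "Sort today's inbox", "Filed 12 messages",
                      actions=["move:12", "flag:2"]))
for event in log.sample_for_review(k=5):
    print(event.tool, event.actions)
```

Sampling at random on a fixed cadence is the point of the design: it surfaces behaviour that nobody has complained about yet.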

Advanced (organisations where AI is genuinely strategic):

Run periodic red-team exercises against your own deployments. The questions AISI asks frontier developers — can this be tricked into harmful output, can it be coerced into bypassing controls, can it act on data it shouldn’t access — are exactly the questions you should ask of your configured deployment.
Establish autonomy review checkpoints. Every six months, revisit which agents have been permitted broader action and whether the original safeguards still apply.
Participate in sector-level information sharing. AISI’s “information sharing between governments” has industry counterparts in bodies such as the Information Commissioner’s regulatory sandbox and sector ISACs. Use them.

Implementation Note: The biggest source of failure in AI assurance programmes is treating them as standalone exercises. Agentic AI risk is operational risk — once it is structured that way, existing governance, audit and risk reporting infrastructure can carry it.

Hidden challenges most UK businesses underestimate

1. Vendor opacity will not improve voluntarily

If AISI cannot see the rules a model has been trained to follow, no enterprise contract is going to extract them either. The mitigation is not to demand transparency but to build observability of behaviour at the integration boundary. Log everything the agent does, not just what the model says.

2. Autonomy creep happens through configuration, not procurement

The dangerous moment is rarely the initial deployment, which is reviewed. It is the third configuration change six months later, when an integration is widened or a confirmation step is removed for efficiency. Mitigation is configuration change governance — every change to an agent’s permissions or scope routed through a defined approval, with an audit trail.
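Configuration change governance can be sketched as a simple gate: any change that adds a permission or removes a confirmation step is held until someone approves it, and every decision is written to an audit trail. The names below (`AgentConfig`, `widens_autonomy`, `apply_change`) are hypothetical, and the two-field configuration is a deliberate simplification of what a real agent platform exposes.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Optional


@dataclass(frozen=True)
class AgentConfig:
    permissions: FrozenSet[str]
    human_confirmation: bool


def widens_autonomy(old: AgentConfig, new: AgentConfig) -> bool:
    """True if the change adds permissions or drops the confirmation step."""
    added = new.permissions - old.permissions
    confirmation_removed = old.human_confirmation and not new.human_confirmation
    return bool(added) or confirmation_removed


def apply_change(old: AgentConfig, new: AgentConfig,
                 approved_by: Optional[str], audit_trail: List[Dict]) -> AgentConfig:
    """Return the config in force after the request, recording the decision."""
    if widens_autonomy(old, new) and not approved_by:
        audit_trail.append({"result": "rejected", "reason": "approval required"})
        return old
    audit_trail.append({"result": "applied", "approved_by": approved_by})
    return new


trail: List[Dict] = []
current = AgentConfig(frozenset({"read_inbox"}), human_confirmation=True)
proposed = AgentConfig(frozenset({"read_inbox", "send_email"}), human_confirmation=False)

current = apply_change(current, proposed, approved_by=None, audit_trail=trail)
print(current.permissions, trail[-1]["result"])
```

Changes that narrow scope pass without approval in this sketch; only widening is gated, which keeps the control from becoming an obstacle to tightening things up.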

3. Compliance theatre is the path of least resistance

The temptation will be to badge an AI tool with whatever certifications already exist (ISO 27001, SOC 2) and treat the box as ticked. None of these standards meaningfully address agentic action governance. Mitigation is honest gap analysis — accept that AI assurance is a new control domain, document the gap, and treat closing it as a programme of work, not a procurement question.

4. The “AISI tested it” halo

If an AI vendor mentions AISI testing in their marketing, the natural response is reassurance. AISI testing is a positive signal but covers a narrow scope (typically national security risks) and a fixed point in time. Mitigation is reading the actual scope of any testing claimed and treating it as one input among many, not as a clearance.

Reality Check: AISI tested Mythos before Anthropic decided not to release it. The lesson is that AISI testing can identify when a model is too dangerous to release, but it does not certify that any released model is safe. The signal is asymmetric.

The strategic position for UK business

The UK is third in the world on AI economic impact according to Capital Economics, behind the US and Singapore and ahead of South Korea, Switzerland and China. AISI is genuinely a competitive asset: no other country has comparable independent frontier-model access. But AISI’s value to UK businesses is conditional on private-sector capacity to act on its findings. A smoke alarm in a building with no fire response does not prevent the fire; it only makes the occupants more anxious.

Three conditions distinguish organisations that will benefit from AISI’s existence from those that will be exposed by it:

They treat AI assurance as operational risk, not compliance. That puts AI agent behaviour into the same reporting line as fraud, downtime and physical security — categories the business already knows how to manage.
They define autonomy ceilings explicitly and revise them. The default of “let the tool decide” is the failure mode. The remedy is bounded delegation.
They invest in observability, not just selection. The agent you bought is not the agent you’re running six months later. Catching the drift requires watching the behaviour, not the contract.

Take Action: The cheapest version of this work is a one-page document, signed by the executive accountable, listing every AI tool in use, the autonomy ceiling for each, the escalation contact, and the review date. If your organisation cannot produce that document this week, that is the gap to close first.
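For organisations that keep the register in a spreadsheet or script instead of on paper, the same one-page document can be checked automatically. The sketch below is one possible shape, with hypothetical field names taken from the four items listed above; it flags missing ceilings, missing contacts and overdue reviews.

```python
from dataclasses import dataclass
from datetime import date
from typing import List


@dataclass
class RegisterEntry:
    tool: str
    autonomy_ceiling: str    # for example "suggests text" or "sends email"
    escalation_contact: str
    review_date: date


def register_gaps(entries: List[RegisterEntry], today: date) -> List[str]:
    """List the problems that would stop the register being signed off."""
    problems = []
    for entry in entries:
        if not entry.autonomy_ceiling.strip():
            problems.append(f"{entry.tool}: no autonomy ceiling defined")
        if not entry.escalation_contact.strip():
            problems.append(f"{entry.tool}: no escalation contact named")
        if entry.review_date < today:
            problems.append(f"{entry.tool}: review overdue since {entry.review_date.isoformat()}")
    return problems


register = [
    RegisterEntry("chat-assistant", "suggests text", "ops-lead", date(2026, 9, 1)),
    RegisterEntry("email-agent", "", "", date(2026, 1, 1)),
]
for problem in register_gaps(register, today=date(2026, 5, 1)):
    print(problem)
```

An empty result is the signal that the document is ready for the accountable executive to sign.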

Next steps checklist

Inventory all AI tools currently deployed (including embedded features)
Define and document autonomy ceilings for each
Establish a single named escalation contact for AI incidents
Add an AI assurance line to executive risk reporting
Schedule a six-monthly autonomy review cycle
Update procurement templates to require red-team disclosure
Identify a sector information-sharing forum to engage with

Source citation

Charlotte Lytton, “The 32-year-old wunderkind preparing Britain for an AI attack”, The Telegraph, 22 April 2026.

This analysis was prepared by Resultsense to translate AISI’s published findings into operational guidance for UK business leaders. Resultsense is a UK AI news and analysis publication focused on making sense of AI in the UK. For ongoing coverage of AISI, frontier model assurance, and AI governance for UK organisations, see our insights archive and news coverage.
