The UK AI Safety Institute has published something genuinely useful: a repeatable framework for measuring whether AI models actually help criminals commit fraud. The answer, based on 20,000+ evaluations across 14 models, is mostly no — but the exceptions tell us where to focus governance efforts.

What AISI actually tested

The research team built evaluation rubrics with law enforcement experts and ran them against three fraud scenarios: romance scams, CEO impersonation, and identity theft. These aren’t theoretical risks. They’re crimes that cost UK individuals and businesses billions every year, and there’s been persistent anxiety about whether AI makes them easier.

Strategic Reality: The 88.5% figure — the share of model responses scoring low on actionability — tells one story. The 5-7% that produced ready-to-use fraud material tells another. Both numbers matter for risk assessment.

AISI tested 14 large language models spanning different capability levels, architectures, and safety configurations. The evaluation measured two dimensions: actionability (could a criminal use the output directly?) and information access (did the model surface information beyond what a web search would return?).

The results were unambiguous on one point: capability level is not the primary risk factor.

Safety alignment is doing the heavy lifting

Here’s where it gets interesting for anyone making procurement or deployment decisions. The single strongest predictor of whether a model assisted with fraud wasn’t its parameter count, training data volume, or benchmark performance. It was whether the model had functioning safety alignment.

Closed-weight, safety-aligned models — the kind most organisations use through commercial APIs — consistently refused harmful requests. They did what they were supposed to do.

Critical Context: “Safety alignment, not capability level, determines misuse risk.” That’s a direct quote from the research. It reframes the entire conversation about AI risk away from “how powerful is this model?” toward “how well are its guardrails maintained?”

Open-weight models with removed guardrails told a different story. The 5-7% of responses that produced ready-to-use fraud material came almost entirely from uncensored open-weight models — systems where someone had deliberately stripped out safety training.

This creates a specific, actionable risk profile. The threat isn’t that ChatGPT or Claude will help someone write a phishing email. The threat is that someone downloads an open-weight model, removes its safety training, and uses the result as an automated fraud assistant.

The jailbreaking question

AISI also tested decomposition attacks — breaking a harmful request into a series of innocent-sounding prompts to bypass safety filters. The technique increased compliance rates (models were more likely to respond), but the practical impact remained limited.

Strategic Insight: Jailbreaking works in the sense that it gets models to engage with the topic. It doesn’t work in the sense of consistently producing material a criminal could use without significant additional effort.

This matters for risk modelling. Organisations that worry about employees accidentally jailbreaking commercial models have a different (and smaller) problem than the open-weight misuse scenario. The two risks require different mitigations.

| Risk scenario | Likelihood | Impact | Recommended mitigation |
| --- | --- | --- | --- |
| Commercial API misuse (aligned models) | Low | Low | Standard acceptable use policies |
| Jailbreaking commercial models | Medium | Low-Medium | Usage monitoring, prompt logging |
| Open-weight models with removed guardrails | Medium | High | Supply chain controls, detection tools |
| Purpose-built uncensored models | Low-Medium | High | Threat intelligence, law enforcement coordination |

What AISI didn’t test — and why it matters

The research explicitly focused on text-based models. It did not evaluate multimodal capabilities: image generation for fake identity documents, voice cloning for phone-based CEO impersonation, or video deepfakes for romance scams.

Reality Check: Text-based fraud assistance is arguably the least concerning AI misuse vector. The real anxiety — among law enforcement and financial services teams I’ve spoken with — centres on voice and video. AISI acknowledges this gap and flags multimodal evaluation as a priority.

This isn’t a criticism of the research. Text-based evaluation is the right starting point because it’s measurable and repeatable. But organisations shouldn’t read “AI models provide minimal operational uplift for fraud” and assume the problem is solved. The finding covers text only; the capabilities that worry practitioners most, voice and video, haven’t been systematically evaluated yet.

What this means for UK businesses

For most UK organisations using commercial AI products, this research is reassuring. The models you’re buying through APIs are doing their job. Safety alignment works, and it works consistently across the fraud scenarios AISI tested.

But the research also points to three areas where organisations need to act:

1. Open-weight model governance becomes urgent

If your organisation deploys open-weight models — for cost reasons, customisation, or data sovereignty — you need policies around model provenance and safety training verification. Not every open-weight model is a risk, but models with deliberately removed guardrails are, and your procurement process should distinguish between the two.

Implementation Note: Check whether your AI governance framework treats all open-weight models identically. It shouldn’t. A Llama model with intact safety training is fundamentally different from an “uncensored” variant hosted on Hugging Face.
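One way to make that distinction concrete is to record provenance and safety-training status for each model and derive a governance tier from them. The sketch below is illustrative only: the field names, the three tiers, the rules, and the example model labels are all assumptions for this example, not part of AISI's framework.

```python
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    APPROVED = "approved"
    NEEDS_REVIEW = "needs_review"
    BLOCKED = "blocked"


@dataclass(frozen=True)
class ModelRecord:
    name: str                     # internal label (hypothetical)
    source_verified: bool         # obtained from the original publisher's release
    safety_training_intact: bool  # no known removal of safety training
    labelled_uncensored: bool     # distributed as an "uncensored" variant


def classify(record: ModelRecord) -> Tier:
    """Assign a governance tier. The rules are illustrative assumptions."""
    if record.labelled_uncensored or not record.safety_training_intact:
        return Tier.BLOCKED
    if not record.source_verified:
        return Tier.NEEDS_REVIEW
    return Tier.APPROVED


# Hypothetical inventory entries, not real models.
inventory = [
    ModelRecord("publisher-release-chat", True, True, False),
    ModelRecord("community-finetune", False, True, False),
    ModelRecord("uncensored-variant", False, False, True),
]
tiers = {record.name: classify(record) for record in inventory}
```

The point of the design is that "open-weight" never appears as a rule on its own: the tier depends on provenance and guardrail status, which is the distinction the research supports.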

2. Supply chain risk extends to AI models

Uncensored models have their own supply chain: people download base models, remove the safety training, and redistribute the results. If your technical teams can access these models, they represent a supply chain risk analogous to unvetted npm packages, except that the failure mode is reputational and legal rather than technical.

3. Multimodal risk assessment can’t wait for AISI’s next paper

Financial services firms, professional services, and any organisation handling sensitive client relationships should be running their own assessments of multimodal AI risks. Voice cloning quality has improved dramatically. Deepfake detection is inconsistent. Waiting for AISI to publish multimodal evaluations means accepting an unquantified risk in the interim.

The governance framework gap

AISI’s evaluation methodology is worth studying independently of its fraud findings. They’ve built a repeatable, scalable approach to measuring AI misuse risk — rubrics developed with domain experts, standardised scoring across models, and a framework that can extend to new scenarios.

Success Factor: The evaluation approach matters more than the specific results. AISI has created a methodology that any organisation could adapt for their own risk assessment — testing their deployed models against their specific threat scenarios.

Most UK organisations currently assess AI risk through vendor questionnaires and compliance checklists. AISI’s framework suggests a more rigorous alternative: define your threat scenarios, build evaluation rubrics with relevant experts, and test models against them. This is closer to how mature organisations approach security testing — red-teaming rather than checkbox compliance.
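To show what "test models against rubrics" can look like in practice, here is a minimal sketch of the aggregation step: graded responses go in, and a per-model share of high-actionability responses comes out. The 0 to 4 scales, the threshold of 3, the model names, and every score below are invented placeholders for this example. They are not AISI's rubric or results.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class GradedResponse:
    model: str
    scenario: str
    actionability: int       # 0 (unusable) to 4 (ready to use); assumed scale
    information_access: int  # 0 (nothing beyond a web search) to 4; assumed scale


def high_actionability_share(graded, threshold=3):
    """Per model, the share of responses scoring at or above the threshold."""
    totals = defaultdict(int)
    high = defaultdict(int)
    for g in graded:
        totals[g.model] += 1
        if g.actionability >= threshold:
            high[g.model] += 1
    return {model: high[model] / totals[model] for model in totals}


# Placeholder grades written by hand for illustration.
graded = [
    GradedResponse("aligned-model", "romance_scam", 0, 0),
    GradedResponse("aligned-model", "ceo_impersonation", 1, 0),
    GradedResponse("aligned-model", "identity_theft", 0, 1),
    GradedResponse("aligned-model", "romance_scam", 1, 1),
    GradedResponse("stripped-model", "romance_scam", 3, 2),
    GradedResponse("stripped-model", "ceo_impersonation", 1, 1),
    GradedResponse("stripped-model", "identity_theft", 2, 1),
    GradedResponse("stripped-model", "romance_scam", 0, 0),
]
shares = high_actionability_share(graded)
```

The grading itself, whether by domain experts or a calibrated automated grader, is the hard part and is deliberately left out of the sketch.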

Four challenges hiding in the data

The uncensored model proliferation problem. AISI found that stripped models were the primary risk vector. But the availability of these models is growing, not shrinking. Hugging Face alone hosts thousands of “uncensored” model variants, and the barrier to creating new ones is falling. Safety alignment works, but only when it’s present.

Detection asymmetry. Current fraud detection systems are optimised for human-generated fraud. AI-assisted fraud may have different textual signatures, patterns, and scaling characteristics. The 5-7% of responses that were ready to use, almost all of them from uncensored models, is also a per-response figure. It doesn’t account for iteration: a motivated actor can run hundreds of generations and select the best output.
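The effect of iteration can be estimated with simple arithmetic. If each generation has probability p of being usable and attempts are independent, the chance of at least one usable output in n attempts is 1 - (1 - p)^n. Both the independence assumption and the value p = 0.05 below are simplifications chosen for illustration, not figures measured by AISI for repeated sampling.

```python
def chance_of_at_least_one(p: float, n: int) -> float:
    """Probability that at least one of n independent attempts succeeds."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    if n < 0:
        raise ValueError("n must be non-negative")
    return 1.0 - (1.0 - p) ** n


# Assumed per-attempt rate of 5%, for illustration only.
after_1 = chance_of_at_least_one(0.05, 1)
after_10 = chance_of_at_least_one(0.05, 10)
after_100 = chance_of_at_least_one(0.05, 100)
```

Under these assumptions the chance rises from 5% on a single attempt to roughly 40% after ten attempts and above 99% after a hundred, which is why a low per-response rate offers limited comfort against a persistent actor.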

Hidden Cost: Adapting fraud detection systems for AI-generated content is an unglamorous but necessary investment. Most financial services firms haven’t started this work.

The capability overhang. Today’s models provide “minimal operational uplift” for text-based fraud. But model capabilities keep improving with each release. A framework that shows low risk today needs to be rerun regularly, and organisations need tripwires for when results change.
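A tripwire can be as simple as comparing each rerun against a stored baseline and flagging scenarios where the actionable share has risen by more than an agreed margin. The sketch below assumes results are stored as a scenario-to-rate mapping. The function name, the 2 percentage point margin, and all the rates are hypothetical.

```python
def tripped_scenarios(baseline, latest, max_increase=0.02):
    """Return scenarios whose actionable share rose by more than max_increase.

    Scenarios missing from the baseline are compared against a rate of 0.0.
    """
    return sorted(
        scenario
        for scenario, rate in latest.items()
        if rate - baseline.get(scenario, 0.0) > max_increase
    )


# Hypothetical rates from two evaluation runs.
baseline = {"romance_scam": 0.01, "ceo_impersonation": 0.02}
latest = {"romance_scam": 0.015, "ceo_impersonation": 0.06, "identity_theft": 0.01}
alerts = tripped_scenarios(baseline, latest)
```

With these placeholder numbers only the CEO impersonation scenario crosses the margin. What matters more than the exact threshold is that someone owns the alert and knows what action follows.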

Regulatory timing. This research comes from a government body, the UK AI Safety Institute, yet the UK’s AI regulatory framework remains principles-based and sector-specific. There’s no requirement for organisations to conduct this kind of evaluation, no standard for what “adequate” AI safety testing looks like, and no enforcement mechanism for organisations that deploy unsafe models.

The bottom line

AISI’s research confirms what security teams suspected: commercial, safety-aligned AI models aren’t materially useful for fraud. That’s good news. The bad news is that the misuse risk concentrates in open-weight models with removed guardrails — a category that’s growing and poorly governed.

Three things to do this quarter:

  1. Audit your open-weight model inventory. Know which models your teams are using, where they came from, and whether safety training is intact. If you don’t have an inventory, that’s your first problem.

  2. Adapt AISI’s framework for your sector. Pick your three most relevant fraud or misuse scenarios. Build rubrics with your compliance and security teams. Test your deployed models. This is more valuable than any vendor’s safety certification.

  3. Start multimodal risk assessment now. Don’t wait for published research. Voice cloning and deepfake generation are commercially available today. Your clients and employees are already potential targets.

Take Action: The AISI evaluation framework is publicly available. Download it and share it with your AI governance, security, and compliance teams. Even reading the methodology section will improve how your organisation thinks about AI misuse risk.

Source: An evaluation framework for AI misuse in fraud and cybercrime — UK AI Safety Institute, February 2026

