AISI: autonomous AI cyber capability now doubling every 4.7 months
TL;DR:
- The UK AI Security Institute (AISI) estimates that the length of cyber tasks frontier AI models can complete at 80% reliability is now doubling every 4.7 months, faster than AISI's November 2025 estimate of 8 months.
- Two recent models, Claude Mythos Preview and GPT-5.5, beat both trend lines, with Mythos completing both AISI cyber ranges — the first time any model has finished the “Cooling Tower” range.
- The new doubling time is close to METR's independent 4.2-month figure for software engineering tasks. Because two different methodologies are converging, the trend is less likely to be an artefact of one team's evals.
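To make the doubling times concrete, the sketch below converts each figure quoted above into a 12-month growth factor and the time needed for a tenfold increase. It assumes clean exponential growth, which is a simplification of a trend fitted to noisy data; the helper names are hypothetical.

```python
import math

def annual_growth_factor(doubling_months: float) -> float:
    """Multiplicative growth in task horizon over 12 months, assuming pure exponential growth."""
    return 2 ** (12 / doubling_months)

def months_to_multiply(factor: float, doubling_months: float) -> float:
    """Months needed for the task horizon to grow by `factor` at a fixed doubling time."""
    return doubling_months * math.log2(factor)

# Doubling times (in months) as quoted in the article.
estimates = {
    "AISI (November 2025)": 8.0,
    "AISI (current)": 4.7,
    "METR (software engineering)": 4.2,
}

for label, d in estimates.items():
    print(
        f"{label}: doubling every {d} months -> "
        f"x{annual_growth_factor(d):.1f} per year, "
        f"x10 in {months_to_multiply(10, d):.1f} months"
    )
```

Moving from an 8-month to a 4.7-month doubling roughly doubles the implied annual growth factor, which is why the revision matters more than the raw month figures suggest.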
AISI's evaluation suite measures frontier-model performance on tasks that take human cyber experts a known amount of time to complete, with each task capped at 2.5 million tokens so results stay comparable across model generations. The cap means the results understate real-world capability: in cyber-range experiments without it, AISI uses up to 100 million tokens and finds that performance keeps improving. The 4.7-month doubling is therefore a conservative reading of the trend, not an alarmist one.
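The article does not say exactly how the 80% reliability figure is derived. METR's published time-horizon method fits a logistic curve of model success probability against the log of human task time and reads off the task length at which the curve crosses the target reliability; the sketch below assumes AISI does something similar. The intercept and slope are hypothetical values chosen for illustration, not AISI's fitted parameters.

```python
import math

def horizon_minutes(intercept: float, slope: float, reliability: float) -> float:
    """Human task length (minutes) at which predicted success equals `reliability`.

    Assumes p(success) = sigmoid(intercept + slope * log2(minutes)),
    with slope < 0 because longer tasks are harder. Solving for minutes
    gives log2(minutes) = (logit(reliability) - intercept) / slope.
    """
    logit = math.log(reliability / (1 - reliability))
    return 2 ** ((logit - intercept) / slope)

# Hypothetical fitted parameters, for illustration only.
INTERCEPT, SLOPE = 6.0, -0.8

h50 = horizon_minutes(INTERCEPT, SLOPE, 0.5)
h80 = horizon_minutes(INTERCEPT, SLOPE, 0.8)
print(f"50% horizon: {h50:.0f} min, 80% horizon: {h80:.0f} min")
```

For any downward-sloping curve the 80% horizon is shorter than the 50% horizon, so an 80% figure is the stricter of the two measures.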
What the new checkpoint added
The newer Mythos Preview checkpoint completed AISI's "The Last Ones" range in 6 of 10 attempts and "Cooling Tower", which no model had previously solved, in 3 of 10 attempts. GPT-5.5 solved "The Last Ones" 3 times out of 10. The Mythos gains came from a newer checkpoint of an existing model rather than a new release, which illustrates a point AISI makes explicitly: "Notable capability jumps do not always require new model releases."
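Ten attempts per range is a small sample. As an illustration (not an analysis AISI reports), a standard 95% Wilson score interval shows how wide the plausible success rates are for the counts above, under the assumption that each attempt is an independent trial.

```python
import math

def wilson_interval(successes: int, trials: int, z: float = 1.96) -> tuple[float, float]:
    """Approximate 95% Wilson score interval for a binomial success rate."""
    p = successes / trials
    denom = 1 + z**2 / trials
    centre = (p + z**2 / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials + z**2 / (4 * trials**2)) / denom
    return centre - half, centre + half

# Solve counts as reported in the article.
results = {
    "Mythos Preview, The Last Ones": (6, 10),
    "Mythos Preview, Cooling Tower": (3, 10),
    "GPT-5.5, The Last Ones": (3, 10),
}

for label, (k, n) in results.items():
    low, high = wilson_interval(k, n)
    print(f"{label}: {k}/{n} observed, plausible rate roughly {low:.0%} to {high:.0%}")
```

The intervals for 6/10 and 3/10 overlap substantially, so a difference of a few solves is weak evidence on its own; the direction of the trend across many tasks carries more weight than any single count.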
UK angle: regulator data UK boards should be reading
For UK enterprise security leaders, the practical signal is in the National Cyber Security Centre (NCSC) advisory that AISI cross-references. NCSC has now published guidance on using AI models to find vulnerabilities, an implicit acknowledgement that the same capability serves attackers as well as defenders. With this week's King's Speech proposing reforms to the Computer Misuse Act that would give UK researchers clearer legal cover to test these capabilities, regulation and capability are moving in step for the first time. Logicalis CIO data published this week found that only a third of UK CIOs have high confidence in their AI governance frameworks; AISI's trend line suggests the time available to close that gap is shrinking.
Looking forward
AISI signals that tougher evaluations are coming: new cyber ranges, enhancements to existing ones, and the addition of active cyber defences to better reflect real-world conditions. AISI's own caveat is that current results rest on "only six tasks with durations of eight hours or more", so the next round of evaluations will be the real test of whether the acceleration is genuine or partly a measurement artefact. For UK boards setting AI risk appetite, the prudent assumption is that defensive baselines need to compound at least as fast as the capability trend.