AI is now writing zero-days: rethinking UK cyber defence after Google’s 2026 threat report

TL;DR: Google’s Threat Intelligence Group (GTIG) has published its May 2026 update on adversarial use of AI (summary on the Google blog) and the headline finding is a threshold event: GTIG has, for the first time, identified a zero-day vulnerability it assesses was developed with AI assistance, intended for mass exploitation by a criminal group. Beyond the headline sits a wider pattern. State-affiliated actors from China and North Korea are running AI through specialised vulnerability datasets at industrial scale. Russia-nexus operators are using language models to generate decoy code that camouflages malware. An Android backdoor named PROMPTSPY calls Gemini in real time to decide what to do on a compromised device. A supply chain group has compromised LiteLLM, the AI gateway library, to harvest cloud credentials. None of this requires new attack categories; what has changed is the cost, speed and accessibility of capability that previously sat with well-resourced state teams. For UK security leaders the question is no longer whether AI is in the threat model, but which of their existing controls assume a slower, more expensive attacker than the one now on the wire.

The threshold event UK boards should not skip past

For the past three years, conversations about AI and cyber security in UK boardrooms have tended to circle two relatively comfortable propositions. The first is that AI helps defenders triage alerts. The second is that, sooner or later, attackers will use AI too. GTIG’s May 2026 report closes the gap between the second proposition and reality with unusual specificity. Mandiant incident responders, GTIG analysts and Google’s own platform abuse signals are now seeing AI used across the attack lifecycle by named threat clusters: APT45 from North Korea, APT27 and UNC2814 from China, UNC6780 (also tracked as TeamPCP) in financially motivated supply chain operations, and a Russia-nexus cluster delivering AI-built malware against Ukrainian targets. The detail that matters most for UK organisations is not which group did what, but the underlying economic shift. Tasks that used to require a senior offensive engineer working for days now take an automated prompt loop a few hours.

The most striking single incident is the AI-built zero-day. GTIG identified a Python script designed to bypass two-factor authentication on a popular open-source system administration tool. The flaw was not a memory corruption bug of the kind fuzzers find. It was a high-level semantic logic flaw, a hardcoded trust assumption the developer never intended to be reachable. Frontier language models, GTIG argues, are unusually good at this category of bug because they reason about developer intent rather than pattern-match on dangerous functions. The vendor was notified and the exploit was disrupted before mass deployment, but the precedent is set. The next one will be found by an attacker who is faster off the mark.
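To make the bug class concrete, here is a minimal Python sketch of a hardcoded trust assumption in a second-factor check. It is a hypothetical illustration only: the GTIG summary does not disclose the actual flaw, and every name here (Request, expected_otp, the localhost shortcut) is an assumption made for the example.

```python
import hmac
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Request:
    source_ip: str
    username: str
    otp: Optional[str]


def expected_otp(username: str) -> str:
    # Stand-in for a real TOTP calculation; a fixed value for illustration only.
    return "123456"


def second_factor_ok_flawed(req: Request) -> bool:
    # Hardcoded trust assumption: "anything from localhost is the local console".
    # If the tool sits behind a reverse proxy on the same host, every remote
    # request also appears to come from 127.0.0.1 and skips the second factor.
    if req.source_ip == "127.0.0.1":
        return True
    return req.otp is not None and hmac.compare_digest(req.otp, expected_otp(req.username))


def second_factor_ok(req: Request) -> bool:
    # No location-based shortcut: the second factor is always verified.
    return req.otp is not None and hmac.compare_digest(req.otp, expected_otp(req.username))
```

Nothing in the flawed version calls a dangerous function or corrupts memory, which is why a pattern-matching scanner has little to flag. The defect only shows up when a reviewer asks what the developer meant the shortcut to cover.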

Strategic Reality: The first AI-developed zero-day was disrupted because Google’s own AI agent, Big Sleep, has been pulling in the same direction for defenders. The arms race is now AI versus AI, not analyst versus attacker. UK security investment plans written before mid-2026 almost certainly underweight defensive AI capability.

What changed, in numbers

| Capability | Pre-AI baseline | What GTIG observed in 2026 | UK implication |
| Vulnerability research time per CVE | Days to weeks of analyst work | APT45 sending thousands of recursive prompts in parallel | Patch cadence assumptions are too slow |
| Bug class coverage | Memory corruption, injection, common implementation errors | Semantic logic flaws, hardcoded trust assumptions, dormant developer mistakes | Code review and SAST tools miss the new class |
| Malware variant generation | Manual, signature-bound | Polymorphic, just-in-time obfuscation (PROMPTFLUX, HONESTCUE) | Signature-based detection further devalued |
| Operational anonymity | Static VPN exits | ORB networks using residential 4G/5G SIMs (APT27 fleet management tool) | IP reputation feeds need re-baselining |
| Phishing personalisation | Generic templates plus scraped names | LLM-generated org charts and exec hardware fingerprinting | Awareness training premised on poor English is obsolete |

What is actually happening on the wire

The report’s findings cluster into four interlinked shifts. None of them is hypothetical; each is supported by GTIG case detail or Mandiant incident response data.

The first is vulnerability discovery at scale. APT45 is sending thousands of repetitive prompts that recursively analyse CVEs and validate proof-of-concept exploits. UNC2814 prompts Gemini under a “senior security auditor” persona to dissect TP-Link firmware and Odette File Transfer Protocol implementations. The most sophisticated approach observed is a Claude Code skill plugin called “wooyun-legacy” that primes the model with a distilled knowledge base of more than 85,000 real-world vulnerability cases from the Chinese WooYun platform. In-context learning steers the model to think like a seasoned bug hunter rather than a general-purpose code reviewer. Defenders should assume this category of priming will become standard practice, and that the bug classes it surfaces will be those traditional scanners deprioritise.

The second is autonomous attack orchestration. PROMPTSPY, an Android backdoor analysed by ESET and re-examined by GTIG, embeds a hardcoded prompt that bypasses Gemini’s safety filters by assigning a benign persona, then asks the model to reason about UI bounds. The malware serialises the visible Android UI into XML, sends it to gemini-2.5-flash-lite via HTTP, and parses the structured JSON response into tap, swipe and gesture instructions. The “user goal” is supplied at runtime, not hardcoded, meaning the same payload can pursue different objectives depending on what operators want that day. PROMPTSPY also captures biometric data to replay authentication gestures and hides its uninstall button under an invisible overlay. Google has disabled the associated assets and Play Protect catches known variants, but the architectural pattern is the more important takeaway.

The third is AI-generated obfuscation. The Russia-nexus CANFAIL and LONGSTREAM malware families contain large volumes of LLM-generated decoy code: coherent but inactive routines, repeated daylight-saving-status queries, administrative-task stubs that exist only to camouflage malicious functionality. In some cases the LLM’s own explanatory comments survive in the source, telling defenders exactly which blocks are filler. Static analysis tools that triage by code-density heuristics now face attackers who can produce plausible inert code at marginal cost.

The fourth is industrial-scale model access. UNC6201 and UNC5673, both PRC-nexus clusters, are running automated registration pipelines that create, exploit and cancel premium LLM accounts on a rolling basis. UNC5673 uses tools like Claude-Relay-Service and CLI-Proxy-API to pool accounts across providers and share costs. Anti-detect browsers, CAPTCHA-bypass services and SMS verification farms underpin the workflow. The strategic point is that safety guardrails that depend on per-account behavioural signals are leaky when attackers cycle through accounts faster than abuse signals consolidate.

Critical Context: GTIG explicitly notes that frontier models still struggle with complex enterprise authorisation logic. Attackers are not getting magic capability uplift across the board. They are getting disproportionate uplift on the bug classes UK enterprises are weakest at finding internally: semantic logic flaws, configuration trust assumptions and developer intent mismatches.

The strategic picture for UK organisations

It is easy to read GTIG’s report as a story about Google and the foundation model providers. The harder reading, and the one UK boards need, is that the report describes a structural shift in attacker economics that every UK organisation inherits. There are three reasons it lands hard in the UK specifically.

UK enterprise security teams are still running on a baseline calibrated to the 2022-2024 threat environment. The 2023 NCSC guidance on AI and cyber security framed AI as an emerging factor; in 2026 it is the operating environment. Many internal patch SLAs (30 days for critical, 90 for high) were set when attacker development cycles were also measured in weeks. They are no longer.

UK supply chains lean heavily on open-source AI tooling. The TeamPCP compromise of LiteLLM, the BerriAI AI gateway, is the most consequential incident in the report for any UK organisation building internal AI agents on top of multi-provider abstractions. LiteLLM is a common dependency for the very integration patterns UK consultancies, fintechs and public sector pilots have been adopting through 2025. Stolen AWS keys and GitHub tokens from compromised build environments feed straight into ransomware and extortion pipelines that disproportionately target mid-market UK organisations.

UK information operations exposure is rising. The pro-Russia “Operation Overload” campaign GTIG describes uses AI voice cloning to impersonate real journalists, splicing fabricated audio onto authentic video to lend false messages the credibility of legitimate UK and European media brands. UK regulated industries (financial services, energy, healthcare) are downstream of public trust in those same brands.

Where the human factor breaks first

| Stakeholder | What worked in 2024 | Where AI-augmented attackers exploit the seam |
| Security operations centre analysts | Triage on signature + reputation | Polymorphic payloads with decoy code defeat both |
| Patch management teams | Risk-rank CVEs by exploit availability | Exploits now arrive shortly after disclosure, sometimes before |
| Developers and platform engineers | Code review for common bug classes | Semantic logic flaws sit outside review heuristics |
| Procurement and vendor management | Vendor questionnaires on security controls | AI dependency chain (gateways, agent libraries) is invisible to standard questionnaires |
| Board and audit committee | Quarterly risk reports | Cadence mismatched to machine-speed change |

Five priorities UK security leaders should act on now

The temptation when a report like GTIG’s lands is to commission a workstream and let it drift. The better response is to make five specific, time-bound changes in the next quarter. Each is calibrated to a UK organisational reality, not a generic enterprise.

One: rebaseline patch SLAs against AI-accelerated exploit cycles. For internet-facing and authentication-adjacent systems, treat critical CVEs as 7-day windows rather than 30-day windows. This is not a counsel of perfection; it is a recognition that the gap between disclosure and weaponisation has compressed. If your change advisory board cannot move at that speed, the change-management process is now the bottleneck, not the patching itself.
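A rebaselined SLA can be written down as a small rule, which makes it easier to argue about at the change advisory board. The sketch below assumes the 30-day and 90-day baselines mentioned earlier and applies the 7-day window when a critical finding is either internet-facing or authentication-adjacent. The Finding fields and the "either" interpretation are assumptions for illustration, and the CVE identifier used is a placeholder.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Baseline windows quoted in the article; other severities are out of scope here.
BASELINE_SLA_DAYS = {"critical": 30, "high": 90}
ACCELERATED_CRITICAL_DAYS = 7


@dataclass(frozen=True)
class Finding:
    cve_id: str
    severity: str          # "critical" or "high"
    disclosed: date
    internet_facing: bool
    auth_adjacent: bool


def sla_days(finding: Finding) -> int:
    """Return the patch window in days for a finding."""
    exposed = finding.internet_facing or finding.auth_adjacent
    if finding.severity == "critical" and exposed:
        return ACCELERATED_CRITICAL_DAYS
    # Raises KeyError for severities this sketch does not cover.
    return BASELINE_SLA_DAYS[finding.severity]


def due_date(finding: Finding) -> date:
    """Date by which the patch should be deployed."""
    return finding.disclosed + timedelta(days=sla_days(finding))
```

For example, a critical finding on an internet-facing system disclosed on 1 June 2026 falls due on 8 June 2026 under this rule, where the 30-day baseline would have given 1 July.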

Two: audit your AI supply chain the same way you audit your software supply chain. Every UK organisation now running AI agents, retrieval-augmented generation pipelines or multi-provider gateways should inventory the open-source libraries those systems pull in. LiteLLM (maintained by BerriAI), langchain integration packages, OpenClaw-style skill marketplaces and similar dependencies need to be in the software bill of materials. The TeamPCP incident demonstrates that AI gateways are now a category of supply chain target, not a curiosity.
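A first pass at that inventory can be as simple as checking Python requirements files against a watchlist. The sketch below is a starting point under stated assumptions: the watchlist entries are illustrative and far from complete, the parser handles only plain requirements.txt lines, and a real audit would also need lock files, container images and transitive dependencies.

```python
import re

# Illustrative watchlist of AI gateway and agent packages; extend for your estate.
WATCHLIST = {"litellm", "langchain", "langchain-community", "langchain-openai"}

# Leading package name on a requirements line, before extras or version pins.
NAME_RE = re.compile(r"^([A-Za-z0-9][A-Za-z0-9._-]*)")


def normalise(name: str) -> str:
    """Normalise a package name (lowercase; runs of '-', '_' and '.' become '-')."""
    return re.sub(r"[-_.]+", "-", name).lower()


def ai_dependencies(requirements_text: str, watchlist=WATCHLIST) -> list:
    """Return the requirement lines whose package name is on the watchlist."""
    found = []
    for raw in requirements_text.splitlines():
        line = raw.split("#", 1)[0].strip()   # drop comments
        if not line or line.startswith("-"):  # skip blanks and pip options
            continue
        match = NAME_RE.match(line)
        if match and normalise(match.group(1)) in watchlist:
            found.append(line)
    return found
```

Run across every repository, the output gives a rough list of where gateway and agent libraries already sit, which is the list a software bill of materials should then formalise.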

Three: assume your developers will be persuaded to paste code into AI assistants, and design controls accordingly. GTIG notes that even unintentionally insecure agent skills can leak credentials and tokens. The realistic UK enterprise threat model is not malicious developers; it is well-intentioned ones operating without the controls that production CI/CD systems have spent a decade building. Secret-scanning at paste time, sanctioned enterprise model endpoints, and clear policy on agent-skill installation are now baseline.
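Secret-scanning at paste time can start with a small set of well-known token shapes. The sketch below checks text for AWS access key IDs and GitHub tokens before it leaves the developer's machine. The two patterns reflect the commonly documented formats for those credentials but are not exhaustive, and the function names are hypothetical; a production control would use a maintained scanner with a far larger rule set.

```python
import re

# Commonly documented credential shapes; illustrative, not exhaustive.
PATTERNS = {
    "aws_access_key_id": re.compile(r"\b(?:AKIA|ASIA)[0-9A-Z]{16}\b"),
    "github_token": re.compile(r"\bgh[pousr]_[A-Za-z0-9]{36,}\b"),
}


def find_secrets(text: str) -> list:
    """Return the kinds of secret detected in the text, in pattern order."""
    return [kind for kind, pattern in PATTERNS.items() if pattern.search(text)]


def redact(text: str) -> str:
    """Replace each detected secret with a labelled placeholder."""
    for kind, pattern in PATTERNS.items():
        text = pattern.sub(f"[REDACTED:{kind}]", text)
    return text
```

A paste hook could call find_secrets and block the paste, or call redact and let the sanitised text through. Which behaviour fits is a policy decision, not a technical one.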

Four: update phishing simulation programmes to reflect LLM-personalised lures. Awareness training built around poor English, generic salutations and bulk-mailed bait is now training people for a threat that has largely retired. Simulations should include credible reconnaissance-driven lures that name specific colleagues, reference recent internal projects and use plausible UK English. If your provider cannot deliver this, change provider.

Five: bring AI-driven defensive tooling onto the roadmap, not the wishlist. Google’s Big Sleep agent identified a vulnerability that GTIG was able to disrupt before exploitation. CodeMender, also Gemini-based, is being used to patch critical vulnerabilities automatically. Comparable tools from Microsoft, Snyk, GitHub and others are available to UK buyers now. Procurement that defers AI-assisted code analysis to 2027 is implicitly assuming attackers will also defer. They will not.

SME Advantage: Mid-sized UK firms can move faster than large enterprises on items two, three and five because they have less legacy AI tooling to inventory and shorter procurement cycles. The position reverses on item one: smaller teams cannot patch faster without compensating controls. Web application firewalls and virtual patching become more important, not less, for the next 12 months.

Four hidden challenges most plans will miss

Strategic responses to GTIG’s findings tend to converge on the obvious controls. The harder problems are the second-order ones, and they are the ones that will determine whether UK organisations are still defensible in 18 months.

The detection telemetry problem. Endpoint detection systems are still tuned to score code by structural anomaly. LLM-generated decoy code, by design, looks structurally normal. Detection vendors will adapt, but the lag between attacker innovation and detection retuning is measured in quarters. Mitigation: weight behavioural telemetry (process trees, network egress patterns) more heavily than static code scoring during the transition.
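The reweighting can be expressed as a simple combined score. The sketch below is illustrative only: the weights, the threshold and the three input scores are assumptions, not values from GTIG or any detection vendor, and real products combine many more signals.

```python
from dataclasses import dataclass

# Assumed weights: behaviour counts four times as much as static code scoring.
STATIC_WEIGHT = 0.2
BEHAVIOUR_WEIGHT = 0.8
ALERT_THRESHOLD = 0.5


@dataclass(frozen=True)
class SampleScores:
    static_anomaly: float   # 0..1, structural anomaly of the code itself
    process_tree: float     # 0..1, how unusual the process ancestry is
    network_egress: float   # 0..1, how unusual the outbound traffic is


def combined_score(scores: SampleScores) -> float:
    """Blend static and behavioural signals, taking the stronger behavioural one."""
    behaviour = max(scores.process_tree, scores.network_egress)
    return STATIC_WEIGHT * scores.static_anomaly + BEHAVIOUR_WEIGHT * behaviour


def should_alert(scores: SampleScores) -> bool:
    return combined_score(scores) >= ALERT_THRESHOLD
```

With these weights, a sample padded with decoy code that scores 0.1 on static anomaly still alerts if its process tree scores 0.9, whereas an unusual-looking but well-behaved binary does not.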

The attribution problem under ORB networks. APT27’s fleet management tool routes traffic through residential IPs on 4G and 5G SIMs, with three-hop anonymisation defaults. Many UK organisations still rely on geographic and IP-reputation signals to triage. Those signals are degrading. Mitigation: shift weight to authentication anomalies, impossible-travel detection on actual credential use, and device-binding rather than IP-based trust.
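Impossible-travel detection reduces to a speed check between two consecutive uses of the same credential. The sketch below uses the haversine great-circle distance. The speed ceiling, the jitter floor and the AuthEvent fields are assumptions, and the source of the coordinates (device-reported location, for instance) is left open because IP geolocation is exactly the signal that ORB networks degrade.

```python
import math
from dataclasses import dataclass
from datetime import datetime

EARTH_RADIUS_KM = 6371.0
MAX_PLAUSIBLE_KMH = 900.0   # assumption: roughly airliner cruising speed
MIN_DISTANCE_KM = 50.0      # assumption: ignore small location jitter


@dataclass(frozen=True)
class AuthEvent:
    user: str
    when: datetime
    lat: float
    lon: float


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two points, in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def is_impossible_travel(a: AuthEvent, b: AuthEvent, max_kmh: float = MAX_PLAUSIBLE_KMH) -> bool:
    """True if the same credential appears to have moved faster than max_kmh."""
    distance = haversine_km(a.lat, a.lon, b.lat, b.lon)
    if distance < MIN_DISTANCE_KM:
        return False
    hours = abs((b.when - a.when).total_seconds()) / 3600
    if hours == 0:
        return True  # simultaneous use from two distant places
    return distance / hours > max_kmh
```

A login in London followed one hour later by a login in New York is flagged. The same pair twelve hours apart is not.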

The agent permission problem. PROMPTSPY’s architecture, and the broader pattern of agent skills with elevated permissions, exposes a control gap UK enterprises have not yet faced at scale. Production AI agents granted access to internal data stores, ticketing systems and code repositories have the same blast radius as a privileged service account, but rarely the same review process. Mitigation: treat every AI agent deployment as a privileged-access change, with named accountable owners and quarterly recertification.
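The recertification rule is easy to automate once agents are listed in a registry. The sketch below flags agents with no named owner or with certification older than a quarter. The AgentRecord fields, the scope strings and the 90-day approximation of "quarterly" are all assumptions for illustration.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List, Optional, Tuple

RECERT_INTERVAL = timedelta(days=90)  # "quarterly", approximated as 90 days


@dataclass(frozen=True)
class AgentRecord:
    name: str
    owner: Optional[str]            # named accountable owner, if any
    scopes: Tuple[str, ...]         # what the agent is allowed to touch
    last_certified: Optional[date]  # date of the last access review


def review_findings(agent: AgentRecord, today: date) -> List[str]:
    """Return the governance gaps for one agent; an empty list means none found."""
    findings = []
    if not agent.owner:
        findings.append("no named accountable owner")
    if agent.last_certified is None:
        findings.append("never certified")
    elif today - agent.last_certified > RECERT_INTERVAL:
        findings.append("recertification overdue")
    return findings
```

Running this over the registry each week gives a short exceptions list that can go through the same review route as privileged service accounts.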

The information operations spill-over. Operation Overload-style impersonation of UK journalists and brands lands inside the perimeter of corporate trust as fast as it lands inside political discourse. Customer-service teams, executive offices and PR functions need their own deepfake response playbooks. Mitigation: pre-agreed verification protocols for unusual senior-executive video and audio, and a relationship with at least one digital forensics provider before the incident happens, not after.

Hidden Cost: The cheapest item on this list, agent permission review, is the one most UK enterprises will defer. It is also the one where a single AI-agent compromise can cause data loss equivalent to a privileged credential breach, with none of the existing identity governance audit trail. If something has to slip, make this the last item to slip, not the first.

The strategic takeaway

GTIG’s May 2026 report is not the inflection point. The inflection point was the moment foundation models became capable of reasoning about developer intent at production cost. The report is the moment that capability became visible in the wild against named targets. UK security leaders who treat it as an annual threat-landscape document will draw the wrong conclusions. The right reading is that three baselines have moved at once: the speed of vulnerability discovery, the cost of malware iteration, and the surface area of the AI supply chain itself.

The organisations that come through the next 18 months in the strongest position will not be the ones with the largest security budgets. They will be the ones whose security teams, procurement functions and engineering leadership accept that attacker economics have changed and adjust controls, contracts and culture accordingly. Three habits matter more than any single tool: faster patch decisions, AI dependency inventories that match software dependency inventories, and defensive AI capability on the roadmap rather than the wishlist.

Take Action: Three concrete moves in the next 90 days: (1) commission a one-week AI supply chain audit covering every gateway library and agent framework already deployed; (2) reset critical-CVE SLAs for authentication-adjacent systems to seven days; (3) brief the audit committee on the GTIG report and the five priorities above, with named owners against each.

Where to start this quarter

  • Inventory AI gateway libraries and agent frameworks (LiteLLM, langchain, MCP servers) in production
  • Verify build-environment secret scope and rotation for every repository using AI dependencies
  • Update phishing simulation contract to include LLM-personalised lure scenarios
  • Add deepfake response steps to the executive incident-response playbook
  • Reset patch SLAs for authentication, identity and admin-tool CVEs
  • Procure or pilot one AI-assisted vulnerability-discovery tool (Big Sleep equivalent)
  • Identify one named accountable owner for every production AI agent

About this analysis

This strategic analysis draws on Google Threat Intelligence Group’s May 2026 report on adversarial AI use, summarised on the Google blog. The original GTIG report covers vulnerability discovery, defence evasion, autonomous malware, AI-augmented research, obfuscated LLM access and supply chain attacks against AI components. Strategic framing, UK contextualisation, and the prioritised recommendations are Resultsense analysis intended for UK enterprise and public-sector security leaders.