The AI accountability gap: what UK procurers must demand from vendors

TL;DR: A Lighthouse Reports investigation, peer-reviewed by academics at Kent, Georgetown and Delft, has reconstructed the machine learning model Kenya uses to set health insurance premiums and shown it was designed in a way that systematically overcharges the poorest and undercharges the wealthiest. The deeper finding is not Kenyan: it is the playbook of “the algorithm decided” being used to dissolve institutional accountability. UK enterprise and public-sector buyers face the same playbook from the other side of the contract. The defence is contractual and procedural, not technical. Procurers who write seven specific demands into every AI vendor agreement (a model card with subgroup error rates, training data provenance, independent replication rights, human-review recourse, sunset clauses, named accountable parties, and a disclosure pre-commitment) shift the cost of opacity back onto the vendor. Those who do not will pay the cost themselves, in legal exposure, regulatory scrutiny and the slow erosion of public trust.

When “the algorithm decided” stops being an answer

In Nairobi, a Community Health Promoter named Grace sits in homes and asks people what their roof is made of, what kind of toilet they use, whether they own a radio. She types the answers into a form. A number comes back: the sum the household must pay that year for public health insurance. No human at the Social Health Authority chose that number. A machine learning model, trained on a 2020 household survey, predicted the household’s income from those indicators, and the premium was set as a percentage of that prediction.

When Lighthouse Reports rebuilt the model and tested it under peer review at three universities, the verdict was unambiguous. The system, by design, overcharges those least able to pay and undercharges those most able. The investigation’s title, Hiding Behind AI, is not metaphor. Institutions are hiding behind AI, and the accountability gap that opens up between vendor, buyer and citizen is now the defining AI risk for any organisation procuring automated decision systems.

Strategic Reality: The Kenyan case is striking because the harm is acute and visible. The same architecture (a vendor model, opaque indicators, automated decisions, denied appeals) is being installed across UK welfare, healthcare allocation, fraud detection and HR screening. The pattern is the procurement risk, not the geography.

The pattern that should worry UK procurers

Kenya’s Social Health Authority is the latest entry in a list that is no longer short. The Netherlands deployed SyRI to flag welfare fraud and was forced by a court to withdraw it on human-rights grounds. Australia’s Robodebt scheme used automated income-averaging to issue debt notices, clawed back hundreds of millions of Australian dollars on debts that were wrongly raised, and ended in a Royal Commission. The UK’s Post Office Horizon scandal (not strictly AI, but identical in structure) saw sub-postmasters prosecuted on the basis of computer evidence that nobody in the procurement chain was equipped to challenge. The DWP’s own fraud and error machine learning models, deployed on Universal Credit claims, have already been flagged by the National Audit Office for opacity and disproportionate impact on disabled claimants.

The Real Story is not that AI is uniquely dangerous. It is that automated decisions create a structural diffusion of responsibility that institutions find convenient. The vendor blames the buyer’s data. The buyer blames the vendor’s model. The model itself blames nobody, because it cannot. The citizen, meanwhile, faces a number they did not choose, from a process they cannot interrogate, with appeal mechanisms that are themselves automated.

Critical numbers from the accountability pattern | Value
Households flagged by Netherlands SyRI before court ruling | ~240,000
Australian Robodebt wrongful debt total clawed back | A$1.8 billion
UK Post Office Horizon prosecutions (now overturned or under review) | 900+
DWP Universal Credit fraud-detection algorithm disparity flagged by NAO (2024) | 26 protected characteristics with statistically significant disparate impact
Kenyan households surveyed in SHA training data (2020) | 24,000
Lighthouse model peer-review institutions | 3 (Kent, Georgetown, Delft)

Critical Context: None of the above failures was caused by an exotic technical flaw. Each was caused by a procurement and governance gap the buyer did not close.

What is really happening: three structural mechanisms

The accountability gap is not an accident of immature technology. It is the product of three mechanisms that any procurement contract can either entrench or close.

Information asymmetry. The vendor holds the model, the training data, the indicator weights and the error-rate distributions. The buyer typically holds none of these. When something goes wrong, the buyer has no way to investigate without the vendor’s cooperation. Kenya’s Social Health Authority published the headline formula but withheld the indicator details and the training set. Lighthouse only obtained them under pressure from the Ombudsman. Most UK buyers do not have an Ombudsman to apply that pressure.

Diffuse responsibility. A vendor’s standard contract places the responsibility for “data quality” and “appropriate use” on the buyer. The buyer’s standard procurement template places responsibility for “model performance” on the vendor. When a citizen is harmed, both contracts can be invoked to deflect liability, and neither party has incentive to investigate. The harm becomes a contract dispute rather than a remediation programme.

Opacity by design. Proprietary models are protected as trade secrets. Buyers sign non-disclosure agreements that prevent them from publishing audit findings. Citizens cannot subpoena the weights. Even when bias is suspected, the model itself becomes the legal obstacle to investigating the bias. This is not a bug in the system. It is the system, working as the vendor’s lawyers designed it.

Hidden Cost: Every NDA a buyer signs about a citizen-facing model becomes evidence in a future judicial review. UK courts have already shown willingness to compel disclosure on public-law grounds. Buyers who promised confidentiality they cannot keep face the worst of both contracts.

The Kenyan implementation reality

Lighthouse’s peer-reviewed replication revealed two specific failure modes that any UK buyer should recognise. First, the training data (a 2020 household survey) did not reflect the cost-of-living shock between 2020 and 2024, so the income predictions were systematically out of date for the people whose lives had changed most. Second, a previously unseen consultants’ report had proposed adjustments to fix the worst regressivity, and the adjustments were not implemented. Lighthouse implemented them in their replica model and showed the system would have been measurably less harmful.
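A small worked example makes the regressivity concrete. The sketch below is illustrative only: the household figures, the 3% rate and the income threshold are invented for the example and are not taken from the Lighthouse replication or the Social Health Authority's model. It assumes a proxy model whose income predictions are compressed toward the middle, and shows how a flat percentage applied to those predictions turns into a higher effective rate for poorer households.

```python
from statistics import mean

# Hypothetical households: (true monthly income, model-predicted income).
# Values are invented to mimic predictions that are pulled toward the middle:
# low incomes are over-predicted, high incomes are under-predicted.
HOUSEHOLDS = [
    (5_000, 12_000),
    (8_000, 14_000),
    (20_000, 22_000),
    (40_000, 38_000),
    (120_000, 70_000),
    (200_000, 90_000),
]

PREMIUM_RATE = 0.03  # hypothetical flat rate applied to PREDICTED income


def effective_rate(true_income, predicted_income, rate=PREMIUM_RATE):
    """Premium charged, expressed as a share of what the household really earns."""
    return rate * predicted_income / true_income


def rates_by_band(households, threshold):
    """Average effective rate below and at-or-above an income threshold."""
    low = [effective_rate(t, p) for t, p in households if t < threshold]
    high = [effective_rate(t, p) for t, p in households if t >= threshold]
    return mean(low), mean(high)


low_rate, high_rate = rates_by_band(HOUSEHOLDS, threshold=30_000)
print(f"Lower-income band pays {low_rate:.2%} of true income")
print(f"Higher-income band pays {high_rate:.2%} of true income")
```

With these invented numbers the lower-income band pays roughly 5% of its real income and the higher-income band roughly 2%, even though the nominal rate is the same for everyone. A buyer with replication rights can run the same comparison on real data.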

That second finding is the one that should stop a UK procurer mid-signature. The vendor knew. The advice existed. The implementation did not happen. Without an external replication, this would never have surfaced. UK buyers cannot rely on Lighthouse-grade investigations to catch every vendor that took the same shortcut.

The human factor: asymmetric harm and the stakeholder problem

The defining feature of the accountability gap is that its costs do not fall evenly. In Kenya, the people overcharged are the people least able to challenge the charge: a woman washing other people’s clothes for a living, a pregnant mother in Huruma whose appeal was denied without reason. In the UK, the disparate-impact analyses repeatedly identify the same populations: disabled claimants, low-income households, ethnic minorities, people whose circumstances the training data underrepresents.

Stakeholder | What they see | What they cannot see | What they can do
Citizen | A number, a denied appeal | Indicators, weights, error rates, recourse path | Almost nothing alone
Frontline staff | A form output, a script | The logic behind the output | Refer up; not override
Procurement officer | A vendor demo, a SOC 2 report | Subgroup performance, training-data provenance | Ask better questions, before signing
Senior responsible owner | A KPI dashboard | The error distribution under the dashboard | Demand audit rights and recourse design
Regulator | Complaints data, FOI responses | The model itself | Compel disclosure, fine, suspend
Vendor | The full system | (Has visibility) | Disclose, refuse, charge for transparency

The asymmetry is not a side effect. It is the cost structure of opaque AI procurement. The lower a stakeholder sits in the table, the higher the cost they bear when the model is wrong. Closing the accountability gap means rebuilding the procurement contract so that visibility tracks responsibility, not the other way round.

Strategic Insight: Procurement is the highest-leverage point for AI accountability. Once the contract is signed, every later intervention (regulator action, judicial review, parliamentary inquiry) is reactive and expensive. The questions a procurer asks before signing cost orders of magnitude less than the answers a court will eventually compel.

Seven demands every UK procurer should write into AI contracts

The remediation is not technical philosophy. It is contract clauses. The following seven demands form a baseline that any UK enterprise or public-sector buyer can request without needing in-house ML expertise. Vendors who refuse them are telling the buyer something material about the risk.

1. A current model card with subgroup performance. Not a marketing one-pager. A model card that lists training data provenance, indicator definitions, error rates by protected characteristic where applicable, and known failure modes. If the vendor cannot produce one, they have not measured. If they have not measured, the buyer has no basis to deploy.

2. Training data provenance and recency. Date of collection, source, sample size, and a statement of representativeness for the deployment population. Kenya’s 2020 survey deployed in 2024 is not an exotic failure. It is the default behaviour of any model whose training data is not refreshed.

3. Right to independent replication. A clause permitting a named third party (academic, civil society, or government auditor) to replicate the model under NDA. Lighthouse did this in Kenya only through FOI and Ombudsman pressure. UK buyers should procure the right up front.

4. Recourse mechanism with human review. Not an automated appeal line. A documented process where a flesh-and-blood reviewer can examine the inputs, override the output, and explain the decision. The vendor must commit to a defined response time. The buyer must staff and fund the review function.

5. Sunset clause and re-validation cadence. Models drift. Populations change. Every contract should specify a maximum period (typically 24 to 36 months) after which the model must be re-validated against current data, or it ceases to be authorised for use. Without this, models stay in production until they cause a scandal.

6. Named accountable parties. A senior individual at the vendor and a senior responsible owner at the buyer, both named in the contract, both personally accountable for incidents. “The algorithm decided” does not survive contact with a named SRO.

7. Disclosure pre-commitment. A contractual commitment that, in the event of a disparate-impact finding by an Ombudsman, regulator or court, the vendor will disclose the relevant model artefacts on a defined timetable. This converts a future legal fight into a contracted obligation, and it is the single clause vendors most resist.
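To show what "error rates by subgroup" in demand 1 means in practice, here is a minimal sketch of the calculation a vendor should be able to reproduce on request. The records, subgroup labels and the idea of reporting a worst-to-best ratio are assumptions made for illustration. A real disclosure would use far larger samples and report uncertainty alongside each rate.

```python
from collections import defaultdict

# Hypothetical audit sample: (subgroup label, was the model's decision wrong?).
# Labels and outcomes are invented for illustration.
RECORDS = [
    ("disabled", True),
    ("disabled", False),
    ("disabled", True),
    ("disabled", False),
    ("not_disabled", False),
    ("not_disabled", False),
    ("not_disabled", True),
    ("not_disabled", False),
]


def error_rates_by_subgroup(records):
    """Return {subgroup: share of decisions that were wrong}."""
    totals = defaultdict(int)
    errors = defaultdict(int)
    for group, is_error in records:
        totals[group] += 1
        errors[group] += int(is_error)
    return {group: errors[group] / totals[group] for group in totals}


def disparity_ratio(rates):
    """Highest subgroup error rate divided by the lowest (inf if the lowest is zero)."""
    lowest, highest = min(rates.values()), max(rates.values())
    return float("inf") if lowest == 0 else highest / lowest


rates = error_rates_by_subgroup(RECORDS)
ratio = disparity_ratio(rates)
print(rates, ratio)
```

The point for a procurer is not the arithmetic but the precondition: the vendor can only produce this table if it has recorded outcomes and subgroup membership. What ratio counts as acceptable is a contractual choice, not something the code decides.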

Priority actions by organisational maturity

Early-stage AI buyers (no production AI yet). Adopt the seven demands as a procurement template before the first contract is signed. The cost of doing this in a greenfield is minimal. The cost of retrofitting it after a vendor relationship is established is enormous.

Mid-stage AI buyers (one to five production systems). Audit existing contracts against the seven demands. Identify the gaps. Open renegotiation conversations using SyRI, Robodebt and Horizon as the precedents behind the business case. Most vendors will concede on three or four of the seven; few will concede on all seven, and the resistance is itself diagnostic.

Mature AI buyers (extensive deployment). Stand up an internal model registry that tracks every production model against the seven demands. Treat any model below a defined threshold as a remediation backlog. Report annually to the board on the trajectory.
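A model registry of this kind does not need specialist tooling to start. The sketch below is one possible shape, not a reference implementation: the field names, the model names, the dates and the choice to treat any unmet demand as backlog are all assumptions. It tracks each model against the seven demands and flags models whose re-validation window (36 months here, the upper end of the range in demand 5) has lapsed.

```python
from dataclasses import dataclass, field
from datetime import date

# Short keys for the seven demands; the key names are hypothetical.
SEVEN_DEMANDS = (
    "model_card",
    "data_provenance",
    "independent_replication",
    "human_review_recourse",
    "sunset_and_revalidation",
    "named_accountable_parties",
    "disclosure_precommitment",
)


@dataclass
class RegistryEntry:
    name: str
    last_validated: date
    demands_met: set = field(default_factory=set)
    max_months: int = 36  # assumed contractual re-validation window

    def gaps(self):
        """Demands not yet secured in the contract, in list order."""
        return [d for d in SEVEN_DEMANDS if d not in self.demands_met]

    def revalidation_overdue(self, today):
        """True once max_months whole months have passed since validation."""
        months = (today.year - self.last_validated.year) * 12
        months += today.month - self.last_validated.month
        if today.day < self.last_validated.day:
            months -= 1  # the current month is not yet complete
        return months >= self.max_months


def remediation_backlog(entries, today, min_demands_met=len(SEVEN_DEMANDS)):
    """Names of models below the threshold or past their re-validation date."""
    backlog = []
    for entry in entries:
        met = len(SEVEN_DEMANDS) - len(entry.gaps())
        if met < min_demands_met or entry.revalidation_overdue(today):
            backlog.append(entry.name)
    return backlog


# Hypothetical registry contents.
TODAY = date(2026, 6, 1)
REGISTRY = [
    RegistryEntry("benefits-triage", date(2023, 3, 1), set(SEVEN_DEMANDS)),
    RegistryEntry("cv-screening", date(2025, 9, 15),
                  {"model_card", "data_provenance", "named_accountable_parties"}),
    RegistryEntry("fraud-flagging", date(2025, 1, 10), set(SEVEN_DEMANDS)),
]

backlog = remediation_backlog(REGISTRY, TODAY)
print(backlog)
```

In this invented registry the first model meets every demand but is past its re-validation date, and the second is recent but missing four demands, so both land in the backlog. The threshold parameter lets a board set its own bar and tighten it over time.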

Take Action: Write the seven demands into your standard AI procurement template this quarter. The cost is one legal review. The benefit is every contract you sign for the next decade.

Four hidden challenges that catch buyers out

The replication crisis comes for algorithms. Academic ML has known for years that published model performance often fails to replicate. The same is true of vendor performance claims. Kenya’s vendor presumably believed their own metrics. Lighthouse’s replication demolished them. A buyer who has not built or contracted replication capability is taking the vendor’s word for the most consequential number in the contract.

Indicator drift looks like model drift but is not. When a model’s accuracy degrades, the instinct is to retrain. But sometimes the indicators themselves have lost meaning (radio ownership, in 2026, signals something different than it did in 2020). Retraining on the same indicators perpetuates the original error. This is why the model card in demand 1 must define each indicator, and why re-validation under demand 5 should review the indicators themselves, not only the weights.
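One way to tell indicator drift from ordinary model drift is to check whether an indicator still separates richer from poorer households at all. The sketch below is a deliberately simple illustration under stated assumptions: the two samples, the income figures and the 50% cut-off are invented, and a real review would use proper survey data and a more careful statistic.

```python
from statistics import mean


def income_gap(sample):
    """Mean income of households with the indicator minus mean income of those without.

    Each item is (has_indicator, income). Both groups must be non-empty,
    otherwise statistics.mean raises StatisticsError.
    """
    with_indicator = [income for has, income in sample if has]
    without_indicator = [income for has, income in sample if not has]
    return mean(with_indicator) - mean(without_indicator)


# Hypothetical data: (owns a radio, monthly income).
SURVEY_AT_TRAINING = [(True, 30_000), (True, 26_000), (False, 10_000), (False, 14_000)]
SAMPLE_TODAY = [(True, 18_000), (True, 16_000), (False, 15_000), (False, 17_000)]

MIN_RETAINED_SIGNAL = 0.5  # hypothetical review threshold

gap_then = income_gap(SURVEY_AT_TRAINING)
gap_now = income_gap(SAMPLE_TODAY)
retained = gap_now / gap_then  # share of the original signal that survives
indicator_has_drifted = retained < MIN_RETAINED_SIGNAL

print(f"Signal retained: {retained:.1%}; review indicator: {indicator_has_drifted}")
```

If the gap has collapsed, as it has in this invented data, retraining the weights on the same indicator will not help. The indicator itself needs replacing, which is a decision for the buyer and the vendor rather than for the training pipeline.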

Recourse design is harder than it looks. A human-review process that is staffed, funded, advertised to citizens, and resourced to reverse decisions costs real money. Vendors that promise “human-in-the-loop” without specifying staffing, training and authority are selling a phrase, not a function. The recourse mechanism is the most common point of contractual theatre.

Procurement frameworks lag the technology. UK public-sector procurement frameworks and transparency standards (G-Cloud, the Algorithmic Transparency Recording Standard) are improving, but the AI-specific requirements remain voluntary in many sectors. Enterprise procurement frameworks are typically further behind. Buyers who wait for the framework to catch up will procure the next decade’s scandals on the previous decade’s templates.

Reality Check: The seven demands do not require the buyer to understand the model. They require the buyer to insist on the conditions under which understanding is possible. That distinction is the entire procurement leverage point.

The takeaway: buyers can change vendor behaviour; no one else can

Regulators move slowly. Courts intervene reactively. Civil-society replication, when it happens, takes years and depends on FOI and Ombudsman pressure that most jurisdictions do not provide reliably. The only stakeholder positioned to close the accountability gap at scale, before harm occurs, is the procurer signing the cheque.

Three success factors decide whether a UK buyer closes the gap or inherits it.

First, treat AI procurement as a governance problem, not a technology problem. The questions the seven demands ask are about contracts, audits and recourse, not about hyperparameters. Procurement, legal and risk own this work. Data science supports.

Second, normalise replication. A buyer who has never commissioned an independent replication of a vendor model has not yet bought what they think they bought. Building the muscle once (academic partnership, civil-society contract, internal red team) means every future procurement starts from a stronger position.

Third, write the disclosure pre-commitment in. The most uncomfortable clause is also the most important. A vendor who will not commit to disclose under regulatory pressure is a vendor whose model the buyer cannot defend in public. That is the contract to walk away from.

Procurer’s pre-signature checklist:

  • Vendor has produced a current model card with subgroup performance
  • Training data provenance, date and representativeness are documented
  • Independent replication right is named and exercisable
  • Human-review recourse mechanism is staffed, funded, and time-bound
  • Sunset and re-validation cadence is specified (≤36 months)
  • Senior responsible owner and vendor counterpart are named
  • Disclosure pre-commitment for regulator or court findings is signed

Hiding behind AI works only as long as the buyer lets it. The Lighthouse investigation in Kenya is a warning to procurers everywhere: the accountability gap is a procurement choice, not a technology constraint. The cost of closing it before signing is a legal review. The cost of closing it afterwards is whatever the next public inquiry decides.

Strategic Reality: If your standard AI procurement template does not contain at least the seven demands above, you are not procuring an AI system. You are procuring a future investigation.

Source citation and attribution

Primary source: Lighthouse Reports, Hiding Behind AI: How Kenya Used a Predictive Algorithm to Transfer Health Costs onto the Poorest (4 May 2026). Investigation co-published with Africa Uncensored and The Guardian. Methodology peer-reviewed by academics at the University of Kent, Georgetown University and Delft University of Technology. Read the full investigation: lighthousereports.com/investigation/hiding-behind-ai/.

Comparator cases referenced: Netherlands SyRI (2020 Hague District Court ruling), Australia Robodebt Royal Commission (2023), UK Post Office Horizon inquiry, UK National Audit Office report on DWP Universal Credit fraud detection (2024).

About Resultsense: Resultsense is a UK-focused publication and advisory practice on the practical application of AI in business and the public sector. We work with procurement, risk and senior leadership teams to translate AI-policy and procurement standards into operational practice. For procurement-template review or accountability-gap audits, contact us via the contact page.