Harvard trial: AI beats ER doctors on text-based triage; 31% of UK doctors now use AI at least weekly
TL;DR: A Harvard Medical School and Beth Israel Deaconess study published in Science gave OpenAI’s o1 reasoning model and pairs of human ER doctors the same standard electronic health records for 76 patients arriving at a Boston emergency department. The AI gave the exact or a near-exact diagnosis in 67% of cases; the doctor pairs managed 50-55%. With richer detail, o1 reached 82% against 70-79% for the humans, a gap that was not statistically significant. On longer-term treatment plans across five clinical case studies, the AI scored 89% versus 34% for 46 doctors using conventional resources such as search engines.
A second-opinion role, not a replacement
Lead author Arjun Manrai stressed that the trial tested only information that can be conveyed in text: vital signs, demographics, a nurse’s free-text note. Visual cues, signs of patient distress and the rest of bedside clinical judgement were out of scope. Co-author Dr Adam Rodman framed the next decade as a “triadic care model … the doctor, the patient, and an artificial intelligence system” rather than a substitution. The model also caught a clinical signal the humans missed: in one case, it connected a patient’s lupus history to inflammation as the source of worsening lung symptoms, where doctors had assumed anticoagulant failure.
Looking forward
For UK readers, the relevant context is uptake: a recent Royal College of Physicians (RCP) survey found 16% of UK doctors using AI daily and a further 15% weekly, with clinical decision-making among the most common uses. The survey also flagged AI error and liability as the dominant concerns, a gap Rodman conceded directly, noting “there is not a formal framework right now for accountability”. The University of Edinburgh’s Ewen Harrison called the Harvard results important but framed the tools as “second-opinion” supports rather than autonomous decision-makers, while Sheffield’s Wei Xing warned of unconscious deference: doctors going along with the AI rather than thinking independently. For NHS Trusts running AI-scribe and clinical decision-support pilots, the operational question is now which workflows benefit from a second-opinion AI, under what auditing and override controls, and how training will change to keep human clinical reasoning sharp when the tool is on by default.