TL;DR
University of Chicago law professor Eric Posner and researcher Shivam Saran found that GPT-5 applied the correct legal outcome in 100% of test cases, compared to 52% for US federal judges. However, the researchers argue that human judges’ ability to depart from strict rules may be a feature rather than a flaw.
The Test
In a paper titled “Silicon Formalism: Rules, Standards, and Judge AI,” Posner and Saran used GPT-5 to replicate a study originally conducted with 61 US federal judges. In that study, the judges were asked to determine which state's law would apply in a set of car accident scenarios. This is a less dramatic setting than the researchers’ earlier war crimes trial experiment with GPT-4o, but a useful one for measuring strict adherence to legal rules.
GPT-5 showed no evidence of hallucination or logical errors in its legal reasoning. “We find the LLM to be perfectly formalistic, applying the legally correct outcome in 100 percent of cases; this was significantly higher than judges, who followed the law a mere 52 percent of the time,” the researchers wrote.
Google’s Gemini 3 Pro also scored 100%. Other models performed worse: Gemini 2.5 Pro (92%), o4-mini (79%), Llama 4 Maverick (75%), Llama 4 Scout (50%), and GPT-4.1 (50%).
Following Rules Versus Doing Justice
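The gap does not necessarily mean judges are worse at their jobs. A rule dictates a fixed outcome once its conditions are met, while a standard, such as a test of what is “reasonable,” calls for case-by-case judgment. When the applicable legal doctrine is a standard rather than a hard rule, judges have discretion in how they interpret it, and they can weigh moral, social, and policy factors that a strict reading might miss.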
“The apparent weakness of human judges is actually a strength,” Posner and Saran argued. “Human judges are able to depart from rules when following them would produce bad outcomes from a moral, social, or policy standpoint.”
The researchers noted that AI models consistently lean toward formalism, applying rules exactly as written. This raises an uncomfortable question: would society accept AI judgments that punish sympathetic defendants simply because the letter of the law demands it?
Looking Forward
As AI sees more use in legal work, the distinction between rule-following and justice will become increasingly important. The research suggests AI could be a valuable tool for legal analysis and consistency checking, but replacing human judicial discretion entirely remains a different proposition. Given that AI models can be tuned toward specific outcomes through training, the question of what constitutes the “right” settings for dispensing justice has no easy answer.