AI weather models still ‘underperform’ on record-breaking extremes, study finds
TL;DR: A Science Advances study tested three leading AI weather models — Google DeepMind’s GraphCast, Huawei’s Pangu-Weather and Shanghai’s Fuxi — against the European Centre for Medium-Range Weather Forecasts’ physics model on roughly 246,000 record-breaking heat, cold and wind events from 2018 and 2020. Across all three categories the AI models underestimated both frequency and intensity. Co-author Erich Fischer told Carbon Brief the result is a “warning shot” against replacing physics models too quickly.
The tension at the heart of the study is that the same AI models often beat physics-based models on routine forecasts: two of the three matched or outperformed ECMWF on average 2020 weather. The gap opened only when researchers filtered for record events, the cases that drive billions in damages and trigger early-warning systems. The larger the margin by which a record was broken, the more the AI models underpredicted it.
Why this matters for forecasting strategy
Lead author Zhongwei Zhang said most operational AI weather models were trained on the full historical record, which weights them toward typical conditions. Record-shattering extremes, which by definition fall outside the training distribution, are where the statistical-pattern approach degrades. The gap was widest at shorter lead times, complicating the case for AI as a near-term emergency forecasting tool.
The finding has direct implications for the UK Met Office, which last year began publicly trialling AI components alongside its Unified Model. Independent researchers contacted by Carbon Brief, including Leonardo Olivetti at Uppsala University, said the result aligns with several earlier studies but is novel in its specific focus on extremes. Major centres, including ECMWF itself, already use AI alongside physics models rather than replacing them.
Looking forward
For UK weather-dependent sectors (insurance, agriculture, infrastructure operators, flood-resilience teams) the practical takeaway is to be cautious about AI-only forecast products marketed for tail-risk decisions. The cost asymmetry is uncomfortable: AI models cost a fraction of what physics models cost to run, which makes them attractive to commercial weather-as-a-service vendors targeting SMEs that previously could not afford bespoke forecasts. The Carbon Brief reporting suggests buyers should ask vendors for verification results on extreme events, not just mean error metrics, before integrating AI forecasts into early-warning workflows.
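To make the distinction between mean error metrics and verification on extremes concrete, here is a minimal Python sketch. It is not the study's method: the temperatures, the prior record value and the helper names (rmse, mean_bias, record_subset) are all invented for illustration. It only shows how a forecast can look acceptable on an all-days error score while running several degrees too cool on the days that broke a record.

```python
from math import sqrt

def rmse(forecast, observed):
    """Root-mean-square error over paired forecast/observation values."""
    return sqrt(sum((f - o) ** 2 for f, o in zip(forecast, observed)) / len(observed))

def mean_bias(forecast, observed):
    """Average of forecast minus observed; negative means underprediction."""
    return sum(f - o for f, o in zip(forecast, observed)) / len(observed)

def record_subset(forecast, observed, previous_record):
    """Keep only the pairs where the observation exceeded the prior record."""
    pairs = [(f, o) for f, o in zip(forecast, observed) if o > previous_record]
    return [f for f, _ in pairs], [o for _, o in pairs]

# Hypothetical daily maximum temperatures in deg C, invented for illustration.
observed = [21.0, 23.5, 19.0, 25.0, 22.0, 24.0, 20.5, 38.5, 40.0, 23.0]
forecast = [21.5, 23.0, 19.5, 24.5, 22.5, 23.5, 21.0, 35.5, 36.0, 23.5]
previous_record = 37.0  # hypothetical prior record at this location

all_rmse = rmse(forecast, observed)
rec_forecast, rec_observed = record_subset(forecast, observed, previous_record)
record_rmse = rmse(rec_forecast, rec_observed)
record_bias = mean_bias(rec_forecast, rec_observed)

print(f"RMSE, all days:         {all_rmse:.2f}")
print(f"RMSE, record days only: {record_rmse:.2f}")
print(f"Bias, record days only: {record_bias:.2f}")
```

In this toy data the two record-breaking days are forecast too cool, so the record-only error is roughly double the all-days figure. A buyer who saw only the all-days score would miss that. Asking a vendor for the record-only numbers, broken down by lead time, is one way to act on the advice above.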