QUICK REVIEW

[Paper Review] "What is a realistic forecast?" Assessing data-driven weather forecasts, a journey from verification to falsification

Zied Ben Bouallègue|arXiv (Cornell University)|Jan 31, 2026

Meteorological Phenomena and Simulations | 0 citations

TL;DR

The paper defines three types of realism for data-driven weather forecasts—functional, structural, and physical—and outlines a verification-to-falsification workflow to assess realism and trust in ML-based forecasts.

ABSTRACT

The artificial intelligence revolution is fueling a paradigm shift in weather forecasting: forecasts are generated with machine learning models trained on large datasets rather than with physics-based numerical models that solve partial differential equations. This new approach proved successful in improving forecast performance as measured with standard verification metrics such as the root mean squared error. At the same time, the realism of data-driven weather forecasts is often questioned and considered as an Achilles' heel of machine learning models. How 'forecast realism' can be defined and how this forecast attribute can be assessed are the two questions simultaneously addressed here. Inspired by the seminal work of Murphy (1993) on the definition of 'forecast goodness', we identify 3 types of realism and discuss methodological paths for their assessment. In this framework, falsification arises as a complementary process to verification and diagnostics when assessing data-driven weather models.

Motivation & Objective

Clarify how forecast realism should be defined for data-driven weather forecasts.
Identify and categorize the three realism types: functional, structural, and physical realism.
Propose a workflow moving from verification through diagnostics to falsification to assess realism and trust.

Proposed method

Define three types of realism (functional, structural, physical) and discuss how each can be measured or assessed.
Describe the relationships between realism types and verification/diagnostics.
Introduce a falsification framework that tests forecasts against physical knowledge to ensure plausibility.
Outline a typical data-driven forecast evaluation journey involving verification, diagnostics, and falsification.
Relate realism concepts to forecast value, model validation, and interpretability.

Experimental results

Research questions

RQ1: How can forecast realism be defined for data-driven weather forecasts?
RQ2: How can functional, structural, and physical realism be measured and diagnosed in practice?
RQ3: What role does falsification play alongside verification in assessing ML-based forecasts?
RQ4: How should a typical data-driven forecast evaluation journey be structured to answer ‘Is the forecast realistic?’

Key findings

Functional realism is assessed via scoring rules that measure distance between forecast and observation.
Structural realism is assessed via diagnostic measures comparing forecast statistics to observations (e.g., bias, variability).
Physical realism is assessed through falsification tests comparing forecasts against established knowledge of physics and potential model artifacts.
A three-step evaluation journey—verification, diagnostics, and falsification—provides a comprehensive realism assessment.
Post-processing and information content relate functional realism to forecast value and decision usefulness.
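
As a rough illustration of the three-step journey summarized above, the sketch below pairs each step with one simple check. It is not the paper's implementation. The function names, the toy data, and the choice of a non-negativity constraint as the falsification test are all assumptions made for illustration. Only RMSE, bias, and variability are named in the source.

```python
import math
from statistics import mean, pstdev


def rmse(forecast, observed):
    """Verification: root mean squared error over paired values."""
    squared_errors = [(f - o) ** 2 for f, o in zip(forecast, observed)]
    return math.sqrt(mean(squared_errors))


def bias(forecast, observed):
    """Diagnostic: difference between forecast mean and observed mean."""
    return mean(forecast) - mean(observed)


def variability_ratio(forecast, observed):
    """Diagnostic: forecast spread relative to observed spread.

    Values well below 1 suggest the forecast is smoother than reality.
    """
    return pstdev(forecast) / pstdev(observed)


def violates_non_negativity(forecast):
    """Falsification (hypothetical test): a quantity such as precipitation
    cannot be negative, so any negative value falsifies physical realism."""
    return any(value < 0 for value in forecast)


# Toy values, invented for illustration only.
observed = [0.0, 2.0, 4.0, 6.0]
forecast = [1.0, 2.0, 3.0, 4.0]

print("RMSE:", rmse(forecast, observed))
print("Bias:", bias(forecast, observed))
print("Variability ratio:", variability_ratio(forecast, observed))
print("Falsified:", violates_non_negativity(forecast))
```

In this toy case the forecast passes the falsification check while the diagnostics still flag an overly smooth field (variability ratio of 0.5), which is one way the three steps can give complementary answers.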


This review was created by AI and reviewed by human editors.