QUICK REVIEW

[Paper Review] WeatherBench 2: A benchmark for the next generation of data-driven global weather models

Stephan Rasp, Stephan Hoyer|arXiv (Cornell University)|2023. 08. 29.

Meteorological Phenomena and Simulations | Citations: 29

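One-line summary

WeatherBench 2 updates the original WeatherBench benchmark with higher-resolution data, new metrics, and an open-source framework for evaluating data-driven global weather forecasts against state-of-the-art baselines.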

ABSTRACT

WeatherBench 2 is an update to the global, medium-range (1-14 day) weather forecasting benchmark proposed by Rasp et al. (2020), designed with the aim to accelerate progress in data-driven weather modeling. WeatherBench 2 consists of an open-source evaluation framework, publicly available training, ground truth and baseline data as well as a continuously updated website with the latest metrics and state-of-the-art models: https://sites.research.google/weatherbench. This paper describes the design principles of the evaluation framework and presents results for current state-of-the-art physical and data-driven weather models. The metrics are based on established practices for evaluating weather forecasts at leading operational weather centers. We define a set of headline scores to provide an overview of model performance. In addition, we also discuss caveats in the current evaluation setup and challenges for the future of data-driven weather forecasting.

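Research motivation and goals

Provide an open, extensible framework for evaluating data-driven global weather forecasts at higher resolution.
Define a set of headline verification scores aligned with ECMWF/WMO practice.
Enable fair comparison between traditional physics-based forecasts and AI/ML models by using shared ground truth and evaluation tools.
Highlight the limitations, open problems, and future directions of probabilistic and data-driven weather forecasting.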

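Proposed method

Describes the design principles behind WeatherBench 2 and how it differs from the original WeatherBench (WB1).
Defines an evaluation protocol and metrics (RMSE, ACC, bias, SEEPS, CRPS) aligned with WMO/ECMWF practice.
Publishes open ground-truth data, training data, baselines, and evaluation code through a continuously updated website.
Provides multiple operational and data-driven baselines (ERA5, IFS HRES/ENS, Keisler GraphNet, Pangu-Weather, GraphCast, FuXi, SphericalCNN, NeuralGCM).
Discusses the data-processing choices made for evaluation (ERA5 as ground truth, regridding to 1.5° for evaluation, masking of regions below the surface).
Supports a probabilistic evaluation framework and ensemble-like approaches for data-driven forecasts.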
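As an illustration of the deterministic metrics named in this section, the following is a minimal Python sketch of a latitude-weighted RMSE and bias on a regular latitude-longitude grid. It is not the WeatherBench 2 implementation: the function names, the (time, lat, lon) array layout, and the cosine-latitude weights normalized to unit mean are all assumptions made for this example.

```python
import numpy as np

def latitude_weights(lat_deg):
    """Cosine-latitude weights normalized to mean 1 (assumed weighting)."""
    w = np.cos(np.deg2rad(np.asarray(lat_deg, dtype=float)))
    return w / w.mean()

def weighted_rmse(forecast, truth, lat_deg):
    """Latitude-weighted RMSE over arrays shaped (time, lat, lon)."""
    w = latitude_weights(lat_deg)[None, :, None]  # broadcast over time, lon
    return float(np.sqrt(np.mean(w * (forecast - truth) ** 2)))

def weighted_bias(forecast, truth, lat_deg):
    """Latitude-weighted mean error over arrays shaped (time, lat, lon)."""
    w = latitude_weights(lat_deg)[None, :, None]
    return float(np.mean(w * (forecast - truth)))

# Hypothetical 1.5 degree grid: 121 latitudes, 240 longitudes, 4 time steps.
lat = np.linspace(-90.0, 90.0, 121)
truth = np.zeros((4, 121, 240))
forecast = truth + 2.0  # a forecast that is uniformly 2 units too high

rmse = weighted_rmse(forecast, truth, lat)
bias = weighted_bias(forecast, truth, lat)
```

Because the weights average to one, a spatially uniform error passes through unchanged, which makes a convenient sanity check. Errors near the poles count for less than errors near the equator, reflecting the smaller area of high-latitude grid cells.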

Figure 1: Deterministic headline scorecards for upper-level variables. Values show absolute RMSE. Colors denote % difference to the IFS HRES baseline.

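Experimental results

Research questions

RQ1: How can data-driven global weather models be evaluated fairly against operational NWP baselines using a shared, open evaluation framework?
RQ2: Which headline scores best summarize performance for both deterministic and probabilistic forecasts over 1-14 day lead times?
RQ3: What are the caveats and limitations of evaluating ML-based weather forecasts against ERA5 ground truth versus operational analyses?
RQ4: How do high-resolution data-driven models compare with traditional IFS-based forecasts across metrics and variables?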

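Key results

WeatherBench 2 provides an open-source evaluation framework, datasets, and baselines, plus a continuously updated website with the latest metrics and models.
The evaluation protocol closely follows WMO/ECMWF verification practice and defines a set of headline scores for broad model comparison.
The benchmark covers state-of-the-art data-driven models such as GraphCast, Pangu-Weather, FuXi, SphericalCNN, and NeuralGCM alongside traditional baselines (ERA5, IFS HRES/ENS).
Forecasts are produced from standardized inputs and resolutions and regridded to 1.5° to ensure fair comparison between models.
The framework emphasizes probabilistic forecasts and ensemble-like evaluation to reflect the uncertainty in weather prediction.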
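To make the probabilistic evaluation concrete, the sketch below estimates CRPS from a finite ensemble using the standard identity CRPS = E|X - y| - 0.5 E|X - X'|. The function name, the array layout (ensemble members on the first axis), and the optional `fair` flag are assumptions for this example and are not taken from the WeatherBench 2 code base.

```python
import numpy as np

def crps_ensemble(members, obs, fair=False):
    """Ensemble CRPS estimate; lower is better.

    members: array shaped (M, ...) holding M ensemble members.
    obs:     array shaped (...) holding the verifying values.
    fair:    if True, divide the pairwise term by M*(M-1) instead of M**2,
             which removes the finite-ensemble-size bias.
    """
    members = np.asarray(members, dtype=float)
    obs = np.asarray(obs, dtype=float)
    m = members.shape[0]
    if fair and m < 2:
        raise ValueError("fair CRPS needs at least two members")
    # Mean absolute error of the members against the observation.
    skill = np.mean(np.abs(members - obs), axis=0)
    # Sum of absolute differences over all ordered member pairs.
    pair_sum = np.abs(members[:, None] - members[None, :]).sum(axis=(0, 1))
    spread = pair_sum / (m * (m - 1) if fair else m * m)
    return skill - 0.5 * spread

# Two members bracketing the observation.
score = crps_ensemble([0.0, 2.0], 1.0)
# A one-member "ensemble" reduces to the absolute error.
deterministic = crps_ensemble([3.0], 1.0)
```

With a single member the spread term vanishes and the score equals the absolute error, which is why CRPS is often described as a probabilistic generalization of MAE. The pairwise term here costs O(M²) memory per grid point, which is fine for small ensembles but would need a sorted-member formulation for large ones.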

Figure 2: Deterministic headline scorecards for surface variables. Values show absolute RMSE, with the exception of precipitation which shows SEEPS (evaluated against ERA5 in all cases). Colors denote % difference to the IFS HRES baseline.


This review was generated by AI and checked by a human editor.