QUICK REVIEW

[Paper Review] WeatherBench Probability: A benchmark dataset for probabilistic medium-range weather forecasting along with deep learning baseline models

Sagar Garg, Stephan Rasp|arXiv (Cornell University)|May 2, 2022

Hydrological Forecasting Using AI|Citations: 23

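One-line summary

The paper extends WeatherBench to probabilistic forecasting, compares Monte Carlo dropout, parametric, and categorical prediction approaches with the operational IFS ensemble, and evaluates them with probabilistic verification metrics.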

ABSTRACT

WeatherBench is a benchmark dataset for medium-range weather forecasting of geopotential, temperature and precipitation, consisting of preprocessed data, predefined evaluation metrics and a number of baseline models. WeatherBench Probability extends this to probabilistic forecasting by adding a set of established probabilistic verification metrics (continuous ranked probability score, spread-skill ratio and rank histograms) and a state-of-the-art operational baseline using the ECMWF IFS ensemble forecast. In addition, we test three different probabilistic machine learning methods -- Monte Carlo dropout, parametric prediction and categorical prediction, in which the probability distribution is discretized. We find that plain Monte Carlo dropout severely underestimates uncertainty. The parametric and categorical models both produce fairly reliable forecasts of similar quality. The parametric models have fewer degrees of freedom while the categorical model is more flexible when it comes to predicting non-Gaussian distributions. None of the models are able to match the skill of the operational IFS model. We hope that this benchmark will enable other researchers to evaluate their probabilistic approaches.

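Motivation and objectives

Extend the WeatherBench benchmark to probabilistic medium-range forecasting.
Introduce probabilistic verification metrics that assess the reliability and sharpness of forecasts.
Evaluate deep-learning-based probabilistic models (MC dropout, parametric, categorical) against an operational ensemble baseline.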

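Proposed method

A deep ResNet-based architecture with 114 input channels derived from ERA5 data (variables on 7 levels plus surface fields).
Probabilistic forecasts are generated with three approaches: Monte Carlo dropout (an ensemble is created by sampling with dropout active at inference, at varying dropout rates), parametric prediction (a Gaussian distribution for Z500, T850, and T2M, trained with a CRPS loss), and categorical prediction (discrete bins with softmax and cross-entropy)...
Three-day forecasts are evaluated with probabilistic metrics: CRPS, spread-skill ratio, and rank histograms, plus the deterministic RMSE of the ensemble mean.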
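
The CRPS named above serves both as the training loss for the parametric (Gaussian) model and as a verification score for ensembles. The following is a minimal sketch of the two standard textbook forms, not the paper's implementation: the closed-form CRPS of a Gaussian forecast and the empirical CRPS of a finite ensemble. The function names `gaussian_crps` and `ensemble_crps` are hypothetical and chosen only for illustration.

```python
import math
from typing import Sequence


def gaussian_crps(mu: float, sigma: float, y: float) -> float:
    """Closed-form CRPS of a Gaussian forecast N(mu, sigma^2) against observation y."""
    if sigma <= 0.0:
        raise ValueError("sigma must be positive")
    z = (y - mu) / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)  # standard normal density at z
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))          # standard normal CDF at z
    return sigma * (z * (2.0 * cdf - 1.0) + 2.0 * pdf - 1.0 / math.sqrt(math.pi))


def ensemble_crps(members: Sequence[float], y: float) -> float:
    """Empirical CRPS of a finite ensemble: E|X - y| - 0.5 * E|X - X'|."""
    m = len(members)
    abs_error = sum(abs(x - y) for x in members) / m
    pair_spread = sum(abs(a - b) for a in members for b in members) / (m * m)
    return abs_error - 0.5 * pair_spread
```

For example, `ensemble_crps([0.0, 2.0], 1.0)` gives 0.5, and a single-member ensemble reduces to the absolute error. Lower values are better in both forms, and both are in the units of the forecast variable.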

Experimental results

Research questions

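RQ1: For 3-day forecasts of geopotential, temperature, and precipitation, how do probabilistic neural network approaches compare with the operational IFS ensemble?
RQ2: What are the reliability and calibration properties (spread-skill ratio, CRPS, rank histograms) of the MC dropout, parametric, and categorical probabilistic forecasts?
RQ3: Do the parametric or categorical models offer an advantage in modeling non-Gaussian distributions such as precipitation?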
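
The calibration diagnostics in RQ2 can be sketched as follows. This is an illustrative sketch under stated assumptions, not the benchmark's evaluation code: the spread is taken as the square root of the mean sample variance with no finite-ensemble correction factor, ties between members and the observation are ignored, and every ensemble has the same size with at least two members. The names `spread_skill_ratio` and `rank_histogram` are hypothetical.

```python
import math
from statistics import fmean, variance
from typing import Sequence


def spread_skill_ratio(ensembles: Sequence[Sequence[float]],
                       observations: Sequence[float]) -> float:
    """Ratio of ensemble spread to the RMSE of the ensemble mean.

    A value near 1 suggests a well-calibrated ensemble; a value below 1
    indicates underdispersion (too little spread for the observed error).
    """
    mean_var = fmean([variance(members) for members in ensembles])
    mse = fmean([(fmean(members) - obs) ** 2
                 for members, obs in zip(ensembles, observations)])
    return math.sqrt(mean_var) / math.sqrt(mse)


def rank_histogram(ensembles: Sequence[Sequence[float]],
                   observations: Sequence[float]) -> list:
    """Count how often the observation falls at each rank among the members."""
    n_members = len(ensembles[0])
    counts = [0] * (n_members + 1)
    for members, obs in zip(ensembles, observations):
        rank = sum(1 for m in members if m < obs)  # members below the observation
        counts[rank] += 1
    return counts
```

A flat rank histogram is consistent with a reliable ensemble, while a U shape (observations often outside the ensemble range) points to underdispersion, which is the behaviour the review attributes to MC dropout.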

Key findings

Model	Z500 RMSE	Z500 Spread	Z500 CRPS	T850 RMSE	T850 Spread	T850 CRPS	T2M RMSE	T2M Spread	T2M CRPS	TP RMSE	TP Spread	TP CRPS
MC Dropout Dr=0.1	312.96	1.80	1.52	155.70	1.03	0.77	0.57	[missing]	[missing]	[missing]	[missing]	[missing]
Parametric	315.30	1.82	1.55	142.67	0.90	0.70	0. -	0. -	0. -	[missing]	[missing]	[missing]
Categorical	327.48	1.80	1.49	142.59	0.87	0.65	0.47	[missing]	[missing]	[missing]	[missing]	[missing]
TIGGE (3/5 days)	145/297	1.20/1.73	1.26/1.57	2.02/2.15	1.05/1.00	0.93/0.96	0.69/0.80	0.84/0.85	0.65/0.0	0.58/0.70	0.41/0.47	[missing]
Deterministic	313.70	1.79	1.53	194.90	1.24	0.96	0. -	0. -	0. -	[missing]	[missing]	[missing]

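MC dropout with a dropout rate of 0.1 has the lowest ensemble-mean RMSE and CRPS, but its spread is severely underdispersive.
The parametric and categorical models achieve broadly similar verification scores, with a trade-off: the parametric model is simpler and has fewer degrees of freedom, while the categorical model handles non-Gaussian distributions (precipitation in particular) more appropriately.
None of the probabilistic neural networks match the skill of the operational TIGGE/IFS ensemble, and precipitation remains a challenge for data-driven methods.
The deterministic DL baseline without post-processing underperforms the probabilistic methods on most metrics.
The operational TIGGE ensemble generally outperforms the data-driven methods, with the exception of precipitation, for which RMSE is not a sufficiently informative metric.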
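
To make the categorical approach in the findings above concrete, the sketch below scores a binned forecast with a discretized CRPS: the squared difference between the forecast CDF and the observation's step function, evaluated at each bin's upper edge and weighted by bin width. This is one simple discretization chosen for illustration, not necessarily the one used in the paper; `categorical_crps`, the bin edges, and the clamping of out-of-range observations to the outermost bins are all assumptions.

```python
from bisect import bisect_right
from itertools import accumulate
from typing import Sequence


def categorical_crps(edges: Sequence[float], probs: Sequence[float], y: float) -> float:
    """Discretized CRPS of a binned forecast.

    edges: K + 1 increasing bin edges; probs: K bin probabilities summing to 1.
    """
    if len(edges) != len(probs) + 1:
        raise ValueError("need exactly one more bin edge than probabilities")
    last = len(probs) - 1
    # Bin containing the observation, clamped to the outermost bins.
    obs_bin = min(max(bisect_right(edges, y) - 1, 0), last)
    cdf = list(accumulate(probs))  # forecast CDF at each bin's upper edge
    score = 0.0
    for k, f_k in enumerate(cdf):
        o_k = 1.0 if k >= obs_bin else 0.0  # observation step function
        score += (f_k - o_k) ** 2 * (edges[k + 1] - edges[k])
    return score
```

Because the bin probabilities are unconstrained apart from summing to one, the same function can score skewed or multimodal forecasts, such as a precipitation forecast with most of its mass in the lowest bin, which a single Gaussian cannot represent.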

This review was generated by AI and checked by a human editor.