QUICK REVIEW

[Paper Review] Evaluating Forecasts with scoringutils in R

Nikos I Bosse, Hugo Gruson|arXiv (Cornell University)|May 14, 2022

Forecasting Techniques and Applications | Cited by 26

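One-Line Summary

This paper shows how to evaluate forecasts in R with the scoringutils package, including summarising scores, estimating coverage, and converting sample-based forecasts to a quantile-based format. It also discusses caveats when aggregating scores and how to interpret calibration metrics.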

ABSTRACT

Evaluating forecasts is essential to understand and improve forecasting and make forecasts useful to decision makers. A variety of R packages provide a broad variety of scoring rules, visualisations and diagnostic tools. One particular challenge, which scoringutils aims to address, is handling the complexity of evaluating and comparing forecasts from several forecasters across multiple dimensions such as time, space, and different types of targets. scoringutils extends the existing landscape by offering a convenient and flexible data.table-based framework for evaluating and comparing probabilistic forecasts (forecasts represented by a full predictive distribution). Notably, scoringutils is the first package to offer extensive support for probabilistic forecasts in the form of predictive quantiles, a format that is currently used by several infectious disease Forecast Hubs. The package is easily extendable, meaning that users can supply their own scoring rules or extend existing classes to handle new types of forecasts. scoringutils provides broad functionality to check the data and diagnose issues, to visualise forecasts and missing data, to transform data before scoring, to handle missing forecasts, to aggregate scores, and to visualise the results of the evaluation. The paper presents the package and its core functionality and illustrates common workflows using example data of forecasts for COVID-19 cases and deaths submitted to the European COVID-19 Forecast Hub.

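Motivation and Objectives

Show how to use scoringutils for forecast evaluation in R.
Show how to summarise forecast scores and visualise them across models and target types.
Provide guidance on calibration metrics and coverage based on empirical prediction intervals.
Show the data preparation steps for different forecast formats (quantile-based vs sample-based).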

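Proposed Method

Use summarise_scores to aggregate and display forecast scores by model and target type.
Apply add_coverage to quantify the empirical coverage of central prediction intervals (e.g. 50% or 90%).
Use sample_to_quantile to convert sample-based forecasts to a quantile-based format, so that score() and add_coverage() can be applied.
Demonstrate plotting with plot_score_table and the by/grouping options (e.g. by target_type).
Stress the caveats of aggregating scores across heterogeneous forecast types or horizons, so that large-magnitude targets do not dominate the result.
Point out the use of empirical coverage as a proxy for calibration, and the role of other summary functions (e.g. signif) that require care.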
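The two data steps above (sample-to-quantile conversion and empirical coverage) can be sketched in base R. This is only an illustration of the underlying idea, not the package's implementation: it deliberately avoids calling scoringutils so as not to assume any particular version's function signatures. All data are simulated, and the names `n_forecasts`, `n_samples`, and `empirical_coverage` are hypothetical.

```r
# Base-R sketch: convert sample-based forecasts to quantiles, then
# estimate empirical coverage of central prediction intervals.
# All data are simulated and purely illustrative.
set.seed(42)

n_forecasts <- 200   # number of forecast/observation pairs (hypothetical)
n_samples   <- 500   # predictive samples per forecast (hypothetical)

# Simulated truth and predictive samples drawn from the same distribution,
# so the forecasts are well calibrated by construction.
mu       <- runif(n_forecasts, 50, 150)
observed <- rnorm(n_forecasts, mean = mu, sd = 10)
# mu is recycled down each column, so row i holds samples with mean mu[i].
samples  <- matrix(rnorm(n_forecasts * n_samples, mean = mu, sd = 10),
                   nrow = n_forecasts)

# Step 1: sample -> quantile conversion at the levels that bound the
# 50% and 90% central intervals, plus the median.
levels <- c(0.05, 0.25, 0.5, 0.75, 0.95)
q <- t(apply(samples, 1, quantile, probs = levels, names = FALSE))
colnames(q) <- paste0("q", levels)

# Step 2: empirical coverage is the share of observations that fall
# inside the interval.
empirical_coverage <- function(obs, lower, upper) {
  mean(obs >= lower & obs <= upper)
}

coverage_50 <- empirical_coverage(observed, q[, "q0.25"], q[, "q0.75"])
coverage_90 <- empirical_coverage(observed, q[, "q0.05"], q[, "q0.95"])

print(c(coverage_50 = coverage_50, coverage_90 = coverage_90))
```

For well-calibrated forecasts the two values should land near 0.5 and 0.9. Values well below the nominal level suggest intervals that are too narrow, and values well above suggest intervals that are too wide.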

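Experimental Results

Research Questions

RQ1: How can forecast evaluation metrics be computed and visualised across models and target types with scoringutils?
RQ2: Which aggregation and visualisation strategies are appropriate for the different forecast formats, quantile-based and sample-based?
RQ3: How do empirical coverage metrics reflect the calibration of prediction intervals, and how can they be added to a table of scores?
RQ4: What care is needed when aggregating scores across heterogeneous forecast targets or horizons?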

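Key Findings

scoringutils can summarise and visualise forecast scores with functions such as summarise_scores and plot_score_table.
add_coverage provides empirical coverage estimates for central prediction intervals (e.g. 50% or 90%).
sample_to_quantile converts sample-based forecasts to a quantile-based format suitable for scoring and coverage analysis.
Aggregating scores across heterogeneous targets or horizons can let large-magnitude targets dominate the aggregate and mislead, so relative or stratified analyses are recommended.
The examples cover multi-target (e.g. cases, deaths) and multi-model scenarios, and present results that include summaries by model and by target type.
Users should take care with summary functions other than the mean, because the propriety of a score can be lost under certain aggregations.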
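The aggregation caveat can be made concrete with a small base-R sketch. The scores below are invented for illustration and do not come from the paper. The grouping uses base R's `aggregate()` and `ave()` rather than scoringutils functions, so no package signature is assumed.

```r
# Hypothetical mean scores (lower is better) on the natural scale for two
# models and two target types. The numbers are invented for illustration.
scores <- data.frame(
  model       = rep(c("model_A", "model_B"), each = 2),
  target_type = rep(c("cases", "deaths"), times = 2),
  score       = c(1000, 40, 1100, 20)
)

# Pooled mean across target types: case scores are far larger than death
# scores, so they dominate the pooled result.
pooled <- aggregate(score ~ model, data = scores, FUN = mean)

# Stratified summary: one row per model and target type.
stratified <- aggregate(score ~ model + target_type, data = scores, FUN = mean)

# Relative summary: each score divided by the best (lowest) score within
# its target type, which puts both targets on a comparable scale.
scores$relative <- scores$score /
  ave(scores$score, scores$target_type, FUN = min)

print(pooled)
print(stratified)
print(scores)
```

In this toy example the pooled mean favours model_A (520 vs 560) even though model_B halves the score on deaths and is only 10% worse on cases. The stratified and relative views make that trade-off visible.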

This review was generated by AI and checked by a human editor.