QUICK REVIEW

[Paper Review] Rolling-Origin Validation Reverses Model Rankings in Multi-Step PM10 Forecasting: XGBoost, SARIMA, and Persistence

Federico Garcia Crespi, Eduardo Yubero Funes | arXiv (Cornell University) | Mar 19, 2026

Air Quality Monitoring and Forecasting | Citations: 0

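One-sentence summary

This paper shows that, in multi-step PM10 forecasting, rolling-origin evaluation against a persistence benchmark changes model rankings. At short horizons XGBoost can fall below persistence, whereas SARIMA remains consistently useful across the 1–7 day range.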

ABSTRACT

(a) Many air quality forecasting studies report gains from machine learning, but evaluations often use static chronological splits and omit persistence baselines, so the operational added value under routine updating is unclear. (b) Using 2,350 daily PM10 observations from 2017 to 2024 at an urban background monitoring station in southern Europe, we compare XGBoost and SARIMA against persistence under a static split and a rolling-origin protocol with monthly updates. We report horizon-specific skill and the predictability horizon, defined as the maximum horizon with positive persistence-relative skill. Static evaluation suggests XGBoost performs well from one to seven days ahead, but rolling-origin evaluation reverses rankings: XGBoost is not consistently better than persistence at short and intermediate horizons, whereas SARIMA remains positively skilled across the full range. (c) For researchers, static splits can overstate operational usefulness and change rankings. For practitioners, rolling-origin, persistence-referenced skill profiles show which methods stay reliable at each lead time.

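Motivation and objectives

Assess the operational usefulness of PM10 forecasting models under temporal validation that mimics deployment.
Compare persistence, SARIMA, and XGBoost at 1–7 day horizons using rolling-origin evaluation.
Quantify forecast usefulness relative to a persistence benchmark through horizon-specific metrics.
Introduce the predictability horizon as an operational summary of robust skill across lead times.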

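Proposed method

Forecast 1–7 days ahead using daily PM10 data (2017–2024) from an urban background station in Elche, Spain.
Compare three forecasting families (persistence, SARIMA, and XGBoost), covering naive, classical, and nonlinear methods.
Evaluate under a static split and under leakage-avoiding rolling-origin validation (preprocessing fitted on training data only).
Compute horizon-specific RMSE/MAE and the persistence-relative skill score $\mathrm{SS}_m(h) = 1 - \mathrm{Err}_m(h)/\mathrm{Err}_{\mathrm{pers}}(h)$.
Define the predictability horizon $H^{*}$ as the largest $h$ for which $\mathrm{SS}_m(h) > 0$, summarizing operational usefulness.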
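
The skill score and predictability horizon defined above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function names `skill_score` and `predictability_horizon` are hypothetical, the error values are made up for the example, and the sketch takes the definition literally (the largest horizon with positive skill, without requiring that all shorter horizons are also positive).

```python
def skill_score(err_model, err_persistence):
    """Persistence-relative skill per horizon: SS(h) = 1 - Err_model(h) / Err_pers(h)."""
    return [1.0 - em / ep for em, ep in zip(err_model, err_persistence)]


def predictability_horizon(skill, horizons=None):
    """Largest horizon h with SS(h) > 0, or 0 if no horizon has positive skill."""
    if horizons is None:
        horizons = range(1, len(skill) + 1)
    positive = [h for h, s in zip(horizons, skill) if s > 0]
    return max(positive) if positive else 0


# Hypothetical RMSE values for h = 1..7 (illustrative only, NOT from the paper).
rmse_model = [9.0, 9.5, 10.0, 10.5, 11.0, 12.5, 13.0]
rmse_pers = [10.0, 10.0, 10.0, 10.0, 10.0, 12.0, 12.0]

ss = skill_score(rmse_model, rmse_pers)
h_star = predictability_horizon(ss)  # only h = 1 and h = 2 have SS > 0 here
```

A skill score of zero means the model merely ties persistence, so the strict inequality excludes h = 3 in this toy profile. Read the other way, a score such as SS = −0.192 corresponds to an error ratio of 1 − (−0.192) = 1.192, that is, an error roughly 19% larger than that of persistence.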

Figure 1: Two-panel diagnostic summary of the daily PM10 series used in the empirical case study. Panel (a) shows the full daily time series. Panel (b) summarizes monthly seasonality through the distribution of monthly mean and median concentrations across the annual cycle.

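Experimental results

Research questions

RQ1: Under deployment-like conditions, do model rankings change when moving from a static chronological split to rolling-origin evaluation?
RQ2: Do SARIMA and XGBoost maintain positive skill relative to persistence across horizons of 1–7 days?
RQ3: For each model, what is the predictability horizon (H*), the horizon beyond which forecast usefulness becomes negligible?
RQ4: How do the per-horizon skill profiles of SARIMA and XGBoost differ under rolling-origin validation?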

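Key findings

Under static evaluation, XGBoost appeared to outperform persistence at all horizons from 1 to 7 days (SS = 0.231–0.299, H* = 7).
Under rolling-origin evaluation, XGBoost often failed to beat persistence at short horizons (e.g., h = 1: SS = −0.192; non-positive in many splits) and showed positive skill only at longer horizons (h = 5–7: SS = 0.067–0.137).
SARIMA kept positive mean skill at every horizon (h = 1: SS = 0.027; h = 6: SS = 0.203; h = 7: SS = 0.192), outperforming XGBoost under rolling-origin evaluation.
Rankings reverse under deployment-like validation: SARIMA often outperforms XGBoost across 1–7 days, which underscores the importance of evaluation design.
The study introduces and uses H*, the largest horizon with positive persistence-relative skill, as a summary of operational usefulness.
The results show that evaluation design materially changes the perceived usefulness of a model and that persistence remains a strong operational benchmark.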
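
To make the rolling-origin protocol behind these findings concrete, the sketch below evaluates a persistence baseline at horizons 1–7 over repeatedly advanced forecast origins. It is an illustration under stated assumptions rather than the paper's implementation: the helper names are hypothetical, the monthly update is approximated by a fixed 30-day step, the training window is assumed to expand, and the series is synthetic rather than the Elche PM10 data.

```python
import math


def rolling_origins(n_obs, initial_train, step, max_horizon):
    """Yield forecast origins (index of the last observation visible to the model).

    Each origin t leaves observations t+1 .. t+max_horizon available as targets.
    """
    t = initial_train - 1
    while t + max_horizon < n_obs:
        yield t
        t += step


def persistence_forecast(history, max_horizon):
    """Naive baseline: repeat the last observed value at every lead time."""
    return [history[-1]] * max_horizon


def rolling_origin_rmse(series, forecaster, initial_train, step, max_horizon):
    """Horizon-specific RMSE pooled over all rolling origins."""
    sq_err = [[] for _ in range(max_horizon)]
    for t in rolling_origins(len(series), initial_train, step, max_horizon):
        # Only data up to the origin is passed on; any preprocessing
        # (scaling, imputation) would be fitted on `history` alone.
        history = series[: t + 1]
        preds = forecaster(history, max_horizon)
        for h in range(1, max_horizon + 1):
            sq_err[h - 1].append((preds[h - 1] - series[t + h]) ** 2)
    return [math.sqrt(sum(e) / len(e)) for e in sq_err]


# Synthetic stand-in series (assumption): an annual cycle plus a slow trend.
series = [20.0 + 5.0 * math.sin(2 * math.pi * d / 365.0) + 0.01 * d
          for d in range(730)]

origins = list(rolling_origins(len(series), 365, 30, 7))
rmse_by_h = rolling_origin_rmse(series, persistence_forecast,
                                initial_train=365, step=30, max_horizon=7)
```

Any other forecaster with the same `(history, max_horizon)` signature could be passed in place of `persistence_forecast`. A static split, by contrast, corresponds to a single origin, which is why it cannot reveal how skill varies as the model is updated.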

Figure 2: Static-split $H^{*}$ evaluation for XGBoost on the empirical case study. Under a single chronological train/test partition, XGBoost remains above persistence across all horizons ($\mathrm{SS}=0.231$–$0.299$), yielding a nominal $H^{*}=7$. This result is informative as a single-split bench

This review was generated by AI and checked by a human editor.