QUICK REVIEW

[Paper Review] Seeking SOTA: Time-Series Forecasting Must Adopt Taxonomy-Specific Evaluation to Dispel Illusory Gains

Raeid Saqur, Christoph Bergmeir | arXiv (Cornell University) | Mar 16, 2026

Time Series Analysis and Forecasting | Citations: 0

One-line summary

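Summary: This paper argues that current long-term time-series forecasting (LTSF) benchmarks are dominated by periodic patterns that simple baselines already capture well, so reported gains of complex models can be illusory. It calls for taxonomy-specific evaluation and robust classical baselines so that genuine progress becomes visible.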

ABSTRACT

We argue that the current practice of evaluating AI/ML time-series forecasting models, predominantly on benchmarks characterized by strong, persistent periodicities and seasonalities, obscures real progress by overlooking the performance of efficient classical methods. We demonstrate that these "standard" datasets often exhibit dominant autocorrelation patterns and seasonal cycles that can be effectively captured by simpler linear or statistical models, rendering complex deep learning architectures frequently no more performant than their classical counterparts for these specific data characteristics, and raising questions as to whether any marginal improvements justify the significant increase in computational overhead and model complexity. We call on the community to (I) retire or substantially augment current benchmarks with datasets exhibiting a wider spectrum of non-stationarities, such as structural breaks, time-varying volatility, and concept drift, and less predictable dynamics drawn from diverse real-world domains, and (II) require every deep learning submission to include robust classical and simple baselines, appropriately chosen for the specific characteristics of the downstream tasks' time series. By doing so, we will help ensure that reported gains reflect genuine scientific methodological advances rather than artifacts of benchmark selection favoring models adept at learning repetitive patterns.
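The abstract's claim that these datasets are dominated by autocorrelation and seasonal cycles can be checked cheaply. The sketch below is an illustration under stated assumptions, not the authors' code: it builds a synthetic series with a 24-step cycle (an assumption standing in for hourly benchmark data), measures its sample autocorrelation at the seasonal lag, and scores two classical baselines: a seasonal-naive forecast, and a single least-squares linear map from a lookback window to the whole horizon (loosely in the spirit of LTSF-Linear, without its decomposition variants). The helper names `acf_at_lag`, `seasonal_naive`, and `fit_linear_window` are hypothetical.

```python
import numpy as np

def acf_at_lag(x: np.ndarray, lag: int) -> float:
    """Sample autocorrelation of a 1-D series at a single lag."""
    x = x - x.mean()
    return float(np.dot(x[:-lag], x[lag:]) / np.dot(x, x))

def seasonal_naive(history: np.ndarray, horizon: int, period: int) -> np.ndarray:
    """Repeat the last full season of `history` to cover `horizon` steps."""
    last_season = history[-period:]
    reps = int(np.ceil(horizon / period))
    return np.tile(last_season, reps)[:horizon]

def fit_linear_window(train: np.ndarray, lookback: int, horizon: int) -> np.ndarray:
    """Least-squares map from a lookback window to the next `horizon` values.
    One linear layer without bias, loosely in the spirit of LTSF-Linear."""
    n = len(train) - lookback - horizon + 1
    X = np.stack([train[i:i + lookback] for i in range(n)])
    Y = np.stack([train[i + lookback:i + lookback + horizon] for i in range(n)])
    W, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return W  # shape (lookback, horizon)

# Hypothetical "benchmark-like" series: a period-24 cycle plus unit Gaussian noise.
rng = np.random.default_rng(0)
period, horizon, lookback = 24, 48, 96
t = np.arange(24 * 60)
series = 10 * np.sin(2 * np.pi * t / period) + rng.normal(0, 1, t.size)
train, test = series[:-horizon], series[-horizon:]

snaive = seasonal_naive(train, horizon, period)
W = fit_linear_window(train, lookback, horizon)
linear = train[-lookback:] @ W  # forecast from the most recent window

print("ACF at seasonal lag:", round(acf_at_lag(train, period), 3))
print("seasonal-naive MAE:", round(float(np.mean(np.abs(test - snaive))), 3))
print("linear-window MAE:", round(float(np.mean(np.abs(test - linear))), 3))
```

On real data one would replace `series` with a benchmark column and pick `period` from domain knowledge (for example 24 for hourly data with a daily cycle). A high seasonal autocorrelation together with a small seasonal-naive error is a hint that the dataset mainly rewards learning repetition.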

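Motivation and Objectives

Points out that standard LTSF benchmarks are dominated by strong periodicity that simple models can capture.
Argues for retiring current benchmarks or augmenting them with datasets that exhibit non-stationarity and non-periodic dynamics from diverse domains.
Advocates that every deep-learning LTSF submission include robust classical baselines matched to the characteristics of the data.
Calls for evaluation protocols that reflect genuine methodological progress rather than gains that are artifacts of benchmark selection.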

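Proposed Approach

Reviews and critiques current LTSF benchmark datasets and evaluation practices.
Analyzes how strong periodicity lets simple or lightweight models match or outperform complex Transformers.
Uses statistical principles (e.g., Stein's paradox) to explain the effects of aggregating results across heterogeneous time series.
Proposes taxonomy-specific evaluation and routinely including cheap classical baselines in every submission.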
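Stein's paradox, which the review cites as one explanation for aggregation effects, says that when estimating three or more unrelated means from one noisy observation each, shrinking all estimates jointly toward a common point lowers the total squared error compared with estimating each mean on its own, even though no single coordinate is guaranteed to improve. A minimal simulation of the textbook result follows. It assumes unit Gaussian noise and shrinks toward zero, and it illustrates the general phenomenon rather than reproducing the paper's argument; `james_stein_positive_part` and `compare_risk` are hypothetical helper names.

```python
import numpy as np

def james_stein_positive_part(x: np.ndarray, sigma2: float = 1.0) -> np.ndarray:
    """Positive-part James-Stein estimator, shrinking toward the origin.
    Assumes x ~ N(theta, sigma2 * I) with p = len(x) >= 3."""
    p = x.size
    shrink = 1.0 - (p - 2) * sigma2 / np.dot(x, x)
    return max(shrink, 0.0) * x  # never flip the sign of the estimate

def compare_risk(p: int = 10, trials: int = 5000, seed: int = 0) -> tuple[float, float]:
    """Average total squared error of the per-coordinate MLE (x itself)
    versus the James-Stein estimator over repeated draws."""
    rng = np.random.default_rng(seed)
    mle_err, js_err = 0.0, 0.0
    for _ in range(trials):
        theta = rng.normal(0.0, 1.0, p)       # p unrelated "tasks"
        x = theta + rng.normal(0.0, 1.0, p)   # one noisy observation per task
        mle_err += np.sum((x - theta) ** 2)
        js_err += np.sum((james_stein_positive_part(x) - theta) ** 2)
    return float(mle_err / trials), float(js_err / trials)

mle_risk, js_risk = compare_risk()
print(f"MLE total risk: {mle_risk:.2f}, James-Stein total risk: {js_risk:.2f}")
```

Drawing `theta` at random in each trial is only a convenience for the demo; the dominance result holds for any fixed mean vector when p is at least 3. The analogy to forecasting benchmarks is that a score summed over many heterogeneous series can reward pooled behaviour that helps the total while hurting some individual series.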

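Results

Research Questions

RQ1: Do current LTSF benchmarks reflect real-world non-stationarity and model generalization beyond periodic data?
RQ2: Can simple classical models compete with state-of-the-art deep-learning methods on current LTSF benchmarks across diverse domains?
RQ3: What evaluation practices ensure that reported gains reflect genuine methodological progress?
RQ4: How should TSF benchmarks be redesigned to incorporate taxonomy-specific non-stationarity and establish fair baselines?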

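Key Findings

Standard LTSF datasets exhibit strong, persistent periodicity that linear or statistical models capture effectively.
On many of these datasets, deep-learning TSF models show only marginal improvements over classical baselines, calling the value of their added complexity into question.
Simple models such as LTSF-Linear can outperform state-of-the-art Transformers on nine standard LTSF datasets, exposing an illusion of benchmark-driven progress.
Aggregate metrics computed across heterogeneous series can favor models with broad global generalization over per-task specialists, and can mask performance degradation on individual tasks.
Benchmarks need to cover a broader spectrum of non-stationarity (structural breaks, time-varying volatility, concept drift, etc.), and submissions need to be required to include robust baselines.
Evaluation metrics should be chosen carefully: report several metrics and acknowledge their limitations to avoid biased rankings.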
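To make the last two findings concrete, the sketch below scores a hypothetical model and a hypothetical classical baseline on two made-up series of very different scale. It reports per-series MAE and RMSE plus two aggregates, so one can see how a scale-dependent average and a scale-free relative score can tell different stories. The data, the helper names, and the choice of geometric-mean relative MAE as the scale-free aggregate are illustrative assumptions, not the paper's protocol.

```python
import numpy as np

def mae(y: np.ndarray, yhat: np.ndarray) -> float:
    return float(np.mean(np.abs(y - yhat)))

def rmse(y: np.ndarray, yhat: np.ndarray) -> float:
    return float(np.sqrt(np.mean((y - yhat) ** 2)))

def evaluate(series: dict, model: dict, baseline: dict) -> dict:
    """Per-series MAE/RMSE for a model and a baseline, plus two aggregates:
    the raw mean MAE (scale-dependent) and the geometric mean of
    model-MAE / baseline-MAE (scale-free; below 1 means the model wins)."""
    rows = {}
    for name, y in series.items():
        m_mae, b_mae = mae(y, model[name]), mae(y, baseline[name])
        rows[name] = {
            "model_mae": m_mae, "model_rmse": rmse(y, model[name]),
            "baseline_mae": b_mae, "baseline_rmse": rmse(y, baseline[name]),
            "rel_mae": m_mae / b_mae,
        }
    rel = np.array([r["rel_mae"] for r in rows.values()])
    return {
        "per_series": rows,
        "mean_model_mae": float(np.mean([r["model_mae"] for r in rows.values()])),
        "mean_baseline_mae": float(np.mean([r["baseline_mae"] for r in rows.values()])),
        "gmean_rel_mae": float(np.exp(np.mean(np.log(rel)))),
    }

# Made-up toy data: one large-scale and one small-scale series.
y_big = np.array([100.0, 110.0, 120.0, 130.0])
y_small = np.array([1.0, 1.1, 1.2, 1.3])
series = {"big": y_big, "small": y_small}
model = {"big": y_big + 2.0, "small": y_small + 0.5}       # hypothetical deep model
baseline = {"big": y_big + 10.0, "small": y_small + 0.1}   # hypothetical classical baseline

report = evaluate(series, model, baseline)
print(report["mean_model_mae"], report["mean_baseline_mae"])
print(report["per_series"]["small"]["rel_mae"])
print(report["gmean_rel_mae"])
```

Here the raw average says the model is about four times better, while the per-series view shows it is about five times worse on the small series and the scale-free aggregate calls it roughly a tie. Reporting these views side by side is one way to follow the advice to use several metrics and state their limitations.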


This review was generated by AI and checked by a human editor.