QUICK REVIEW

[Paper Review] Tiny Time Mixers (TTMs): Fast Pre-trained Models for Enhanced Zero/Few-Shot Forecasting of Multivariate Time Series

Vijay Ekambaram, Arindam Jati | arXiv (Cornell University) | Jan 8, 2024

Stock Market Forecasting Methods | Citations: 11

One-Line Summary

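TTMs are fast, compact pre-trained models (≤1M parameters) for zero/few-shot multivariate time series forecasting. Adaptive patching, downsampling-based augmentation, and resolution prefix tuning give them strong transfer to unseen data, and they deliver strong performance while using far less compute than LLM-based TS models.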

ABSTRACT

Large pre-trained models excel in zero/few-shot learning for language and vision tasks but face challenges in multivariate time series (TS) forecasting due to diverse data characteristics. Consequently, recent research efforts have focused on developing pre-trained TS forecasting models. These models, whether built from scratch or adapted from large language models (LLMs), excel in zero/few-shot forecasting tasks. However, they are limited by slow performance, high computational demands, and neglect of cross-channel and exogenous correlations. To address this, we introduce Tiny Time Mixers (TTM), a compact model (starting from 1M parameters) with effective transfer learning capabilities, trained exclusively on public TS datasets. TTM, based on the light-weight TSMixer architecture, incorporates innovations like adaptive patching, diverse resolution sampling, and resolution prefix tuning to handle pre-training on varied dataset resolutions with minimal model capacity. Additionally, it employs multi-level modeling to capture channel correlations and infuse exogenous signals during fine-tuning. TTM outperforms existing popular benchmarks in zero/few-shot forecasting by (4-40%), while reducing computational requirements significantly. Moreover, TTMs are lightweight and can be executed even on CPU-only machines, enhancing usability and fostering wider adoption in resource-constrained environments. The model weights for reproducibility and research use are available at https://huggingface.co/ibm/ttm-research-r2/, while enterprise-use weights under the Apache license can be accessed as follows: the initial TTM-Q variant at https://huggingface.co/ibm-granite/granite-timeseries-ttm-r1, and the latest variants (TTM-B, TTM-E, TTM-A) weights are available at https://huggingface.co/ibm-granite/granite-timeseries-ttm-r2.

Motivation and Objectives

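Motivation: improve zero/few-shot multivariate time series forecasting in a setting where public pre-training data is scarce and highly heterogeneous.
Propose a small, general-purpose pre-trained model, trained only on public data, that transfers well to new datasets.
Introduce architectural and training improvements that handle multi-resolution data and cross-dataset transfer.
Demonstrate accuracy gains and computational efficiency over large LLM-based TS methods across multiple datasets.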

Proposed Method

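Build a multi-level TTM backbone based on the lightweight TSMixer architecture.
Pre-train TTMs in a univariate fashion on public TS datasets to learn general temporal dynamics.
Apply adaptive patching across levels to handle multi-resolution data.
Create multiple resolutions for pre-training through dataset augmentation by downsampling.
Incorporate resolution prefix tuning to embed resolution information into the patches.
Fine-tune with a decoder that enables channel mixing, together with an exogenous mixer that exploits external signals.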
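The pre-training ideas above (downsampling augmentation, adaptive patching, and a resolution prefix) can be illustrated with a minimal pure-Python sketch. It is not the TTM implementation: the mean aggregation in `downsample`, the `2**level` split factor in `adaptive_patch`, the `RESOLUTION_IDS` vocabulary, and the constant vector standing in for a learned resolution embedding are all assumptions made for illustration.

```python
from typing import Dict, List

Series = List[float]
Patch = List[float]


def downsample(series: Series, factor: int) -> Series:
    """Coarser-resolution copy: mean over non-overlapping windows of `factor` steps.

    Mean aggregation is an assumption; a trailing partial window is dropped.
    """
    if factor < 1:
        raise ValueError("factor must be >= 1")
    usable = len(series) - len(series) % factor
    return [sum(series[i:i + factor]) / factor for i in range(0, usable, factor)]


def make_patches(series: Series, patch_len: int) -> List[Patch]:
    """Split a series into non-overlapping patches; a trailing remainder is dropped."""
    return [series[i:i + patch_len]
            for i in range(0, len(series) - patch_len + 1, patch_len)]


def adaptive_patch(patches: List[Patch], level: int) -> List[Patch]:
    """Re-split each patch into 2**level shorter patches.

    Hypothetical illustration of giving different backbone levels different
    patch granularities; the 2**level factor is an assumption.
    """
    k = 2 ** level
    out: List[Patch] = []
    for p in patches:
        if len(p) % k:
            raise ValueError("patch length must be divisible by 2**level")
        size = len(p) // k
        out.extend(p[j * size:(j + 1) * size] for j in range(k))
    return out


# Hypothetical resolution vocabulary; a real model would learn one embedding per entry.
RESOLUTION_IDS: Dict[str, int] = {"1min": 0, "10min": 1, "1h": 2, "1d": 3}


def add_resolution_prefix(patches: List[Patch], resolution: str) -> List[Patch]:
    """Prepend a resolution token to the patch sequence.

    The constant vector filled with the resolution id is a placeholder for a
    learned embedding with the same width as a patch.
    """
    width = len(patches[0])
    prefix = [float(RESOLUTION_IDS[resolution])] * width
    return [prefix] + patches


if __name__ == "__main__":
    series = [float(t) for t in range(16)]    # e.g. hypothetical 5-minute data
    coarse = downsample(series, 2)            # 8 values, i.e. 10-minute resolution
    fine_patches = make_patches(series, 8)    # 2 patches of length 8
    print(adaptive_patch(fine_patches, 1))    # 4 patches of length 4
    print(add_resolution_prefix(make_patches(coarse, 4), "10min"))
```

Downsampling turns one dataset into several resolutions for free, and the prefix token tells the shared backbone which resolution it is looking at, so a single small model can be pre-trained on all of them.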
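The contrast between univariate pre-training and channel mixing at fine-tuning time can be sketched the same way, assuming a purely linear mixer. `channel_independent` and `channel_mix` are hypothetical helpers, not TTM's decoder or exogenous mixer (which in the paper are learned layers); here an exogenous signal is simply treated as an extra input channel that the mixer can draw on.

```python
from typing import List

Series = List[float]   # one channel over time
Sample = List[Series]  # shape (channels, time)


def channel_independent(batch: List[Sample]) -> List[Sample]:
    """Flatten (batch, channels, time) into (batch * channels, 1, time).

    Every channel becomes its own univariate sample, which is how a
    univariate (channel-independent) pre-training stage sees the data.
    """
    return [[channel] for sample in batch for channel in sample]


def channel_mix(sample: Sample, weights: List[List[float]]) -> Sample:
    """Linearly mix channels at every time step.

    out[i][t] = sum_j weights[i][j] * sample[j][t]

    Hypothetical linear stand-in for a learned channel-mixing layer;
    `weights` has shape (out_channels, in_channels).
    """
    n_time = len(sample[0])
    for row in weights:
        if len(row) != len(sample):
            raise ValueError("each weight row needs one entry per input channel")
    return [
        [sum(w * sample[j][t] for j, w in enumerate(row)) for t in range(n_time)]
        for row in weights
    ]


if __name__ == "__main__":
    target = [1.0, 2.0, 3.0]
    exog = [2.0, 0.0, 2.0]  # hypothetical exogenous signal
    print(channel_independent([[target, exog]]))     # two univariate samples
    print(channel_mix([target, exog], [[1.0, 0.5]]))  # [[2.0, 2.0, 4.0]]
```

With an identity weight matrix each output channel equals its input channel, which is channel-independent behavior; non-zero off-diagonal weights let the target borrow information from other channels and from exogenous inputs, which is the effect decoder channel mixing is meant to add during fine-tuning.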

Experimental Results

Research Questions

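RQ1: Can a small (≤1M parameters) pre-trained model trained only on public TS data achieve competitive zero/few-shot forecasting on unseen datasets?
RQ2: Does multi-resolution pre-training with adaptive patching and downsampling improve generalization across diverse TS resolutions?
RQ3: Do decoder channel mixing and exogenous fusion during fine-tuning improve multivariate forecasting performance?
RQ4: How do the transfer-learning performance and compute cost of TTMs on standard benchmarks compare with those of large LLM-based TS methods?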

Key Findings

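TTMs improve accuracy by 12-38% over popular benchmarks in few/zero-shot forecasting.
TTMs need 14X fewer trainable parameters and 106X fewer total parameters than large LLM-based TS methods.
Fine-tuning time is reduced by 65X, inference time by 54X, and memory usage by 27X.
Zero-shot TTMs often outperform the few-shot results of competing methods on many benchmarks, supporting effective transfer learning from diverse public TS data.
TTM-CM (decoder channel mixing with exogenous fusion) outperforms competing models by 15-44% on exogenous/multivariate datasets.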


This review was generated by AI and checked by a human editor.