QUICK REVIEW

[Paper Review] Integrating Inductive Biases in Transformers via Distillation for Financial Time Series Forecasting

Yu-Chen Den, Kuan‐Yu Chen|arXiv (Cornell University)|Mar 17, 2026

Stock Market Forecasting Methods | Citations: 0

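One-Line Summary

TIPS is a distillation framework that integrates multiple inductive biases into a single Transformer to make financial time-series forecasting more robust. It achieves state-of-the-art results under regime shifts while keeping inference cost low.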

ABSTRACT

Transformer-based models have been widely adopted for time-series forecasting due to their high representational capacity and architectural flexibility. However, many Transformer variants implicitly assume stationarity and stable temporal dynamics -- assumptions routinely violated in financial markets characterized by regime shifts and non-stationarity. Empirically, state-of-the-art time-series Transformers often underperform even vanilla Transformers on financial tasks, while simpler architectures with distinct inductive biases, such as CNNs and RNNs, can achieve stronger performance with substantially lower complexity. At the same time, no single inductive bias dominates across markets or regimes, suggesting that robust financial forecasting requires integrating complementary temporal priors. We propose TIPS (Transformer with Inductive Prior Synthesis), a knowledge distillation framework that synthesizes diverse inductive biases -- causality, locality, and periodicity -- within a unified Transformer. TIPS trains bias-specialized Transformer teachers via attention masking, then distills their knowledge into a single student model with regime-dependent alignment across inductive biases. Across four major equity markets, TIPS achieves state-of-the-art performance, outperforming strong ensemble baselines by 55%, 9%, and 16% in annual return, Sharpe ratio, and Calmar ratio, while requiring only 38% of the inference-time computation. Further analyses show that TIPS generates statistically significant excess returns beyond both vanilla Transformers and its teacher ensembles, and exhibits regime-dependent behavioral alignment with classical architectures during their profitable periods. These results highlight the importance of regime-dependent inductive bias utilization for robust generalization in non-stationary financial time series.
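The abstract reports results in annual return, Sharpe ratio, and Calmar ratio. For readers unfamiliar with these metrics, the following is a minimal sketch of their textbook definitions. It assumes daily simple returns, 252 trading days per year, and a zero risk-free rate. None of these conventions are confirmed by the review, and the function names are illustrative only.

```python
import math

TRADING_DAYS = 252  # assumption: daily data, 252 trading days per year


def annual_return(daily_returns):
    """Compound the daily returns and annualize the resulting growth."""
    growth = 1.0
    for r in daily_returns:
        growth *= 1.0 + r
    return growth ** (TRADING_DAYS / len(daily_returns)) - 1.0


def sharpe_ratio(daily_returns, risk_free_daily=0.0):
    """Annualized mean excess return divided by its sample standard deviation.

    Raises ZeroDivisionError if the returns have zero variance.
    """
    excess = [r - risk_free_daily for r in daily_returns]
    mean = sum(excess) / len(excess)
    var = sum((x - mean) ** 2 for x in excess) / (len(excess) - 1)
    return mean / math.sqrt(var) * math.sqrt(TRADING_DAYS)


def max_drawdown(daily_returns):
    """Largest peak-to-trough loss of the equity curve, as a positive fraction."""
    equity, peak, worst = 1.0, 1.0, 0.0
    for r in daily_returns:
        equity *= 1.0 + r
        peak = max(peak, equity)
        worst = max(worst, 1.0 - equity / peak)
    return worst


def calmar_ratio(daily_returns):
    """Annual return divided by maximum drawdown.

    Raises ZeroDivisionError if the equity curve never draws down.
    """
    return annual_return(daily_returns) / max_drawdown(daily_returns)
```

The Sharpe ratio penalizes overall volatility, while the Calmar ratio penalizes the single worst drawdown, so the two can rank strategies differently.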

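Motivation and Objectives

Motivates the need for adaptive inductive biases in financial time-series forecasting, given regime shifts and non-stationarity.
Shows that naive integration of multiple biases degrades performance relative to bias-specialized models and ensembles.
Proposes and validates TIPS, a distillation-based framework that integrates diverse biases into a single Transformer.
Shows that TIPS achieves state-of-the-art performance across major equity markets while reducing inference cost.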

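Proposed Method

Trains bias-specialized Transformer teachers, each encoding a distinct prior (causality, locality, or periodicity), through attention masking and input design.
Builds a Bias Teacher Ensemble from seven teachers (six bias-specialized teachers plus a vanilla Transformer) to capture diverse priors.
Distills the ensemble's predictions into a single student Transformer, using strong regularization to avoid rigid imitation and preserve robustness.
Constructs soft ensemble targets with temperature scaling and applies label smoothing to improve calibration.
Trains the student with unconstrained attention so that it can synthesize the priors, and uses Stochastic Weight Averaging (SWA) for robustness.
Analyzes how bias activation depends on the regime and shows that the resulting excess returns are statistically significant.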
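To make the attention-masking idea concrete, here is a minimal sketch of how causality, locality, and periodicity could each be expressed as a boolean attention mask. The review does not give the paper's exact mask definitions, so the three constructions below, and the `window` and `period` parameters, are assumptions chosen for illustration.

```python
import math


def causal_mask(n):
    """Causality: position i may attend only to positions j <= i."""
    return [[j <= i for j in range(n)] for i in range(n)]


def local_mask(n, window):
    """Locality: position i may attend to positions within `window` steps."""
    return [[abs(i - j) <= window for j in range(n)] for i in range(n)]


def periodic_mask(n, period):
    """Periodicity: position i may attend to positions a whole number of
    periods away (including itself)."""
    return [[(i - j) % period == 0 for j in range(n)] for i in range(n)]


def masked_softmax(scores, mask):
    """Row-wise softmax over attention scores, with masked entries set to 0.

    Every row must allow at least one position; all three masks above
    satisfy this because the diagonal is always allowed.
    """
    weights = []
    for row_scores, row_mask in zip(scores, mask):
        peak = max(s for s, keep in zip(row_scores, row_mask) if keep)
        exps = [math.exp(s - peak) if keep else 0.0
                for s, keep in zip(row_scores, row_mask)]
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights
```

Under this reading, every teacher shares the same Transformer architecture and differs only in which attention entries are allowed, while the student uses no mask at all.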

Figure 1. Performance–efficiency trade-off across generic time-series models, financial forecasting models, and classical architectures evaluated across multiple equity markets. The figure highlights substantial variation in performance and computational cost across model families, with TIPS achiev

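Experimental Results

Research Questions

RQ1 Do diverse inductive biases improve Transformer robustness on non-stationary financial data?
RQ2 Does naive multi-bias integration degrade performance relative to specialization or ensembling?
RQ3 Can a distilled student effectively synthesize multiple priors while maintaining inference efficiency?
RQ4 Are bias priors activated in a regime-specific manner, and do they align with profitable market conditions?
RQ5 To what extent does TIPS deliver statistically significant excess returns over baseline models?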

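Key Findings

TIPS shows the strongest overall performance across four major equity markets, with higher average Sharpe ratio and annual return than the baselines.
The Bias Teacher Ensemble (built via attention masking) outperforms ensembles of classical architectures and of generic SOTA models, showing that bias encoding is effective without architectural heterogeneity.
Distillation into a single student yields notable gains over the bias ensemble and cuts inference time to roughly one-seventh, delivering ensemble-level robustness in a single model.
Ablations show that the regularization components (low-temperature distillation, label smoothing, and SWA) are jointly necessary for effective bias synthesis.
Analysis shows that TIPS achieves statistically significant alpha over the vanilla Transformer, suggesting that inductive bias synthesis enables extraction of useful signal.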
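The three regularization components named in the ablation can be sketched in a few lines. This is an illustrative reading, not the paper's implementation: it assumes the ensemble target is the mean of temperature-scaled teacher softmax outputs, that label smoothing mixes this target with a uniform distribution, that the student is trained with cross-entropy against the target, and that SWA is a plain average of flattened weight vectors. The function names and the default values of `temperature` and `smoothing` are hypothetical.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax; temperature < 1 sharpens the distribution."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def ensemble_soft_target(teacher_logits, temperature=0.5, smoothing=0.1):
    """Average the teachers' softened outputs, then mix in a uniform
    distribution (label smoothing). Both defaults are assumed values."""
    k = len(teacher_logits[0])
    probs = [softmax(logits, temperature) for logits in teacher_logits]
    mean = [sum(p[c] for p in probs) / len(probs) for c in range(k)]
    return [(1.0 - smoothing) * p + smoothing / k for p in mean]


def distillation_loss(student_logits, target):
    """Cross-entropy between the soft target and the student's prediction."""
    q = softmax(student_logits)
    return -sum(t * math.log(qc) for t, qc in zip(target, q))


def swa_average(checkpoints):
    """Stochastic Weight Averaging reduced to its core: the element-wise
    mean of weight vectors saved at different training steps."""
    n = len(checkpoints)
    return [sum(w[i] for w in checkpoints) / n
            for i in range(len(checkpoints[0]))]
```

With `smoothing > 0`, every entry of the target stays strictly positive, which is one way to keep the student from imitating overconfident teachers too rigidly.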

Figure 2. Overview of the TIPS training framework. (a) Bias-specialized Transformer (TFM) teachers are constructed via different attention masks or positional biases (Colors indicate where the masks and biases are applied). (b) Teachers are trained independently for ranking prediction. (c) Teacher


This review was generated by AI and reviewed by a human editor.