QUICK REVIEW

[論文レビュー] Time Series as Images: Vision Transformer for Irregularly Sampled Time Series

Zekun Li, Shiyang Li|arXiv (Cornell University)|Mar 1, 2023

Time Series Analysis and Forecasting被引用数 14

ひとこと要約

ViTST は不規則にサンプリングされた多変量時系列を折れ線グラフ画像に変換し、分類のために事前学習済みのビジョン・トランスフォーマーをファインチューニングします。最先端の結果と欠損データへのロバスト性を達成します。

ABSTRACT

Irregularly sampled time series are increasingly prevalent, particularly in medical domains. While various specialized methods have been developed to handle these irregularities, effectively modeling their complex dynamics and pronounced sparsity remains a challenge. This paper introduces a novel perspective by converting irregularly sampled time series into line graph images, then utilizing powerful pre-trained vision transformers for time series classification in the same way as image classification. This method not only largely simplifies specialized algorithm designs but also presents the potential to serve as a universal framework for time series modeling. Remarkably, despite its simplicity, our approach outperforms state-of-the-art specialized algorithms on several popular healthcare and human activity datasets. Especially in the rigorous leave-sensors-out setting where a portion of variables is omitted during testing, our method exhibits strong robustness against varying degrees of missing observations, achieving an impressive improvement of 42.8% in absolute F1 score points over leading specialized baselines even with half the variables masked. Code and data are available at https://github.com/Leezekun/ViTST

研究の動機と目的

不規則にサンプリングされた時系列をデータを画像として視覚化することで、シンプルで普遍的なアプローチを動機づける。
折れ線グラフ画像における時刻的ダイナミクスと変数間の関係をモデル化するために、事前学習済みのビジョントランスフォーマーを活用する。
医療と人間の行動データセットで最先端の性能とロバスト性を示す。
不規則な時系列と規則な時系列の両方への適用性を示し、一般的なフレームワークとしての汎用性を強調する。

提案手法

多変量の不規則時系列を、各変数の折れ線グラフをグリッド配置で描画して1つのRGB画像に変換する。
変数ごとに一貫したスケールと色分けを用いて折れ線グラフを作成する。
得られた画像を用いて事前学習済みのビジョントランスフォーマー（Swin Transformer）を分類のためにファインチューニングする。
静的な人口統計情報・テキスト特徴をRoBERTaエンコーダでエンコードして画像埋め込みと結合することで、任意で組み込む。
センサーをLeave-outする設定で評価し、欠落変数へのロバスト性をテストする。

Figure 1: An illustration of our approach ViTST. The example is from a healthcare dataset P12 [ 12 ] , which provides the irregularly sampled observations of 36 variables for patients (we only show 4 variables here for simplicity). Each column in the table is an observation of a variable, with the o

実験結果

リサーチクエスチョン

RQ1自然画像で訓練されたビジョントランスフォーマーは、入力を折れ線グラフ画像として可視化した場合、不規則にサンプルされた時系列を効果的に分類できるか？
RQ2ViTST フレームワークは欠測観測に対してロバスト性を提供し、医療およびアクティビティデータセットで性能を維持できるか？
RQ3グリッド配置、画像解像度、描画の詳細が性能にどのように影響するか？
RQ4静的特徴が含まれる場合や規則な時系列データに適用した場合、方法は競争力を持つか？

主な発見

方法	P19 AUROC	P19 AUPRC	P12 AUROC	P12 AUPRC	PAM 精度	PAM 適合率	PAM 再現率	PAM F1
ViTST	89.2±2.0	53.1±3.4	85.1±0.8	51.1±4.1	95.8±1.3	96.2±1.3	96.1±1.1	96.5±1.2
Raindrop	87.0±2.3	51.8±5.5	82.8±1.7	44.0±3.0	88.5±1.5	89.9±1.5	89.9±0.6	89.8±1.0
Transformer	80.7±3.8	42.7±7.7	83.3±0.7	47.9±3.6	83.5±1.5	84.8±1.5	86.0±1.2	85.0±1.3
Trans-mean	83.7±1.8	45.8±3.2	82.6±2.0	46.3±4.0	83.7±2.3	84.9±2.6	86.4±2.1	85.1±2.4
GRU-D	83.9±1.7	46.9±2.1	81.9±2.1	46.1±4.7	83.3±1.6	84.6±1.2	85.2±1.6	84.8±1.2
SeFT	81.2±2.3	41.9±3.1	73.9±2.5	31.1±4.1	67.1±2.2	70.0±2.4	68.2±1.5	68.5±1.8
mTAND	84.4±1.3	50.6±2.0	84.2±0.8	48.2±3.4	74.6±4.3	74.3±4.0	79.5±2.8	76.8±3.4
IP-Net	84.6±1.3	38.1±3.7	82.6±1.4	47.6±3.1	74.3±3.8	75.6±2.1	77.9±2.2	76.6±2.8
DGM 2 -O	86.7±3.4	44.7±11.7	84.4±1.6	47.3±3.6	82.4±2.3	85.2±1.2	83.9±2.3	84.3±1.8
MTGNN	81.9±6.2	39.9±8.9	74.4±6.7	35.5±6.0	83.4±1.9	85.2±1.7	86.1±1.9	85.9±2.4
ViT	Unknown	Unknown	Unknown	Unknown	Unknown	Unknown	Unknown	Unknown

ViTST は P19 で AUROC が 89.2%、AUPRC が 53.1%、P12 で AUROC が 85.1%、AUPRC が 51.1% を達成し、不規則時系列の最先端手法を上回った。
PAM データセットでは、ViTST は精度 95.8%、適合率 96.2%、再現率 96.1%、F1 スコア 96.5% を達成。
ViTST はデータセット全体で AUROC/精度指標において従来の最先端ベースラインを 2.2–7.3 ポイント上回る。
Leave-sensors-out 条件下で、ViTST は高い性能を維持し、F1 スコアでベースラインを最大で 42.8% 上回る。
事前学習済みのビジョントランスフォーマー（ViT/Swin）は、スクラッチからの学習よりも顕著な利得を提供し、自然画像から折れ線グラフ時系列画像への有効な転移を示す。
アブレーションは、描画のバリエーションに対するロバスト性を示し、カラーコードと変数ごとの折れ線グラフが性能にとって重要である。

Figure 2: Illustration of the shifted window approach in Swin Transformer. Self-attention is calculated within each window (grey box). When the window is contained within a single line graph, it captures local interactions. After shifting, the window includes patches from different line graphs, allo

より良い研究を、今すぐ始めましょう

論文設計から論文執筆まで、研究時間を劇的に削減しましょう。

クレジットカード登録不要

このレビューはAIが作成し、人間の編集者が確認しました。