QUICK REVIEW

[Paper Review] CSDI: Conditional Score-based Diffusion Models for Probabilistic Time Series Imputation

Yusuke Tashiro, Jiaming Song|arXiv (Cornell University)|Jul 7, 2021

Machine Learning in Healthcare | References: 43 | Citations: 73

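One-line summary

CSDI imputes missing values in multivariate time series with a conditional score-based diffusion model, improving both probabilistic and deterministic imputation over state-of-the-art methods.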

ABSTRACT

The imputation of missing values in time series has many applications in healthcare and finance. While autoregressive models are natural candidates for time series imputation, score-based diffusion models have recently outperformed existing counterparts including autoregressive models in many tasks such as image generation and audio synthesis, and would be promising for time series imputation. In this paper, we propose Conditional Score-based Diffusion models for Imputation (CSDI), a novel time series imputation method that utilizes score-based diffusion models conditioned on observed data. Unlike existing score-based approaches, the conditional diffusion model is explicitly trained for imputation and can exploit correlations between observed values. On healthcare and environmental data, CSDI improves by 40-65% over existing probabilistic imputation methods on popular performance metrics. In addition, deterministic imputation by CSDI reduces the error by 5-20% compared to the state-of-the-art deterministic imputation methods. Furthermore, CSDI can also be applied to time series interpolation and probabilistic forecasting, and is competitive with existing baselines. The code is available at https://github.com/ermongroup/CSDI.

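Motivation and objectives

Frame missing-value imputation for multivariate time series as a probabilistic modeling problem.
Develop a conditional diffusion framework that uses the observed data to impute the missing values.
Design a self-supervised training strategy that copes with the true missing values being unknown.
Demonstrate improvements over existing probabilistic and deterministic imputation baselines on real-world datasets.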

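Proposed method

Extend denoising diffusion probabilistic models (DDPMs) to the conditional setting for imputation, modeling p(x_{t-1}^ta | x_t^ta, x_0^co).
Introduce a conditional denoising function epsilon_theta that takes x_t^ta, t, and x_0^co as input, together with appropriate padding and a conditional mask m^co.
Train epsilon_theta with a self-supervised scheme inspired by masked language modeling, which selects the imputation targets x_0^ta and the conditional data x_0^co during training.
Use an attention-based architecture with 2D Transformer components (along the time axis and the feature axis) to capture dependencies in the time series.
Incorporate side information about time and sensors, such as time embeddings and feature embeddings, and apply a DiffWave-like DDPM parameterization for conditional sampling.
Provide four target selection strategies (Random, Historical, Mix, Test Pattern) to handle different missing patterns during training.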
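
The self-supervised split and the noising of the targets can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation (their code is at https://github.com/ermongroup/CSDI): it assumes the Random strategy moves a fixed fraction of the observed entries into the target set, and it uses the standard DDPM forward step x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps. The names `split_observed`, `noise_targets`, `target_ratio`, and `alpha_bar_t` are hypothetical, and the data values are made up.

```python
import math
import random

def split_observed(observed_mask, target_ratio, rng):
    """Randomly move a fraction of the observed entries into the target set.

    observed_mask is a 2D list (features x time) of 0/1, where 1 = observed.
    Returns (cond_mask, target_mask); the two are disjoint and add up to
    observed_mask, so truly missing entries belong to neither.
    """
    observed = [(k, l) for k, row in enumerate(observed_mask)
                for l, m in enumerate(row) if m == 1]
    n_target = round(target_ratio * len(observed))
    targets = set(rng.sample(observed, n_target))
    cond_mask = [[1 if m == 1 and (k, l) not in targets else 0
                  for l, m in enumerate(row)]
                 for k, row in enumerate(observed_mask)]
    target_mask = [[1 if (k, l) in targets else 0 for l in range(len(row))]
                   for k, row in enumerate(observed_mask)]
    return cond_mask, target_mask

def noise_targets(x0, target_mask, alpha_bar_t, rng):
    """Apply the DDPM forward step to target entries only; other entries stay 0."""
    noisy, eps = [], []
    for row, mask_row in zip(x0, target_mask):
        eps_row = [rng.gauss(0.0, 1.0) if m else 0.0 for m in mask_row]
        noisy.append([m * (math.sqrt(alpha_bar_t) * v
                           + math.sqrt(1.0 - alpha_bar_t) * e)
                      for v, m, e in zip(row, mask_row, eps_row)])
        eps.append(eps_row)
    return noisy, eps

rng = random.Random(0)
x0 = [[0.1, 0.4, 0.0, 0.3],          # toy series: 2 features x 4 time steps
      [1.2, 0.0, 0.9, 1.1]]
observed_mask = [[1, 1, 0, 1],       # 0 marks values that are truly missing
                 [1, 0, 1, 1]]

cond_mask, target_mask = split_observed(observed_mask, target_ratio=0.5, rng=rng)
noisy_targets, eps = noise_targets(x0, target_mask, alpha_bar_t=0.5, rng=rng)
# Conditional observations x_0^co, zero-padded outside the conditional mask.
cond_input = [[v * m for v, m in zip(row, mask_row)]
              for row, mask_row in zip(x0, cond_mask)]
```

In this sketch, a denoising network would receive `noisy_targets`, `cond_input`, `cond_mask`, and the step t, and would be trained to predict `eps` at the target positions. Because the targets are drawn from observed entries, the ground truth for the loss is always available even though the truly missing values are not.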

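Experimental results

Research questions

RQ1: Can a conditional diffusion model explicitly learn the conditional distribution of the missing values given the observed values?
RQ2: Does conditioning on observed data improve probabilistic imputation performance compared with an unconditional diffusion model?
RQ3: How does CSDI perform against state-of-the-art baselines in probabilistic imputation, interpolation of irregularly sampled time series, and probabilistic forecasting?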

Key findings

Method	Healthcare 10%	Healthcare 50%	Healthcare 90%	Air quality 10%	Air quality 50%	Air quality 90%
Multitask GP	0.489(0.005)	0.581(0.003)	0.942(0.010)	0.301(0.003)	0.301(0.003)	0.301(0.003)
GP-VAE	0.574(0.003)	0.774(0.004)	0.998(0.001)	0.397(0.009)	0.397(0.009)	0.397(0.009)
V-RIN	0.808(0.008)	0.831(0.005)	0.922(0.003)	0.526(0.025)	0.526(0.025)	0.526(0.025)
unconditional	0.360(0.007)	0.458(0.008)	0.671(0.007)	0.135(0.001)	0.135(0.001)	0.135(0.001)
CSDI (proposed)	0.238(0.001)	0.330(0.002)	0.522(0.002)	0.108(0.001)	0.108(0.001)	0.108(0.001)
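
As a rough arithmetic check on the table above, the relative reduction achieved by CSDI against the best non-diffusion baseline in each column can be computed directly. The sketch assumes the table reports CRPS (lower is better) with standard errors in parentheses, and it leaves out the "unconditional" row because that row is the diffusion ablation rather than an existing method. The three air quality columns hold identical values, so they are treated as one column here. The helper `relative_reduction` is a hypothetical name.

```python
# Values copied from the table above (parenthesized errors omitted).
columns = ["Healthcare 10%", "Healthcare 50%", "Healthcare 90%", "Air quality"]
baselines = {
    "Multitask GP": [0.489, 0.581, 0.942, 0.301],
    "GP-VAE":       [0.574, 0.774, 0.998, 0.397],
    "V-RIN":        [0.808, 0.831, 0.922, 0.526],
}
csdi = [0.238, 0.330, 0.522, 0.108]

def relative_reduction(baseline, proposed):
    """Fraction by which `proposed` lowers the error relative to `baseline`."""
    return (baseline - proposed) / baseline

reductions = []
for i, name in enumerate(columns):
    # Strongest existing method in this column = smallest error.
    best = min(scores[i] for scores in baselines.values())
    pct = 100.0 * relative_reduction(best, csdi[i])
    reductions.append(pct)
    print(f"{name}: best baseline {best:.3f}, CSDI {csdi[i]:.3f}, reduction {pct:.1f}%")
```

Under these assumptions the reductions come out at roughly 51%, 43%, 43%, and 64%, which falls inside the 40-65% range quoted in the abstract.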

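CSDI improved CRPS by 40-65% over strong probabilistic baselines on the healthcare and air quality datasets.
Deterministic imputation with CSDI reduced MAE by 5-20% compared with leading deterministic methods.
CSDI's conditional modeling outperforms the unconditional diffusion model, showing the benefit of conditioning on observed values.
CSDI is applicable to time series interpolation and probabilistic forecasting, and is competitive with specialized baselines on these tasks.
Across multiple experiments, CSDI improves probabilistic imputation and provides realistic uncertainty estimates, as shown by CRPS and by the sample distributions.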
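
For readers unfamiliar with CRPS, the following sketch shows how it can be estimated from a set of generated samples for a single missing value. It uses the generic sample-based form CRPS = E|X - y| - 0.5 * E|X - X'|, which is not necessarily the evaluation protocol used in the paper. Taking the median of the samples as the deterministic imputation is likewise an assumption made for illustration. The function name `crps_from_samples` and all data are hypothetical.

```python
import random
import statistics

def crps_from_samples(samples, y):
    """Sample-based CRPS estimate for one scalar observation y (lower is better)."""
    n = len(samples)
    # Average distance between the samples and the observed value.
    term_obs = sum(abs(x - y) for x in samples) / n
    # Average distance between pairs of samples (spread of the distribution).
    term_spread = sum(abs(a - b) for a in samples for b in samples) / (n * n)
    return term_obs - 0.5 * term_spread

rng = random.Random(0)
truth = 1.0
sharp = [rng.gauss(1.0, 0.1) for _ in range(200)]    # centered and confident
wide = [rng.gauss(1.0, 1.0) for _ in range(200)]     # centered but vague
biased = [rng.gauss(3.0, 0.1) for _ in range(200)]   # confident but wrong

for name, samples in [("sharp", sharp), ("wide", wide), ("biased", biased)]:
    print(name, round(crps_from_samples(samples, truth), 3))

# A single point estimate derived from the samples (assumed here: the median),
# which could then be scored with an absolute-error metric such as MAE.
point = statistics.median(sharp)
print("absolute error of the point estimate:", round(abs(point - truth), 3))
```

The score rewards distributions that are both centered on the true value and narrow, so the sharp set scores best and the confidently wrong set scores worst. With a single sample, the estimate reduces to the absolute error, which is why CRPS is often described as a probabilistic generalization of MAE.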


This review was generated by AI and checked by a human editor.