QUICK REVIEW

[Paper Review] CSDI: Conditional Score-based Diffusion Models for Probabilistic Time Series Imputation

Yusuke Tashiro, Jiaming Song|arXiv (Cornell University)|Jul 7, 2021

Machine Learning in Healthcare | References: 43 | Citations: 73

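ONE-SENTENCE SUMMARY

CSDI trains a conditional score-based diffusion model to impute missing values in multivariate time series, and it outperforms existing methods on both probabilistic and deterministic imputation.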

ABSTRACT

The imputation of missing values in time series has many applications in healthcare and finance. While autoregressive models are natural candidates for time series imputation, score-based diffusion models have recently outperformed existing counterparts including autoregressive models in many tasks such as image generation and audio synthesis, and would be promising for time series imputation. In this paper, we propose Conditional Score-based Diffusion models for Imputation (CSDI), a novel time series imputation method that utilizes score-based diffusion models conditioned on observed data. Unlike existing score-based approaches, the conditional diffusion model is explicitly trained for imputation and can exploit correlations between observed values. On healthcare and environmental data, CSDI improves by 40-65% over existing probabilistic imputation methods on popular performance metrics. In addition, deterministic imputation by CSDI reduces the error by 5-20% compared to the state-of-the-art deterministic imputation methods. Furthermore, CSDI can also be applied to time series interpolation and probabilistic forecasting, and is competitive with existing baselines. The code is available at https://github.com/ermongroup/CSDI.

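RESEARCH MOTIVATION AND OBJECTIVES

Address missing-value imputation in multivariate time series from a probabilistic modeling standpoint.
Develop a conditional diffusion model framework that exploits observed data for imputation.
Design a self-supervised training strategy to cope with the fact that the ground truth for missing values is unknown.
Demonstrate improvements over existing probabilistic and deterministic imputation baselines on real-world datasets.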

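PROPOSED METHOD

Extends denoising diffusion probabilistic models (DDPM) to a conditional setting for imputation, modeling p(x_{t-1}^ta | x_t^ta, x_0^co), where "ta" marks the imputation targets and "co" the conditional observations.
Introduces a conditional denoising function epsilon_theta that takes x_t^ta, t, and x_0^co as inputs, with appropriate padding and a conditional mask m^co.
Trains epsilon_theta with a self-supervised scheme inspired by masked language modeling: during training, the imputation targets x_0^ta and the conditional data x_0^co are both chosen from the observed values.
Uses an attention-based architecture with two-dimensional (temporal and feature) Transformer components to capture dependencies in the time series.
Incorporates time and sensor side information (time embeddings and feature embeddings) and adapts a DiffWave-like DDPM parameterization to enable conditional sampling.
Provides four target choice strategies (Random, Historical, Mix, Test Pattern) to handle different missing-pattern scenarios during training.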
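The self-supervised setup described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes a flattened series, the Random target choice strategy, a linear noise schedule, and hypothetical helper names (`split_observed`, `make_training_example`, `masked_noise_loss`). The denoising network itself is omitted; only the construction of its inputs and the masked loss are shown.

```python
import math
import random

def split_observed(observed_mask, target_ratio, rng):
    """Random strategy (assumed form): hold out a fraction of the observed
    entries as imputation targets; the rest stay as conditional observations."""
    observed_idx = [i for i, m in enumerate(observed_mask) if m == 1]
    n_target = int(round(target_ratio * len(observed_idx)))
    target_idx = set(rng.sample(observed_idx, n_target))
    target_mask = [1 if i in target_idx else 0 for i in range(len(observed_mask))]
    cond_mask = [m - tm for m, tm in zip(observed_mask, target_mask)]
    return cond_mask, target_mask

def alpha_bar(betas, t):
    """Cumulative product of (1 - beta_s) for s = 0..t (standard DDPM quantity)."""
    prod = 1.0
    for beta in betas[: t + 1]:
        prod *= 1.0 - beta
    return prod

def make_training_example(x0, observed_mask, betas, t, target_ratio, rng):
    """Build the denoiser inputs: noisy targets x_t^ta, conditional
    observations x_0^co, and the conditional mask m^co (zero padding elsewhere)."""
    cond_mask, target_mask = split_observed(observed_mask, target_ratio, rng)
    a_bar = alpha_bar(betas, t)
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    noisy_target = [
        tm * (math.sqrt(a_bar) * x + math.sqrt(1.0 - a_bar) * e)
        for x, e, tm in zip(x0, eps, target_mask)
    ]
    cond_obs = [cm * x for x, cm in zip(x0, cond_mask)]
    return {"noisy_target": noisy_target, "cond_obs": cond_obs,
            "cond_mask": cond_mask, "target_mask": target_mask,
            "eps": eps, "t": t}

def masked_noise_loss(eps_pred, eps, target_mask):
    """Mean squared error between predicted and true noise, on targets only."""
    n = sum(target_mask)
    sq = sum(tm * (p - e) ** 2 for p, e, tm in zip(eps_pred, eps, target_mask))
    return sq / max(n, 1)

rng = random.Random(0)
x0 = [0.5, -1.0, 0.0, 2.0, 1.5, 0.0]   # zeros stand in for truly missing values
observed_mask = [1, 1, 0, 1, 1, 0]      # 1 = observed, 0 = missing
betas = [1e-4 + (0.5 - 1e-4) * s / 49 for s in range(50)]  # assumed linear schedule
example = make_training_example(x0, observed_mask, betas, t=10,
                                target_ratio=0.5, rng=rng)
```

Because the targets are drawn from observed entries, the true noise on them is known, so the loss can be computed without ever seeing the genuinely missing values.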

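EXPERIMENTAL RESULTS

Research Questions

RQ1: Can a conditional diffusion model explicitly learn the conditional distribution of the imputation targets given the observed values?
RQ2: Does conditioning on observed data improve probabilistic imputation performance compared with an unconditional diffusion model?
RQ3: How does CSDI perform against state-of-the-art baselines on probabilistic imputation, interpolation of irregularly sampled time series, and probabilistic forecasting?

Key Findings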

CRPS for probabilistic imputation (lower is better); values are the mean with the standard error in parentheses.

Method	Healthcare (10% missing)	Healthcare (50% missing)	Healthcare (90% missing)	Air quality
Multitask GP	0.489(0.005)	0.581(0.003)	0.942(0.010)	0.301(0.003)
GP-VAE	0.574(0.003)	0.774(0.004)	0.998(0.001)	0.397(0.009)
V-RIN	0.808(0.008)	0.831(0.005)	0.922(0.003)	0.526(0.025)
unconditional	0.360(0.007)	0.458(0.008)	0.671(0.007)	0.135(0.001)
CSDI (proposed)	0.238(0.001)	0.330(0.002)	0.522(0.002)	0.108(0.001)
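As a quick check on the table above, the relative improvement of CSDI over the best existing baseline can be computed per setting. The sketch below copies the mean values from the table and makes two assumptions: the scores are CRPS (lower is better), and the "unconditional" row is left out because it is a diffusion ablation rather than an existing imputation method.

```python
# Mean scores copied from the table; standard errors are ignored here.
crps = {
    "Healthcare 10%": {"Multitask GP": 0.489, "GP-VAE": 0.574, "V-RIN": 0.808, "CSDI": 0.238},
    "Healthcare 50%": {"Multitask GP": 0.581, "GP-VAE": 0.774, "V-RIN": 0.831, "CSDI": 0.330},
    "Healthcare 90%": {"Multitask GP": 0.942, "GP-VAE": 0.998, "V-RIN": 0.922, "CSDI": 0.522},
    "Air quality":    {"Multitask GP": 0.301, "GP-VAE": 0.397, "V-RIN": 0.526, "CSDI": 0.108},
}

def relative_improvement(baseline, proposed):
    """Fractional reduction of a lower-is-better score."""
    return (baseline - proposed) / baseline

improvements = {}
for setting, scores in crps.items():
    # Strongest existing baseline = lowest score among the non-CSDI methods.
    best_baseline = min(v for name, v in scores.items() if name != "CSDI")
    improvements[setting] = 100.0 * relative_improvement(best_baseline, scores["CSDI"])

for setting, pct in improvements.items():
    print(f"{setting}: {pct:.1f}% lower than the best baseline")
```

Under these assumptions the improvements come out at roughly 51%, 43%, 43%, and 64%, which is consistent with the 40-65% range quoted in the abstract.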

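On the healthcare and air quality datasets, CSDI improves CRPS by 40-65% over existing probabilistic imputation methods.
Deterministic imputation with CSDI reduces MAE by 5-20% compared with leading deterministic methods.
CSDI outperforms the unconditional diffusion model, which shows the benefit of conditioning explicitly on observed values.
CSDI can also be applied to time series interpolation and probabilistic forecasting, where it is competitive with specialized baselines.
Across repeated trials, CSDI improves probabilistic imputation and yields realistic uncertainty estimates (see the CRPS results and the sample distributions).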
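Since the findings are stated in terms of CRPS, a small example of how the score behaves may help. The sketch below uses the generic sample-based estimator CRPS(F, y) ≈ E|X - y| - 0.5 E|X - X'|, where X and X' are independent draws from the predictive distribution. It is an illustration only: the paper may compute CRPS differently (for example through a quantile-loss approximation), and `crps_from_samples` is a hypothetical helper name.

```python
def crps_from_samples(samples, y):
    """Sample-based CRPS estimate for one scalar observation y.

    samples: draws from the predictive distribution, e.g. imputed values
             produced by repeated sampling from a generative model.
    """
    n = len(samples)
    # Average distance between the samples and the observed value.
    accuracy = sum(abs(s - y) for s in samples) / n
    # Average pairwise distance between samples (rewards calibrated spread).
    spread = sum(abs(a - b) for a in samples for b in samples) / (n * n)
    return accuracy - 0.5 * spread

truth = 1.0
centered = [0.8, 0.9, 1.0, 1.1, 1.2]   # ensemble centered on the truth
shifted = [1.8, 1.9, 2.0, 2.1, 2.2]    # same spread, biased by +1

score_centered = crps_from_samples(centered, truth)
score_shifted = crps_from_samples(shifted, truth)
```

With a single sample the spread term vanishes and the estimate reduces to the absolute error, which is why CRPS is often read as a probabilistic generalization of MAE.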

This review was generated by AI and reviewed by human editors.