QUICK REVIEW

[Paper Review] Real-valued (Medical) Time Series Generation with Recurrent Conditional GANs

Cristóbal Esteban, Stephanie L. Hyland | arXiv (Cornell University) | Jun 8, 2017

Time Series Analysis and Forecasting | Cited by 380

One-Sentence Summary

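This paper proposes the Recurrent GAN (RGAN) and the Recurrent Conditional GAN (RCGAN) for generating real-valued multi-dimensional time series, including medical ICU data. It evaluates them with maximum mean discrepancy (MMD) and the newly proposed TSTR protocol, and reports differential-privacy experiments.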

ABSTRACT

Generative Adversarial Networks (GANs) have shown remarkable success as a framework for training models to produce realistic-looking data. In this work, we propose a Recurrent GAN (RGAN) and Recurrent Conditional GAN (RCGAN) to produce realistic real-valued multi-dimensional time series, with an emphasis on their application to medical data. RGANs make use of recurrent neural networks in the generator and the discriminator. In the case of RCGANs, both of these RNNs are conditioned on auxiliary information. We demonstrate our models in a set of toy datasets, where we show visually and quantitatively (using sample likelihood and maximum mean discrepancy) that they can successfully generate realistic time-series. We also describe novel evaluation methods for GANs, where we generate a synthetic labelled training dataset, and evaluate on a real test set the performance of a model trained on the synthetic data, and vice-versa. We illustrate with these metrics that RCGANs can generate time-series data useful for supervised training, with only minor degradation in performance on real test data. This is demonstrated on digit classification from 'serialised' MNIST and by training an early warning system on a medical dataset of 17,000 patients from an intensive care unit. We further discuss and analyse the privacy concerns that may arise when using RCGANs to generate realistic synthetic medical time series data.

Motivation and Objectives

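Demonstrate a method for generating real-valued sequences with adversarial training.
Propose new evaluation methods for GANs applied to time series data.
Show that synthetic medical time series suitable for supervised tasks can be generated.
Analyse the privacy implications and explore differentially private training of GANs on medical data.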

Proposed Method

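Build the RGAN and RCGAN for real-valued sequences from an LSTM-based generator and discriminator.
In the conditional setting (RCGAN), condition both the generator and the discriminator on auxiliary information.
Train with the standard GAN objective; discuss the limitations of the Wasserstein objective for RGANs.
Compare generated and real sequences using MMD with an RBF kernel; select the kernel bandwidth using a t-statistic criterion.
Introduce TSTR (Train on Synthetic, Test on Real) and TRTS (Train on Real, Test on Synthetic) as practical evaluation protocols.
Explore differential privacy by applying DP-SGD to the discriminator, with privacy accounting.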
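To make the MMD comparison concrete, the following is a minimal NumPy sketch of an unbiased MMD² estimate with an RBF kernel, treating each sequence as a flattened vector. It is an illustration, not the authors' implementation: the function names (`pairwise_sq_dists`, `median_bandwidth`, `mmd2_unbiased`, `make_sines`) are hypothetical, and the bandwidth is set with the median heuristic as a simple stand-in for the t-statistic-based selection described in the paper.

```python
import numpy as np

def pairwise_sq_dists(a, b):
    """Squared Euclidean distances between rows of a and rows of b."""
    sq = (a**2).sum(axis=1)[:, None] + (b**2).sum(axis=1)[None, :] - 2.0 * a @ b.T
    return np.maximum(sq, 0.0)  # clip tiny negatives caused by rounding

def median_bandwidth(x, y):
    """Median heuristic (an assumption here, not the paper's selection rule)."""
    z = np.vstack([x, y])
    sq = pairwise_sq_dists(z, z)
    return float(np.sqrt(np.median(sq[np.triu_indices_from(sq, k=1)])))

def mmd2_unbiased(x, y, sigma=None):
    """Unbiased MMD^2 estimate; x and y have shape (n_samples, n_values)."""
    if sigma is None:
        sigma = median_bandwidth(x, y)
    n, m = len(x), len(y)
    kxx = np.exp(-pairwise_sq_dists(x, x) / (2.0 * sigma**2))
    kyy = np.exp(-pairwise_sq_dists(y, y) / (2.0 * sigma**2))
    kxy = np.exp(-pairwise_sq_dists(x, y) / (2.0 * sigma**2))
    # Within-sample terms exclude the diagonal (i == j) entries.
    term_xx = (kxx.sum() - np.trace(kxx)) / (n * (n - 1))
    term_yy = (kyy.sum() - np.trace(kyy)) / (m * (m - 1))
    return float(term_xx + term_yy - 2.0 * kxy.mean())

def make_sines(rng, n, seq_len=30):
    """Toy sine waves with random amplitude, frequency and phase."""
    t = np.linspace(0.0, 1.0, seq_len)
    amp = rng.uniform(0.1, 0.9, size=(n, 1))
    freq = rng.uniform(1.0, 5.0, size=(n, 1))
    phase = rng.uniform(-np.pi, np.pi, size=(n, 1))
    return amp * np.sin(2.0 * np.pi * freq * t[None, :] + phase)

rng = np.random.default_rng(0)
real = make_sines(rng, 200)
close_samples = make_sines(rng, 200)          # same distribution as `real`
far_samples = rng.standard_normal((200, 30))  # white noise, clearly different
print("MMD^2 (same distribution):", mmd2_unbiased(real, close_samples))
print("MMD^2 (sines vs. noise):  ", mmd2_unbiased(real, far_samples))
```

A lower value suggests the two samples are harder to tell apart under the chosen kernel. For multi-dimensional series, flattening each (time, feature) array into one row gives the same distances as a Frobenius norm between the arrays.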

Experimental Results

Research Questions

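RQ1: Can recurrent GANs generate realistic, multi-dimensional real-valued time series?
RQ2: Do conditional inputs enable controlled generation of time series data?
RQ3: Are there reliable, task-relevant evaluations for GAN-generated time series (e.g. TSTR, MMD²)?
RQ4: In supervised tasks, can models trained on synthetic data approach the performance of models trained on real data?
RQ5: What are the privacy implications of training RGANs on medical data, and can differential privacy provide practical guarantees?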

Key Findings

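RGAN and RCGAN generate realistic time series in settings such as synthetic sine waves, smooth functions, and MNIST treated as a time series.
MMD² tracks sample quality and distinguishes generated from real distributions; combining kernels improves its sensitivity.
TSTR/TRTS evaluation indicates that synthetic data can support supervised learning, with performance close to the real-data baseline (e.g. in the MNIST and eICU experiments).
In the ICU experiments, RCGAN-generated data yields competitive TSTR classification performance across several health indicators.
When the discriminator is trained with DP-SGD, task performance is markedly lower than with non-private training, highlighting the privacy trade-off in medical data synthesis.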
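The TSTR and TRTS protocols can be sketched as follows. This is a toy illustration under stated assumptions, not the paper's setup: the "synthetic" set is a noisier draw from the same toy generator rather than the output of a trained RCGAN, the classifier is a simple nearest-centroid model, and all names (`make_labelled_series`, `NearestCentroid`, `tstr_trts`) are hypothetical.

```python
import numpy as np

def make_labelled_series(rng, n, seq_len=30, noise=0.1):
    """Toy labelled data: class 0 is a slow sine, class 1 a fast sine."""
    t = np.linspace(0.0, 1.0, seq_len)
    y = rng.integers(0, 2, size=n)
    freq = np.where(y == 0, 1.0, 3.0)
    x = np.sin(2.0 * np.pi * freq[:, None] * t[None, :])
    return x + noise * rng.standard_normal(x.shape), y

class NearestCentroid:
    """Minimal stand-in classifier: predict the class with the closest mean."""
    def fit(self, x, y):
        self.classes_ = np.unique(y)
        self.centroids_ = np.stack([x[y == c].mean(axis=0) for c in self.classes_])
        return self

    def predict(self, x):
        d = ((x[:, None, :] - self.centroids_[None, :, :]) ** 2).sum(axis=2)
        return self.classes_[d.argmin(axis=1)]

def accuracy(model, x, y):
    return float((model.predict(x) == y).mean())

def tstr_trts(real_train, real_test, synthetic):
    """Return real-data baseline, TSTR and TRTS accuracies."""
    (xr, yr), (xt, yt), (xs, ys) = real_train, real_test, synthetic
    return {
        "real": accuracy(NearestCentroid().fit(xr, yr), xt, yt),  # train real, test real
        "tstr": accuracy(NearestCentroid().fit(xs, ys), xt, yt),  # train synthetic, test real
        "trts": accuracy(NearestCentroid().fit(xr, yr), xs, ys),  # train real, test synthetic
    }

rng = np.random.default_rng(0)
real_train = make_labelled_series(rng, 500)
real_test = make_labelled_series(rng, 200)
# Stand-in for conditional generator samples: same classes, more noise.
synthetic = make_labelled_series(rng, 500, noise=0.3)
print(tstr_trts(real_train, real_test, synthetic))
```

A TSTR score close to the real-data baseline suggests the synthetic set carries the label-relevant structure of the real data. TRTS on its own can look good even when the generator covers only part of the real distribution, which is one reason to report both.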

This review was generated by AI and reviewed by a human editor.