QUICK REVIEW

[Paper Review] Real-valued (Medical) Time Series Generation with Recurrent Conditional GANs

Cristóbal Esteban, Stephanie L. Hyland|arXiv (Cornell University)|Jun 8, 2017

Time Series Analysis and Forecasting | Cited by 380

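One-Line Summary

This paper proposes Recurrent GANs (RGAN) and Recurrent Conditional GANs (RCGAN) for generating real-valued multi-dimensional time series, applies them to datasets that include medical ICU data, evaluates them with MMD and the novel TSTR protocol, and reports differential privacy experiments.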

ABSTRACT

Generative Adversarial Networks (GANs) have shown remarkable success as a framework for training models to produce realistic-looking data. In this work, we propose a Recurrent GAN (RGAN) and Recurrent Conditional GAN (RCGAN) to produce realistic real-valued multi-dimensional time series, with an emphasis on their application to medical data. RGANs make use of recurrent neural networks in the generator and the discriminator. In the case of RCGANs, both of these RNNs are conditioned on auxiliary information. We demonstrate our models in a set of toy datasets, where we show visually and quantitatively (using sample likelihood and maximum mean discrepancy) that they can successfully generate realistic time-series. We also describe novel evaluation methods for GANs, where we generate a synthetic labelled training dataset, and evaluate on a real test set the performance of a model trained on the synthetic data, and vice-versa. We illustrate with these metrics that RCGANs can generate time-series data useful for supervised training, with only minor degradation in performance on real test data. This is demonstrated on digit classification from 'serialised' MNIST and by training an early warning system on a medical dataset of 17,000 patients from an intensive care unit. We further discuss and analyse the privacy concerns that may arise when using RCGANs to generate realistic synthetic medical time series data.

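Motivation and Objectives

Demonstrate how to generate real-valued sequences with adversarial training.
Propose novel evaluation methods for GANs on time series data.
Show that synthetic medical time series suitable for supervised tasks can be generated.
Analyse the privacy implications of GANs on medical data and explore differentially private training.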

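Proposed Method

Build RGANs and RCGANs for real-valued sequences from an LSTM-based generator and discriminator.
In the conditional setting (RCGAN), condition both the generator and the discriminator on auxiliary information.
Train with the standard GAN objective; discuss the limitations of the Wasserstein objective for RGANs.
Evaluate by comparing generated sequences with real data using MMD with an RBF kernel; the kernel bandwidth is selected using a t-statistic.
Introduce TSTR (Train on Synthetic, Test on Real) and TRTS (Train on Real, Test on Synthetic) as practical evaluation protocols.
Explore differential privacy by applying DP-SGD with privacy accounting to the discriminator.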
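
The MMD evaluation listed above can be illustrated with a minimal sketch. It computes an unbiased estimate of MMD² with an RBF kernel between two sets of sequences, treating each sequence as a flat vector. The helper names (`rbf_kernel`, `mean_kernel`, `mmd2_unbiased`, `sine_batch`) and the toy sine data are assumptions made for illustration, not the authors' code, and the bandwidth is fixed by hand here rather than selected with the t-statistic procedure the paper describes.

```python
import math
import random


def rbf_kernel(x, y, bandwidth):
    """RBF kernel between two equal-length sequences treated as vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def mean_kernel(xs, ys, bandwidth, skip_diagonal):
    """Average kernel value over all pairs, optionally skipping i == j."""
    total, count = 0.0, 0
    for i, x in enumerate(xs):
        for j, y in enumerate(ys):
            if skip_diagonal and i == j:
                continue
            total += rbf_kernel(x, y, bandwidth)
            count += 1
    return total / count


def mmd2_unbiased(real, synthetic, bandwidth):
    """Unbiased MMD^2 estimate: within-set terms exclude self-pairs."""
    k_xx = mean_kernel(real, real, bandwidth, skip_diagonal=True)
    k_yy = mean_kernel(synthetic, synthetic, bandwidth, skip_diagonal=True)
    k_xy = mean_kernel(real, synthetic, bandwidth, skip_diagonal=False)
    return k_xx - 2.0 * k_xy + k_yy


def sine_batch(n, freq, rng, length=20):
    """Hypothetical toy data: sine waves with a random phase."""
    batch = []
    for _ in range(n):
        phase = rng.uniform(0.0, 2.0 * math.pi)
        batch.append([math.sin(freq * t + phase) for t in range(length)])
    return batch


rng = random.Random(0)
real = sine_batch(30, 0.3, rng)
good_fake = sine_batch(30, 0.3, rng)  # same distribution as `real`
bad_fake = sine_batch(30, 1.0, rng)   # different frequency

mmd_good = mmd2_unbiased(real, good_fake, bandwidth=3.0)
mmd_bad = mmd2_unbiased(real, bad_fake, bandwidth=3.0)
print(mmd_good, mmd_bad)
```

In this toy setting the estimate should be close to zero when both sets come from the same distribution and clearly larger when they do not, which is the behaviour that makes MMD² usable as a sample-quality score.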

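Experimental Results

Research Questions

RQ1: Can recurrent GANs generate realistic real-valued multi-dimensional time series?
RQ2: Do conditional inputs enable controlled generation of time series data?
RQ3: Can reliable, task-relevant evaluations (e.g., TSTR, MMD²) be provided for GAN-generated time series?
RQ4: Can models trained on synthetic data achieve performance close to models trained on real data on supervised tasks?
RQ5: What are the privacy implications of training an RGAN on medical data, and can differential privacy provide practical guarantees?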

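Key Findings

RGAN and RCGAN generate realistic time series across the synthetic sine wave, smooth function, and MNIST-as-time-series tasks.
MMD² correlates with data quality and can distinguish the generated distribution from the real one; combining kernels improves sensitivity.
TSTR/TRTS evaluation shows performance close to the real-data baseline, indicating that synthetic data can support supervised learning (e.g., the MNIST and eICU experiments).
In the ICU data experiments, data synthesised by the RCGAN supported competitive classification performance on several health indicators under TSTR evaluation.
Differentially private training of the discriminator with DP-SGD noticeably reduces task accuracy compared with non-private training, showing a privacy-performance trade-off in medical data synthesis.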
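
The TSTR/TRTS protocol behind these findings can be sketched as follows. This is a toy illustration under explicit assumptions: the "synthetic" set is a noisier draw from the same hypothetical generator, standing in for RCGAN output, and the classifier is a simple nearest-centroid rule rather than the models used in the paper. All function names are invented for the example.

```python
import math
import random


def make_labelled_sequences(n, noise, rng, length=20):
    """Hypothetical toy data: the label decides the sine frequency."""
    data, labels = [], []
    for i in range(n):
        label = i % 2
        freq = 0.3 if label == 0 else 0.6
        data.append([math.sin(freq * t) + rng.gauss(0.0, noise)
                     for t in range(length)])
        labels.append(label)
    return data, labels


def fit_centroids(data, labels):
    """Nearest-centroid 'training': one mean sequence per class."""
    centroids = {}
    for label in set(labels):
        members = [x for x, y in zip(data, labels) if y == label]
        centroids[label] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


def accuracy(centroids, data, labels):
    """Fraction of sequences whose closest centroid matches the label."""
    def predict(x):
        return min(centroids,
                   key=lambda c: sum((a - b) ** 2
                                     for a, b in zip(x, centroids[c])))
    correct = sum(predict(x) == y for x, y in zip(data, labels))
    return correct / len(labels)


rng = random.Random(0)
real_train = make_labelled_sequences(100, 0.3, rng)
real_test = make_labelled_sequences(100, 0.3, rng)
# Stand-in for labelled RCGAN samples (assumption: noisier copies of the real process).
synthetic = make_labelled_sequences(100, 0.5, rng)

trtr = accuracy(fit_centroids(*real_train), *real_test)  # real-data baseline
tstr = accuracy(fit_centroids(*synthetic), *real_test)   # Train on Synthetic, Test on Real
trts = accuracy(fit_centroids(*real_train), *synthetic)  # Train on Real, Test on Synthetic
print(trtr, tstr, trts)
```

Comparing `tstr` against the real-data baseline `trtr` is the comparison the findings refer to: a small gap suggests the synthetic data carries enough class-relevant structure to train a useful model.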

This review was generated by AI and checked by a human editor.