QUICK REVIEW

[Paper Review] A review of Generative Adversarial Networks for Electronic Health Records: applications, evaluation measures and data sources

Ghadeer Ghosheh, Jin Li | arXiv (Cornell University) | Mar 14, 2022

Generative Adversarial Networks and Image Synthesis | References: 152 | Cited by: 20

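One-Sentence Summary

A comprehensive review of GANs applied to structured Electronic Health Records (EHRs), covering applications, evaluation metrics, data sources, privacy considerations, and future research directions, with literature coverage through January 2022.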

ABSTRACT

Electronic Health Records (EHRs) are a valuable asset to facilitate clinical research and point of care applications; however, many challenges such as data privacy concerns impede its optimal utilization. Deep generative models, particularly, Generative Adversarial Networks (GANs) show great promise in generating synthetic EHR data by learning underlying data distributions while achieving excellent performance and addressing these challenges. This work aims to review the major developments in various applications of GANs for EHRs and provides an overview of the proposed methodologies. For this purpose, we combine perspectives from healthcare applications and machine learning techniques in terms of source datasets and the fidelity and privacy evaluation of the generated synthetic datasets. We also compile a list of the metrics and datasets used by the reviewed works, which can be utilized as benchmarks for future research in the field. We conclude by discussing challenges in GANs for EHRs development and proposing recommended practices. We hope that this work motivates novel research development directions in the intersection of healthcare and machine learning.

Research Motivation and Objectives

Motivate and survey the use of GANs for Electronic Health Records (EHRs).
Categorize GAN-based EHR works by target application and data type (tabular vs. time-series).
Summarize evaluation metrics and data sources used to benchmark synthetic EHRs.
Discuss challenges in GAN training, data heterogeneity, and privacy, and propose best practices for future research.

Proposed Method

Literature review of GAN-based EHR studies identified from Google Scholar up to January 2022.
Categorization of works by application: generation, semi-supervised learning/data augmentation, imputation, treatment effect estimation, and privacy preservation.
Compilation of evaluation metrics and datasets used across reviewed works to establish benchmarks.
Discussion of GAN architectures, loss functions, and training stability challenges relevant to EHR data.
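For orientation, the two loss functions most relevant to the stability discussion can be written out. These are the standard formulations from the original GAN and Wasserstein GAN papers, not equations reproduced from the review, and the notation (generator G, discriminator D, critic f, data distribution p_data, noise prior p_z) is assumed here for illustration.

```latex
% Standard GAN minimax objective
\min_{G}\max_{D}\; V(D,G)
  = \mathbb{E}_{x \sim p_{\mathrm{data}}}\!\left[\log D(x)\right]
  + \mathbb{E}_{z \sim p_{z}}\!\left[\log\bigl(1 - D(G(z))\bigr)\right]

% Wasserstein GAN objective, with the critic f constrained to be 1-Lipschitz
\min_{G}\max_{\lVert f \rVert_{L} \le 1}\;
    \mathbb{E}_{x \sim p_{\mathrm{data}}}\!\left[f(x)\right]
  - \mathbb{E}_{z \sim p_{z}}\!\left[f(G(z))\right]
```

In the first objective, a discriminator that separates real from generated samples almost perfectly leaves the generator with a near-zero gradient. The Wasserstein form is commonly adopted because the critic's output is not squashed through a logarithm of a probability, so it tends to keep providing a usable training signal.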

Experimental Results

Research Questions

RQ1: What GAN architectures have been applied to generate and utilize EHR data (tabular and time-series)?
RQ2: What evaluation metrics and datasets are commonly used to assess synthetic EHR quality and utility?
RQ3: What are the main challenges (e.g., privacy, missingness, heterogeneity, training stability) and recommended practices for GANs on EHRs?

Key Findings

GANs have been used to generate diverse EHR types (tabular and time-series), as well as for semi-supervised learning, imputation, treatment effect estimation, and privacy preservation.
A variety of architectures (medGAN, RGAN/RCGAN, EMR-WGAN, SC-GAN, SynTEG, EHR-M-GAN, CorGAN, MI-GAN, GAD, and others) address specific EHR challenges such as discrete/categorical data, irregular time-series, and heterogeneous features.
The review compiles commonly used evaluation components (Dimension-wise Similarity, Latent Distribution Similarity, Joint Distribution Similarity, Inter-dimensional Relationship Similarity, Privacy Preservation, Data Utility, Qualitative Evaluation) and data sources.
Datasets frequently used include MIMIC-III, Philips eICU, MAGGIC, MGH, VUMC Synthetic Derivative, NHIRD Taiwan, SEER, and private clinical datasets, illustrating the breadth of data types and access constraints.
Privacy-preserving GAN approaches (DPGAN, PATE-GAN, AC-GAN, PART-GANs, ADS-GAN, HealthGAN, HCGAN) are actively explored to mitigate patient re-identification risks.
Despite progress, training stability remains a bottleneck, with issues like mode collapse and vanishing gradients motivating methods such as WGAN, minibatch discrimination, unrolled GANs, and noise injection.
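Of the evaluation components listed in the findings above, Dimension-wise Similarity is the simplest to illustrate. The sketch below is a minimal example, assuming binary (0/1) tabular features such as diagnosis-code indicators. It compares the per-column prevalence in real and synthetic records and summarizes the gap as a root-mean-square error. The function names and toy records are hypothetical and do not come from the review or from any reviewed work.

```python
from math import sqrt


def dimension_wise_probability(rows):
    """Return the per-column mean of equal-length rows of 0/1 values."""
    n_rows = len(rows)
    n_cols = len(rows[0])
    return [sum(row[j] for row in rows) / n_rows for j in range(n_cols)]


def dimension_wise_rmse(real_rows, synthetic_rows):
    """RMSE between real and synthetic per-column prevalences."""
    p_real = dimension_wise_probability(real_rows)
    p_syn = dimension_wise_probability(synthetic_rows)
    squared = [(a - b) ** 2 for a, b in zip(p_real, p_syn)]
    return sqrt(sum(squared) / len(squared))


# Toy data (made up): 4 patients x 3 binary features.
real = [
    [1, 0, 1],
    [1, 1, 0],
    [0, 0, 1],
    [1, 0, 1],
]
synthetic = [
    [1, 0, 1],
    [1, 0, 0],
    [0, 0, 1],
    [1, 0, 1],
]

print("real prevalence:     ", dimension_wise_probability(real))
print("synthetic prevalence:", dimension_wise_probability(synthetic))
print("dimension-wise RMSE: ", dimension_wise_rmse(real, synthetic))
```

A low score here only says that the marginal distributions match. Two datasets can agree on every column's prevalence and still differ in how features co-occur, which is presumably why the review lists joint-distribution and inter-dimensional measures as separate components.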

This review was generated by AI and checked by a human editor.