[Paper Review] Enhanced Atrial Fibrillation Prediction in ESUS Patients with Hypergraph-based Pre-training
The paper introduces supervised and unsupervised hypergraph-based pre-training for atrial fibrillation prediction in ESUS patients. Knowledge learned on a larger stroke cohort is transferred to a smaller ESUS dataset, which improves predictive performance over models trained on raw data.
Atrial fibrillation (AF) is a major complication following embolic stroke of undetermined source (ESUS), elevating the risk of recurrent stroke and mortality. Early identification is clinically important, yet existing tools face limitations in accuracy, scalability, and cost. Machine learning (ML) offers promise but is hindered by small ESUS cohorts and high-dimensional medical features. To address these challenges, we introduce supervised and unsupervised hypergraph-based pre-training strategies to improve AF prediction in ESUS patients. We first pre-train hypergraph-based patient embedding models on a large stroke cohort (7,780 patients) to capture salient features and higher-order interactions. The resulting embeddings are transferred to a smaller ESUS cohort (510 patients), reducing feature dimensionality while preserving clinically meaningful information, enabling effective prediction with lightweight models. Experiments show that both pre-training approaches outperform traditional models trained on raw data, improving accuracy and robustness. This framework offers a scalable and efficient solution for AF risk prediction after stroke.
Motivation and Objectives
- Address data scarcity and high dimensionality in ESUS-related AF prediction.
- Leverage hypergraph representations to capture higher-order ICD code interactions.
- Develop supervised and unsupervised pre-training approaches and transfer to ESUS-AF.
- Evaluate pre-training methods against models trained from scratch.
- Demonstrate robustness and generalizability across external datasets.
Proposed Method
- Represent patient data as a hypergraph where hyperedges are visits and nodes are diagnostic features.
- Pre-train a hypergraph transformer on AI-RESPECT (7,780 patients) in two variants: one with a supervised objective, and one with self-supervised objectives (the unsupervised variant).
- Transfer the pre-trained embeddings to ESUS-AF (510 patients) to form compact 32-D representations.
- Concatenate 32-D embeddings with 53 baseline features to form final patient representations.
- Train downstream AF predictors (LR, RF, GB) on the ESUS-AF data using 5-fold nested cross-validation.
- Compare against from-scratch representations and report AUROC, accuracy, F1-score, and PR-AUC.
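The hypergraph representation and the concatenation step above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `build_incidence` and `concat_features` are hypothetical, the ICD-10 codes are illustrative toy data, and the zero vectors stand in for a real 32-D embedding and the 53 baseline features. The sketch only shows how visits (hyperedges) and diagnostic codes (nodes) map to an incidence matrix, and how the final 85-D patient vector is assembled.

```python
from typing import List, Sequence, Tuple


def build_incidence(visits: List[List[str]]) -> Tuple[List[str], List[List[int]]]:
    """Build a node-by-hyperedge incidence matrix H (hypothetical helper).

    Each visit is one hyperedge; each distinct diagnostic code is one node.
    H[i][j] == 1 when code i was recorded in visit j, else 0.
    """
    nodes = sorted({code for visit in visits for code in visit})
    index = {code: i for i, code in enumerate(nodes)}
    H = [[0] * len(visits) for _ in nodes]
    for j, visit in enumerate(visits):
        for code in visit:
            H[index[code]][j] = 1
    return nodes, H


def concat_features(embedding: Sequence[float], baseline: Sequence[float]) -> List[float]:
    """Concatenate a pre-trained embedding with baseline features (hypothetical helper)."""
    return list(embedding) + list(baseline)


# Toy data: three visits, each a set of illustrative ICD-10 codes.
visits = [
    ["I63.9", "I10"],
    ["I10", "E11.9", "I48.91"],
    ["I63.9"],
]
nodes, H = build_incidence(visits)

# Node degree = number of visits containing the code; edge size = codes per visit.
node_degrees = [sum(row) for row in H]
edge_sizes = [sum(H[i][j] for i in range(len(nodes))) for j in range(len(visits))]

# Placeholder vectors with the dimensions stated in the review (32 + 53).
patient_vector = concat_features([0.0] * 32, [0.0] * 53)
```

A single hyperedge can connect any number of codes at once, which is what lets the model see co-occurrence patterns beyond pairwise links. The concatenated vector is what a lightweight downstream classifier (LR, RF, or GB) would receive.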
Experimental Results
Research Questions
- RQ1: Can hypergraph-based pre-training improve AF prediction in ESUS patients with limited samples?
- RQ2: Do supervised and unsupervised hypergraph pre-training strategies improve performance over training from scratch?
- RQ3: How well do transferred embeddings generalize to external datasets (e.g., MIMIC-IV)?
- RQ4: What is the impact of pre-training on model robustness and data-efficiency?
Key Findings
| Embedding Method | ML Model | AUROC | Accuracy | F1-score | PR-AUC |
|---|---|---|---|---|---|
| From Scratch | LR | 0.489 ±0.025 | 0.690 ±0.029 | 0.199 ±0.077 | 0.254 ±0.043 |
| From Scratch | RF | 0.494 ±0.081 | 0.441 ±0.214 | 0.210 ±0.152 | 0.232 ±0.027 |
| From Scratch | GB | 0.512 ±0.033 | 0.729 ±0.014 | 0.169 ±0.071 | 0.246 ±0.020 |
| Supervised | LR | 0.617 ±0.033 | 0.721 ±0.034 | 0.407 ±0.037 | 0.319 ±0.046 |
| Supervised | RF | 0.625 ±0.045 | 0.784 ±0.022 | 0.384 ±0.111 | 0.374 ±0.042 |
| Supervised | GB | 0.583 ±0.036 | 0.759 ±0.048 | 0.248 ±0.068 | 0.284 ±0.010 |
| Unsupervised | LR | 0.616 ±0.025 | 0.625 ±0.023 | 0.356 ±0.036 | 0.312 ±0.038 |
| Unsupervised | RF | 0.620 ±0.041 | 0.747 ±0.042 | 0.331 ±0.069 | 0.321 ±0.036 |
| Unsupervised | GB | 0.582 ±0.020 | 0.741 ±0.049 | 0.319 ±0.062 | 0.292 ±0.036 |
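- Both supervised and unsupervised pre-training improve AUROC, F1-score, and PR-AUC over training from scratch for all three downstream models. Accuracy also improves in every case except unsupervised LR (0.625 vs. 0.690 from scratch).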
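- Supervised pre-training yields absolute AUROC gains of approximately 7–13 percentage points and F1-score gains of about 8–21 percentage points on ESUS-AF, based on the differences between the mean values in the table above.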
- Unsupervised pre-training also yields substantial improvements and is more versatile when labels are limited or unavailable.
- Pre-training demonstrates data-efficiency, with robust performance under varying training data sizes and better generalization in external validation.
- Transferred representations maintain performance across different downstream models (LR, RF, GB) and external datasets, with some exceptions for GB in MIMIC-IV.
- The approach highlights scalability and potential for multi-institutional clinical AI applications.
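The size of the supervised gains can be re-derived from the results table. The sketch below copies the mean AUROC and F1-score values for the from-scratch and supervised rows and computes absolute differences per downstream model. The helper name `absolute_gains` is hypothetical, and the comparison uses fold means only, so it ignores the reported standard deviations.

```python
from typing import Dict

# Mean values copied from the results table (standard deviations omitted).
from_scratch: Dict[str, Dict[str, float]] = {
    "LR": {"AUROC": 0.489, "F1": 0.199},
    "RF": {"AUROC": 0.494, "F1": 0.210},
    "GB": {"AUROC": 0.512, "F1": 0.169},
}
supervised: Dict[str, Dict[str, float]] = {
    "LR": {"AUROC": 0.617, "F1": 0.407},
    "RF": {"AUROC": 0.625, "F1": 0.384},
    "GB": {"AUROC": 0.583, "F1": 0.248},
}


def absolute_gains(
    baseline: Dict[str, Dict[str, float]],
    pretrained: Dict[str, Dict[str, float]],
    metric: str,
) -> Dict[str, float]:
    """Per-model absolute difference (pretrained minus baseline) for one metric."""
    return {
        model: round(pretrained[model][metric] - baseline[model][metric], 3)
        for model in baseline
    }


auroc_gains = absolute_gains(from_scratch, supervised, "AUROC")
f1_gains = absolute_gains(from_scratch, supervised, "F1")

for model in from_scratch:
    print(f"{model}: AUROC +{auroc_gains[model]:.3f}, F1 +{f1_gains[model]:.3f}")
```

GB shows the smallest gain on both metrics, while LR and RF gain the most. The same helper works for the unsupervised rows if their values are substituted.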
This review was generated by AI and checked by a human editor.