QUICK REVIEW

[Paper Review] Robust and Generalizable Atrial Fibrillation Detection from ECG Using Time-Frequency Fusion and Supervised Contrastive Learning

Hongtao Li, Wei Jia|arXiv (Cornell University)|Jan 15, 2026

ECG Monitoring and Analysis | Cited by 0

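One-Sentence Summary

This paper proposes MGCNet, a multimodal architecture with a Bidirectional Gating Module and cross-modal supervised contrastive learning that fuses time-domain and frequency-domain ECG features, reporting state-of-the-art intra-dataset robustness and cross-dataset generalization for AF detection.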

ABSTRACT

Atrial fibrillation (AF) is a common cardiac arrhythmia that significantly increases the risk of stroke and heart failure, necessitating reliable and generalizable detection methods from electrocardiogram (ECG) recordings. Although deep learning has advanced automated AF diagnosis, existing approaches often struggle to exploit complementary time-frequency information effectively, limiting both robustness under intra-dataset and generalization across diverse clinical datasets. To address these challenges, we propose a cross-modal deep learning framework comprising two key components: a Bidirectional Gating Module (BGM) and a Cross-modal Supervised Contrastive Learning (CSCL) strategy. The BGM facilitates dynamic, reciprocal refinement between time and frequency domain features, enhancing model robustness to signal variations within a dataset. Meanwhile, CSCL explicitly structures the joint embedding space by pulling together label-consistent samples and pushing apart different ones, thereby improving inter-class separability and enabling strong cross-dataset generalization. We evaluate our method through five-fold cross-validation on the AFDB and the CPSC2021 dataset, as well as bidirectional cross-dataset experiments (training on one and testing on the other). Results show consistent improvements over state-of-the-art methods across multiple metrics, demonstrating that our approach achieves both high intra-dataset robustness and excellent cross-dataset generalization. We further demonstrate that our method achieves high computational efficiency and anti-interference capability, making it suitable for edge deployment.

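Motivation and Objectives

Advance robust AF detection in ambulatory ECG by exploiting complementary time-domain and frequency-domain information.
Develop a cross-modal network that dynamically fuses time and frequency features at multiple encoding stages to improve robustness to noise and morphological variation.
Improve generalization across datasets by using contrastive learning to structure the embedding space across modalities and classes.
Demonstrate efficiency suitable for edge deployment, and evaluate in cross-dataset settings to simulate real-world domain shift.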

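Proposed Method

Dual-branch feature extraction from the raw ECG (time domain) and an STFT-based spectrogram (frequency domain).
The Bidirectional Gating Module (BGM) performs dynamic mutual refinement between temporal and spectral features at multiple encoding stages.
Modality-specific global aggregation: a bidirectional GRU for the temporal embedding and global pooling for the frequency embedding, producing Z_t and Z_f.
Cross-modal Supervised Contrastive Learning (CSCL): a contrastive loss within each modality plus a cross-modal alignment contrastive term, both guided by class labels.
Z_t and Z_f are fused by concatenation and passed to the classifier; the total loss is L_total = L_cls + lambda * L_cont.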
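
The contrastive term and the combined objective can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: it assumes the standard supervised contrastive (SupCon) formulation, and it assumes that intra-modal and cross-modal positives can be obtained by stacking Z_t and Z_f into one batch with duplicated labels. The function names, the temperature of 0.1, and the default lambda of 0.5 are hypothetical choices made for illustration only.

```python
import math

def l2_normalize(v):
    """Scale a vector to unit length (zero vectors are left unchanged)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]

def supcon_loss(embeddings, labels, temperature=0.1):
    """Standard supervised contrastive loss, averaged over anchors that have
    at least one same-label positive. Assumed formulation, not from the paper."""
    z = [l2_normalize(v) for v in embeddings]
    n = len(z)
    total, counted = 0.0, 0
    for i in range(n):
        positives = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not positives:
            continue  # an anchor without positives contributes nothing
        # Temperature-scaled cosine similarity to every other sample
        sims = {j: sum(a * b for a, b in zip(z[i], z[j])) / temperature
                for j in range(n) if j != i}
        # Numerically stable log-sum-exp for the denominator
        m = max(sims.values())
        log_denom = m + math.log(sum(math.exp(s - m) for s in sims.values()))
        total += -sum(sims[j] - log_denom for j in positives) / len(positives)
        counted += 1
    return total / counted if counted else 0.0

def cross_modal_supcon(z_t, z_f, labels, temperature=0.1):
    """Assumption: stack both modalities so same-class pairs within a modality
    and across modalities all count as positives."""
    return supcon_loss(list(z_t) + list(z_f), list(labels) + list(labels), temperature)

def total_loss(l_cls, l_cont, lam=0.5):
    """L_total = L_cls + lambda * L_cont (lambda value here is hypothetical)."""
    return l_cls + lam * l_cont

# Toy 2-D embeddings: two samples per class in each modality
z_t = [[1.0, 0.0], [1.0, 0.1], [0.0, 1.0], [0.1, 1.0]]
z_f = [[0.9, 0.1], [1.0, 0.0], [0.1, 0.9], [0.0, 1.0]]
l_cont = cross_modal_supcon(z_t, z_f, [0, 0, 1, 1])
print("L_cont:", l_cont, "L_total:", total_loss(0.3, l_cont))
```

In this sketch, embeddings that cluster by label give a lower loss than the same embeddings with shuffled labels, which is the behaviour the summary describes as pulling label-consistent samples together and pushing others apart.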

Figure 1: Spectrogram of the cleaned ECG segment generated via STFT, encoded as a three-channel heatmap for frequency-domain modeling. In the time domain, AFIB is characterized by the absence of P waves and highly irregular R-R intervals. In the frequency domain (STFT spectrogram), it exhibits a dif

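Experimental Results

Research Questions

RQ1: Does a cross-modal gated fusion of time-domain and frequency-domain ECG representations improve the intra-dataset robustness of AF detection?
RQ2: Does explicit cross-modal supervised contrastive learning improve inter-class separability and cross-dataset generalization?
RQ3: Under domain shift between the AFDB and CPSC2021 datasets, how do the intra-modal and cross-modal contrastive terms each contribute to performance?

Key Findings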

Model	Setting	Acc	AUC	F1	Precision	Recall	Specificity
SCCNN (2023)	AFDB→CPSC2021	0.8386	0.9497	0.8021	0.6969	0.9448	0.7823
IMCResNet (2024)	AFDB→CPSC2021	0.8504	0.9319	0.7982	0.7489	0.8545	0.8482
MoETransformer (2024)	AFDB→CPSC2021	0.8681	0.9355	0.8165	0.7878	0.8474	0.8971
SeqAFNet (2024)	AFDB→CPSC2021	0.8646	0.9382	0.8165	0.7691	0.8701	0.8617
MFEGNet (2025)	AFDB→CPSC2021	0.8843	0.9593	0.8515	0.7663	0.9579	0.8453
MSCGN (2026)	AFDB→CPSC2021	0.8951	0.9609	0.8611	0.7947	0.9397	0.8714
MGCNet (Ours)	AFDB→CPSC2021	0.9165	0.9643	0.8819	0.8639	0.9007	0.9248
SCCNN (2023)	CPSC2021→AFDB	0.8413	0.9140	0.7907	0.7828	0.7987	0.8669
IMCResNet (2024)	CPSC2021→AFDB	0.7706	0.8240	0.7274	0.6564	0.8157	0.7435
MoETransformer (2024)	CPSC2021→AFDB	0.7490	0.8535	0.6840	0.6483	0.7238	0.7642
SeqAFNet (2024)	CPSC2021→AFDB	0.8297	0.9272	0.7870	0.6940	0.9088	0.7878
MFEGNet (2025)	CPSC2021→AFDB	0.8756	0.9620	0.8450	0.7935	0.9037	0.8588
MSCGN (2026)	CPSC2021→AFDB	0.9164	0.9504	0.8947	0.8483	0.9465	0.8983
MGCNet (Ours)	CPSC2021→AFDB	0.9507	0.9894	0.9331	0.9514	0.9154	0.9719
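
As a quick consistency check on the table, F1 is the harmonic mean of precision and recall, so it can be recomputed from the other two columns. The sketch below does this for the two MGCNet rows, with values copied from the table; the helper name is hypothetical. Agreement only shows that the rows are internally consistent and says nothing about how the metrics were aggregated in the paper.

```python
def f1_from_precision_recall(precision, recall):
    """F1 as the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# (precision, recall, reported F1) for the MGCNet rows of the table
mgcnet_rows = {
    "AFDB→CPSC2021": (0.8639, 0.9007, 0.8819),
    "CPSC2021→AFDB": (0.9514, 0.9154, 0.9331),
}

for setting, (precision, recall, reported) in mgcnet_rows.items():
    recomputed = f1_from_precision_recall(precision, recall)
    print(f"{setting}: recomputed F1 = {recomputed:.4f}, reported F1 = {reported:.4f}")
# AFDB→CPSC2021: recomputed F1 = 0.8819, reported F1 = 0.8819
# CPSC2021→AFDB: recomputed F1 = 0.9331, reported F1 = 0.9331
```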

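In intra-dataset testing, MGCNet achieves the highest accuracy and AUC on both AFDB and CPSC2021 (AFDB: Acc 0.9878, AUC 0.9987; CPSC2021: Acc 0.9801, AUC 0.9979).
In cross-dataset evaluation, MGCNet leads all compared methods on accuracy, AUC, F1, precision, and specificity in both transfer directions (AFDB→CPSC2021 and CPSC2021→AFDB); on recall, some baselines score higher (MFEGNet 0.9579 vs. 0.9007 for AFDB→CPSC2021; MSCGN 0.9465 vs. 0.9154 for CPSC2021→AFDB).
Cross-dataset results: AFDB→CPSC2021 Acc 0.9165, AUC 0.9643; CPSC2021→AFDB Acc 0.9507, AUC 0.9894, with high specificity (0.9719).
Ablation studies show that removing either BGM or CSCL lowers both intra-dataset and cross-dataset performance, and that the multimodal variants clearly outperform single-branch models under domain shift.
Five-fold patient-level cross-validation shows low variance across folds (standard deviation of about 0.02), indicating robustness and reproducibility.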
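
Patient-level cross-validation means that all segments from one patient fall on the same side of each split, so the model is never tested on a patient it saw during training. The summary does not say how the folds were built, so the sketch below is only one plausible construction: it shuffles patient IDs with a fixed seed and assigns them to folds round-robin. The function name, the seed, and the toy patient IDs are hypothetical.

```python
import random

def patient_level_folds(patient_ids, n_folds=5, seed=0):
    """Yield (train_idx, test_idx) index lists such that no patient
    appears in both the training and the test side of a fold."""
    patients = sorted(set(patient_ids))
    random.Random(seed).shuffle(patients)
    # Round-robin assignment of whole patients to folds
    fold_of = {p: k % n_folds for k, p in enumerate(patients)}
    for fold in range(n_folds):
        test_idx = [i for i, p in enumerate(patient_ids) if fold_of[p] == fold]
        train_idx = [i for i, p in enumerate(patient_ids) if fold_of[p] != fold]
        yield train_idx, test_idx

# One entry per ECG segment, giving the (made-up) patient it came from
segment_patients = ["p1", "p1", "p2", "p3", "p3", "p3", "p4", "p5", "p6", "p6"]
for k, (train_idx, test_idx) in enumerate(patient_level_folds(segment_patients)):
    test_patients = sorted({segment_patients[i] for i in test_idx})
    print(f"fold {k}: {len(train_idx)} train segments, test patients {test_patients}")
```

Splitting by segment instead of by patient would let near-identical beats from one recording land in both sets, which tends to inflate intra-dataset scores.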

Figure 2: (a) The full multimodal network for AF detection; (b) The BGM enabling dynamic interaction between time- and frequency-domain features; (c) The CSCL that enforces discriminative embedding alignment across modalities.


This review was generated by AI and reviewed by a human editor.