[Paper Summary] Uncovering the structure of clinical EEG signals with self-supervised learning
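This paper studies self-supervised learning (SSL) for electroencephalography (EEG). It uses three pretext tasks (relative positioning, temporal shuffling, and contrastive predictive coding) to learn EEG representations and evaluates them on sleep staging and pathology detection. In low-label regimes, SSL features outperform supervised baselines, and the learned representations reveal physiologically meaningful structure.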
Objective. Supervised learning paradigms are often limited by the amount of labeled data that is available. This phenomenon is particularly problematic in clinically-relevant data, such as electroencephalography (EEG), where labeling can be costly in terms of specialized expertise and human processing time. Consequently, deep learning architectures designed to learn on EEG data have yielded relatively shallow models and performances at best similar to those of traditional feature-based approaches. However, in most situations, unlabeled data is available in abundance. By extracting information from this unlabeled data, it might be possible to reach competitive performance with deep neural networks despite limited access to labels. Approach. We investigated self-supervised learning (SSL), a promising technique for discovering structure in unlabeled data, to learn representations of EEG signals. Specifically, we explored two tasks based on temporal context prediction as well as contrastive predictive coding on two clinically-relevant problems: EEG-based sleep staging and pathology detection. We conducted experiments on two large public datasets with thousands of recordings and performed baseline comparisons with purely supervised and hand-engineered approaches. Main results. Linear classifiers trained on SSL-learned features consistently outperformed purely supervised deep neural networks in low-labeled data regimes while reaching competitive performance when all labels were available. Additionally, the embeddings learned with each method revealed clear latent structures related to physiological and clinical phenomena, such as age effects. Significance. We demonstrate the benefit of self-supervised learning approaches on EEG data. Our results suggest that SSL may pave the way to a wider use of deep learning models on EEG data.
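Motivation and Objectives
- Use SSL to address the scarcity of labeled EEG data in clinical settings.
- Adapt three self-supervised pretext tasks to EEG to learn robust representations from unlabeled data.
- Evaluate SSL representations on sleep staging and pathology detection, benchmarking them against supervised and hand-engineered baselines.
- Analyze the physiological and clinical structure of the learned embeddings.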
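Proposed Method
- Define three self-supervised pretext tasks for EEG: relative positioning (RP), temporal shuffling (TS), and contrastive predictive coding (CPC).
- Map EEG windows to a feature space with an end-to-end trainable encoder (embedder) h_Θ, followed by a contrastive module (g_RP or g_TS) or by the parametric CPC components.
- Optimize RP and TS with a binary logistic loss and CPC with the InfoNCE loss, training each model end-to-end; downstream, a linear classifier (logistic regression) is trained on the learned features.
- Evaluate two EEG architectures, StagerNet and ShallowNet, as embedders; for CPC, add a GRU-based autoregressive component.
- Compare SSL against four baselines: random weights, a convolutional autoencoder, purely supervised models, and hand-crafted features.
- Run experiments on sleep staging (Physionet Challenge 2018) and pathology detection (TUH Abnormal EEG).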
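The RP and TS objectives listed above can be sketched in plain Python. This is a minimal, hypothetical sketch rather than the authors' implementation: the encoder h_Θ is left out (embeddings are given as lists of floats), no gradients are computed, and all function names are illustrative. It assumes, following the original RP/TS formulation, that a pair of windows is labeled +1 when their time indices are at most τ_pos apart and -1 when they are more than τ_neg apart (pairs in between are discarded), that g_RP is the element-wise absolute difference of two embeddings, and that g_TS concatenates the absolute differences of consecutive embeddings in a triplet.

```python
# Sketch only: hypothetical helpers, not the paper's code.
import math
import random
from typing import List, Optional, Sequence, Tuple

Vector = List[float]


def sample_rp_pair(n_windows: int, tau_pos: int, tau_neg: int,
                   rng: random.Random) -> Optional[Tuple[int, int, int]]:
    """Draw two window indices from one recording and label the pair.

    Returns (t, t_prime, y): y = +1 if |t - t_prime| <= tau_pos (positive
    context), y = -1 if |t - t_prime| > tau_neg (negative context).
    Returns None for ambiguous pairs in between, which are discarded.
    """
    t = rng.randrange(n_windows)
    t_prime = rng.randrange(n_windows)
    gap = abs(t - t_prime)
    if gap <= tau_pos:
        return t, t_prime, 1
    if gap > tau_neg:
        return t, t_prime, -1
    return None


def g_rp(z1: Sequence[float], z2: Sequence[float]) -> Vector:
    """RP contrastive module: element-wise |h(x_t) - h(x_t')|."""
    return [abs(a - b) for a, b in zip(z1, z2)]


def g_ts(z1: Sequence[float], z2: Sequence[float],
         z3: Sequence[float]) -> Vector:
    """TS contrastive module: concatenation of |z1 - z2| and |z2 - z3|."""
    return g_rp(z1, z2) + g_rp(z2, z3)


def logistic_loss(features: Sequence[float], y: int,
                  w: Sequence[float], w0: float) -> float:
    """Binary logistic loss log(1 + exp(-y * (w^T g + w0))), y in {-1, +1}."""
    margin = y * (sum(wi * gi for wi, gi in zip(w, features)) + w0)
    # Numerically stable softplus(-margin).
    if margin > 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


if __name__ == "__main__":
    rng = random.Random(0)
    pair = None
    while pair is None:  # resample until the pair is not ambiguous
        # tau_pos / tau_neg values here are hypothetical, in window units.
        pair = sample_rp_pair(n_windows=1000, tau_pos=6, tau_neg=60, rng=rng)
    print("pair (t, t', y):", pair)
    z_t, z_tp = [0.2, -1.0, 0.5], [0.1, -0.8, 0.9]  # toy embeddings
    print("RP loss:", logistic_loss(g_rp(z_t, z_tp), pair[2], [0.5, 0.5, 0.5], -0.1))
```

In actual training, this loss would be averaged over many sampled pairs (or triplets for TS) and minimized jointly over Θ, w, and w0 with an autograd framework.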
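For CPC, the InfoNCE objective for a single context vector can be sketched as follows. This assumes the standard CPC setup: a GRU summarizes past window embeddings into a context c_t, and a separate bilinear map W_k scores each future step k as z^T W_k c_t; the loss for step k is the negative log-softmax score of the true future embedding among itself and a set of negative embeddings. The encoder and GRU are omitted here (c_t and the embeddings are passed in directly), how negatives are drawn is left to the caller, and the function names are hypothetical.

```python
# Sketch only: hypothetical helpers illustrating InfoNCE, not the paper's code.
import math
from typing import List, Sequence

Matrix = List[List[float]]  # shape (d_z, d_c): one row per embedding dimension


def bilinear_score(z: Sequence[float], W: Matrix, c: Sequence[float]) -> float:
    """Log-score z^T W c of a candidate future embedding z given context c."""
    Wc = [sum(w_ij * c_j for w_ij, c_j in zip(row, c)) for row in W]
    return sum(z_i * v_i for z_i, v_i in zip(z, Wc))


def log_sum_exp(xs: Sequence[float]) -> float:
    """Stable log(sum(exp(x))) for a non-empty sequence."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def info_nce_step(c: Sequence[float], z_pos: Sequence[float],
                  z_negs: Sequence[Sequence[float]], W: Matrix) -> float:
    """InfoNCE loss for one prediction step k.

    Equals -log(exp(s_pos) / sum of exp(s) over the positive and negatives),
    i.e. the cross-entropy of picking the true future embedding.
    """
    scores = [bilinear_score(z_pos, W, c)]
    scores += [bilinear_score(z, W, c) for z in z_negs]
    return log_sum_exp(scores) - scores[0]


def info_nce(c: Sequence[float], future_z: Sequence[Sequence[float]],
             negatives: Sequence[Sequence[Sequence[float]]],
             W_steps: Sequence[Matrix]) -> float:
    """Sum of per-step InfoNCE losses over the prediction steps.

    future_z[k], negatives[k], and W_steps[k] all belong to step k + 1.
    """
    return sum(info_nce_step(c, z_pos, z_negs, W)
               for z_pos, z_negs, W in zip(future_z, negatives, W_steps))


if __name__ == "__main__":
    identity = [[1.0, 0.0], [0.0, 1.0]]
    c_t = [1.0, 0.5]  # toy context vector (would come from the GRU)
    loss = info_nce(c_t, future_z=[[1.0, 0.4]],
                    negatives=[[[-1.0, 0.0], [0.0, -1.0]]], W_steps=[identity])
    print("InfoNCE:", loss)
```

With all scores equal, each step contributes log(1 + number of negatives), which is a convenient sanity check for an implementation.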
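Experimental Results
Research Questions
- RQ1: Which SSL tasks best capture the relevant structure in EEG data?
- RQ2: How do SSL features compare with unsupervised and supervised baselines on downstream EEG classification tasks?
- RQ3: What do SSL-learned embeddings reveal about physiological and clinical phenomena (e.g., age effects)?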
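Key Findings
- SSL features allow linear classifiers to outperform purely supervised deep networks when labeled data is scarce.
- When all labels are available, SSL representations reach performance competitive with fully supervised training.
- Embeddings from the SSL methods reveal latent structure related to physiological and clinical factors such as age.
- Both clinically relevant EEG tasks, sleep staging and pathology detection, benefit from SSL relative to the baseline methods.
- The study provides evidence that SSL can broaden the applicability of deep learning to clinical EEG data.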
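The low-label comparison above relies on training a linear classifier on frozen SSL features using only a few labeled examples per class. The sketch below illustrates that evaluation protocol for a binary task such as normal vs. abnormal EEG. It is a hypothetical, dependency-free stand-in: the per-class subsampling scheme, the plain gradient-descent logistic regression, and the use of balanced accuracy as the metric are assumptions for illustration. In practice one would use a library implementation such as scikit-learn's LogisticRegression, and multiclass labels for sleep staging.

```python
# Sketch only: hypothetical low-label linear evaluation on frozen features.
import math
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def subsample_per_class(labels: Sequence[int], n_per_class: int,
                        rng: random.Random) -> List[int]:
    """Pick at most n_per_class indices per class to simulate scarce labels."""
    by_class: Dict[int, List[int]] = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    chosen: List[int] = []
    for idx in by_class.values():
        chosen.extend(rng.sample(idx, min(n_per_class, len(idx))))
    return sorted(chosen)


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def train_logreg(X: Sequence[Sequence[float]], y: Sequence[int],
                 lr: float = 0.1, epochs: int = 500,
                 l2: float = 1e-3) -> Tuple[List[float], float]:
    """Binary logistic regression (labels 0/1) by full-batch gradient descent.

    Minimizes mean log-loss + (l2 / 2) * ||w||^2; the features X stay frozen.
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w = [l2 * wj for wj in w]
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j] / n
            grad_b += err / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(X: Sequence[Sequence[float]], w: Sequence[float], b: float) -> List[int]:
    """Predict class 1 when the linear score is non-negative."""
    return [1 if sum(wj * xj for wj, xj in zip(w, xi)) + b >= 0 else 0 for xi in X]


def balanced_accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Mean per-class recall, which is robust to class imbalance."""
    correct: Dict[int, int] = defaultdict(int)
    total: Dict[int, int] = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        correct[t] += int(t == p)
    return sum(correct[c] / total[c] for c in total) / len(total)


if __name__ == "__main__":
    # Toy "frozen embeddings": two separated clusters (hypothetical data).
    X = [[-2.0 - 0.1 * i, 0.5] for i in range(10)] + [[2.0 + 0.1 * i, -0.5] for i in range(10)]
    y = [0] * 10 + [1] * 10
    idx = subsample_per_class(y, n_per_class=3, rng=random.Random(0))
    w, b = train_logreg([X[i] for i in idx], [y[i] for i in idx])
    print("balanced accuracy:", balanced_accuracy(y, predict(X, w, b)))
```

Repeating this for several values of n_per_class, with SSL features, random-weight features, and a fully supervised network, gives the kind of low-label comparison the findings describe.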
This summary was generated by AI and reviewed by human editors.