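[Paper Review] HCFT: Hierarchical Convolutional Fusion Transformer for EEG Decoding
HCFT introduces a lightweight dual-branch convolutional encoder combined with cross-attention and hierarchical Transformer fusion for EEG decoding, achieving state-of-the-art results on motor imagery (MI) classification (BCI IV-2b) and seizure prediction (CHB-MIT).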
Electroencephalography (EEG) decoding requires models that can effectively extract and integrate complex temporal, spectral, and spatial features from multichannel signals. To address this challenge, we propose a lightweight and generalizable decoding framework named Hierarchical Convolutional Fusion Transformer (HCFT), which combines dual-branch convolutional encoders and hierarchical Transformer blocks for multi-scale EEG representation learning. Specifically, the model first captures local temporal and spatiotemporal dynamics through time-domain and time-space convolutional branches, and then aligns these features via a cross-attention mechanism that enables interaction between branches at each stage. Subsequently, a hierarchical Transformer fusion structure is employed to encode global dependencies across all feature stages, while a customized Dynamic Tanh normalization module is introduced to replace traditional Layer Normalization in order to enhance training stability and reduce redundancy. Extensive experiments are conducted on two representative benchmark datasets, BCI Competition IV-2b and CHB-MIT, covering both event-related cross-subject classification and continuous seizure prediction tasks. Results show that HCFT achieves 80.83% average accuracy and a Cohen's kappa of 0.6165 on BCI IV-2b, as well as 99.10% sensitivity, 0.0236 false positives per hour, and 98.82% specificity on CHB-MIT, consistently outperforming over ten state-of-the-art baseline methods. Ablation studies confirm that each core component of the proposed framework contributes significantly to the overall decoding performance, demonstrating HCFT's effectiveness in capturing EEG dynamics and its potential for real-world BCI applications.
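Motivation and Objectives
- Advance robust EEG decoding that captures subtle temporal rhythms, spatial electrode patterns, and multi-scale global dependencies.
- Propose HCFT, which fuses dual-branch CNN encoders with hierarchical Transformer blocks.
- Improve training stability through Dynamic Tanh normalization and cross-attention-based feature alignment.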
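Proposed Method
- Dual-branch depthwise separable convolutional encoders extract temporal and spatiotemporal features.
- A cross-attention mechanism aligns the temporal and spatiotemporal features at every stage.
- Hierarchical convolutional fusion Transformer blocks fuse the features across multiple scales.
- Dynamic Tanh normalization (DyT) serves as an optional replacement for LayerNorm to stabilize training.
- A pyramid-style multi-stage encoder applies stage-wise pooling, followed by a final global attention layer before classification.
- Classification uses a final multi-head attention layer, LayerNorm or DyT, global average pooling, and a fully connected head.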
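As a rough illustration of two of the components listed above, the sketch below shows single-head cross-attention between two token sequences, followed by a residual connection and a Dynamic Tanh step. It is not the authors' implementation: learned query/key/value projections, multiple heads, and the convolutional branches are omitted; `dyt` uses the commonly published form γ·tanh(αx) + β with scalar parameters, and the paper's own customizations to DyT are not described in this summary. All function names and toy values are hypothetical.

```python
import math

def dyt(x, alpha=0.5, gamma=1.0, beta=0.0):
    """Dynamic Tanh: gamma * tanh(alpha * x) + beta, element-wise.
    alpha, gamma, beta would be learnable; scalars are an assumption here."""
    return [gamma * math.tanh(alpha * v) + beta for v in x]

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attention(queries, keys_values):
    """Scaled dot-product cross-attention without learned projections.
    Each query token attends over all tokens of the other branch."""
    d = len(queries[0])
    out = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d)
                  for k in keys_values]
        weights = softmax(scores)
        out.append([sum(w * kv[j] for w, kv in zip(weights, keys_values))
                    for j in range(d)])
    return out

# Toy features: 3 temporal tokens and 2 spatiotemporal tokens, dimension 4.
temporal = [[0.2, -1.0, 0.5, 0.1],
            [1.5, 0.3, -0.7, 0.0],
            [-0.4, 0.8, 0.9, -1.2]]
spatiotemporal = [[0.1, 0.4, -0.3, 0.6],
                  [-0.9, 0.2, 1.1, 0.5]]

# Temporal tokens query the spatiotemporal branch, then residual + DyT.
attended = cross_attention(temporal, spatiotemporal)
aligned = [dyt([t + a for t, a in zip(t_row, a_row)])
           for t_row, a_row in zip(temporal, attended)]
```

The output keeps the shape of the querying branch (3 tokens of dimension 4), which is what lets an aligned branch be fused with its counterpart at the same stage. Unlike LayerNorm, the DyT step computes no per-token mean or variance; it only squashes each activation.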
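Experimental Results
Research Questions
- RQ1: How can temporal and spatiotemporal EEG features be aligned and fused effectively across multiple scales?
- RQ2: Can a lightweight dual-branch CNN combined with Transformer fusion achieve strong cross-subject generalization on MI and robustness on seizure prediction?
- RQ3: Does Dynamic Tanh normalization improve training stability and generalization on EEG tasks?
- RQ4: How much does each core HCFT component contribute to decoding performance?
Key Findings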
| Method | S1 | S2 | S3 | S4 | S5 | S6 | S7 | S8 | S9 | Avg Acc (%) | Std | Kappa |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ConvNet | 64.19 | 62.9 | 67.58 | 72.06 | 75.87 | 72.01 | 81.51 | 79.02 | 60.68 | 70.65 | 7.33 | 0.4134 |
| EEGNet | 66.15 | 71.08 | 72.01 | 56.48 | 80.24 | 78.78 | 85.03 | 79.54 | 71.74 | 73.45 | 8.64 | 0.4684 |
| MSNN | 74.72 | 65.29 | 57.63 | 91.21 | 74.72 | 85.55 | 72.91 | 76.57 | 76.66 | 75.02 | 9.88 | - |
| Hybrid s-CViT | 68.47 | 56.91 | 50.42 | 81.08 | 60.68 | 61.67 | 62.22 | 70.00 | 68.47 | 64.44 | 8.81 | - |
| Hybrid t-CViT | 66.39 | 55.74 | 52.36 | 82.7 | 72.57 | 63.89 | 68.89 | 65.92 | 72.64 | 66.79 | 9.12 | - |
| MSHCNN | 76.80 | 66.32 | 57.36 | 91.75 | 79.59 | 82.63 | 74.16 | 80.13 | 75.55 | 76.03 | 9.79 | - |
| Conformer | 65.89 | 64.43 | 67.45 | 84.45 | 72.24 | 76.56 | 77.86 | 69.23 | 74.87 | 76.4 | 6.51 | 0.4521 |
| EEGCCT | 68.75 | 59.6 | 59.9 | 89.21 | 73.44 | 75.39 | 76.3 | 75.76 | 77.73 | 73.26 | 9.21 | 0.4587 |
| Hybrid EEGNet | 71.53 | 65.00 | 58.75 | 84.86 | 78.78 | 77.50 | 77.92 | 73.68 | 75.41 | 73.72 | 7.82 | - |
| CTNet | 76.25 | 71.03 | 66.39 | 81.76 | 83.11 | 77.22 | 79.17 | 73.56 | 77.92 | 76.27 | 5.26 | 0.5252 |
| EEGPT | 72.22 | 69.71 | 61.53 | 78.78 | 81.08 | 70.42 | 83.89 | 83.82 | 70.83 | 74.70 | 7.61 | 0.4936 |
| SCNN | - | - | - | - | - | - | - | - | - | - | - | - |
| MSCFormer | 76.11 | 71.18 | 62.36 | 81.35 | 81.08 | 74.72 | 78.89 | 76.18 | 75.42 | 75.25 | 5.80 | 0.5051 |
| ConTraNet | 72.92 | 72.94 | 63.75 | 83.51 | 82.70 | 80.69 | 84.44 | 77.37 | 70.83 | 76.57 | 6.97 | - |
| HCFT | 78.62 | 73.23 | 67.71 | 93.92 | 82.72 | 82.68 | 86.17 | 84.47 | 77.94 | 80.83 | 7.61 | 0.6165 |
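The average-accuracy and Std columns can be recomputed from the per-subject values. The short check below does this for the HCFT row only; it assumes Std is the sample standard deviation (n − 1 denominator), which happens to reproduce the listed value for this row but is not stated in the summary.

```python
from statistics import mean, stdev

# Per-subject accuracies (%) for HCFT, copied from the table row above.
hcft_acc = [78.62, 73.23, 67.71, 93.92, 82.72, 82.68, 86.17, 84.47, 77.94]

avg = mean(hcft_acc)   # arithmetic mean over the nine subjects
std = stdev(hcft_acc)  # sample standard deviation (n - 1 denominator)

print(f"{avg:.2f} {std:.2f}")  # prints: 80.83 7.61
```

The same two calls can be applied to any other row with complete per-subject entries.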
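- Under leave-one-subject-out (LOSO) evaluation on BCI IV-2b (MI classification), HCFT reaches 80.83% average accuracy and a Cohen's kappa of 0.6165, outperforming every baseline listed in the table above.
- On the CHB-MIT seizure prediction task, HCFT reaches 99.10% sensitivity, 0.0236 false positives per hour, and 98.82% specificity.
- Ablation studies show that cross-attention, self-attention, stage-wise concatenation, and the final MHSA all contribute to the performance gains.
- DyT normalization outperforms LayerNorm on the MI task, whereas LayerNorm performs better on seizure prediction; the DyT variant has a smaller model size and fewer FLOPs.
- An embedding dimension of D=32 with H=2 attention heads balances accuracy and efficiency; a deeper Stage 3 improves performance.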
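For readers less familiar with the metrics quoted in these findings, here is a minimal sketch of how they are usually defined. The function names and the counts passed to `seizure_metrics` are hypothetical, and the kappa example assumes a balanced two-class problem, so chance agreement is 0.5; the paper's exact evaluation protocol is not described in this summary.

```python
def cohen_kappa(p_observed, p_expected):
    """Cohen's kappa: agreement beyond chance, scaled to the range above chance."""
    return (p_observed - p_expected) / (1.0 - p_expected)

def seizure_metrics(tp, fn, tn, fp, false_alarms, interictal_hours):
    """Common seizure-prediction metrics computed from raw counts."""
    return {
        "sensitivity": tp / (tp + fn),            # detected preictal events
        "specificity": tn / (tn + fp),            # correctly rejected interictal
        "fpr_per_hour": false_alarms / interictal_hours,
    }

# With balanced classes, 80.83% accuracy corresponds to kappa = 2 * acc - 1.
kappa = cohen_kappa(0.8083, 0.5)
print(round(kappa, 4))  # prints: 0.6166

# Hypothetical counts, chosen only to show the arithmetic.
example = seizure_metrics(tp=9, fn=1, tn=95, fp=5,
                          false_alarms=2, interictal_hours=40.0)
```

Under the balanced-class assumption the accuracy-derived kappa lands within rounding distance of the reported 0.6165, which suggests the two reported numbers are mutually consistent.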
This review was generated by AI and checked by a human editor.