QUICK REVIEW

[Paper Review] HCFT: Hierarchical Convolutional Fusion Transformer for EEG Decoding

Haodong Zhang, Jiapeng Zhu|arXiv (Cornell University)|Jan 18, 2026

EEG and Brain-Computer Interfaces|Citations: 0

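One-Sentence Summary

HCFT introduces a lightweight dual-branch convolutional encoder with cross-attention and hierarchical Transformer fusion, and reports state-of-the-art results on motor imagery (MI) classification (BCI IV-2b) and seizure prediction (CHB-MIT).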

ABSTRACT

Electroencephalography (EEG) decoding requires models that can effectively extract and integrate complex temporal, spectral, and spatial features from multichannel signals. To address this challenge, we propose a lightweight and generalizable decoding framework named Hierarchical Convolutional Fusion Transformer (HCFT), which combines dual-branch convolutional encoders and hierarchical Transformer blocks for multi-scale EEG representation learning. Specifically, the model first captures local temporal and spatiotemporal dynamics through time-domain and time-space convolutional branches, and then aligns these features via a cross-attention mechanism that enables interaction between branches at each stage. Subsequently, a hierarchical Transformer fusion structure is employed to encode global dependencies across all feature stages, while a customized Dynamic Tanh normalization module is introduced to replace traditional Layer Normalization in order to enhance training stability and reduce redundancy. Extensive experiments are conducted on two representative benchmark datasets, BCI Competition IV-2b and CHB-MIT, covering both event-related cross-subject classification and continuous seizure prediction tasks. Results show that HCFT achieves 80.83% average accuracy and a Cohen's kappa of 0.6165 on BCI IV-2b, as well as 99.10% sensitivity, 0.0236 false positives per hour, and 98.82% specificity on CHB-MIT, consistently outperforming over ten state-of-the-art baseline methods. Ablation studies confirm that each core component of the proposed framework contributes significantly to the overall decoding performance, demonstrating HCFT's effectiveness in capturing EEG dynamics and its potential for real-world BCI applications.

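Motivation and Objectives

Motivate robust EEG decoding that captures fine-grained temporal rhythms, spatial electrode patterns, and multi-scale global dependencies.
Propose HCFT, which fuses dual-branch CNN encoders with hierarchical Transformer blocks.
Improve training stability with Dynamic Tanh normalization and cross-attention-based feature alignment.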

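Proposed Method

Dual-branch depthwise separable convolutional encoders extract temporal and spatiotemporal features.
A cross-attention mechanism aligns the temporal and spatiotemporal features at each stage.
Hierarchical convolutional fusion Transformer blocks integrate features across multiple scales.
Dynamic Tanh normalization (DyT) is optionally used in place of LayerNorm to stabilize training.
The encoder is a pyramid-style multi-stage design with per-stage pooling and a final global attention layer.
Classification uses a final multi-head attention layer, LayerNorm or DyT, global average pooling, and a fully connected head.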
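To make the branch-alignment and normalization steps concrete, here is a minimal pure-Python sketch of one alignment step. It assumes plain scaled dot-product attention without learned projections, and the commonly published DyT form gamma * tanh(alpha * x) + beta with scalar parameters. The paper's customized DyT, the placement of the normalization, and all toy values below are not taken from the source and are illustrative only.

```python
import math

def dyt(x, alpha=0.5, gamma=1.0, beta=0.0):
    """Dynamic Tanh in its common form: gamma * tanh(alpha * x) + beta.
    In a real model alpha, gamma, and beta are learnable; scalars are assumed here."""
    return [[gamma * math.tanh(alpha * v) + beta for v in row] for row in x]

def softmax(row):
    """Numerically stable softmax over a list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attention(queries, keys_values):
    """Scaled dot-product attention: queries come from one branch,
    keys and values from the other. Learned projections are omitted (assumption)."""
    d = len(queries[0])
    out = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in keys_values]
        weights = softmax(scores)
        out.append([sum(w * kv[j] for w, kv in zip(weights, keys_values))
                    for j in range(d)])
    return out

# Hypothetical toy features: 3 time steps x 2 feature channels per branch.
temporal = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
spatiotemporal = [[0.5, 0.5], [1.0, -1.0], [0.0, 2.0]]

# Each branch attends to the other; the result is then squashed by DyT.
t_aligned = dyt(cross_attention(temporal, spatiotemporal))
s_aligned = dyt(cross_attention(spatiotemporal, temporal))
```

Unlike LayerNorm, the DyT step computes no per-token mean or variance; it is an element-wise squashing function, which is why it can be cheaper to evaluate.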

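Experimental Results

Research Questions

RQ1: How can temporal and spatiotemporal EEG features be effectively aligned and fused across multiple scales?
RQ2: Can a lightweight dual-branch CNN with Transformer fusion achieve cross-subject generalization in MI and robust seizure prediction?
RQ3: Does Dynamic Tanh normalization improve training stability and generalization across EEG tasks?
RQ4: How much does each core component of HCFT contribute to decoding performance?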
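Cross-subject generalization, as in RQ2, is typically measured with leave-one-subject-out (LOSO) splits: each subject is held out once for testing while the model trains on all the others. The sketch below assumes nine subjects labeled S1 to S9, as in the results table. The function name `loso_splits` is hypothetical and does not come from the paper.

```python
def loso_splits(subjects):
    """Yield (train_subjects, test_subject) pairs, holding out one subject per fold."""
    for held_out in subjects:
        train = [s for s in subjects if s != held_out]
        yield train, held_out

# S1..S9, matching the per-subject columns of the results table.
subjects = [f"S{i}" for i in range(1, 10)]
folds = list(loso_splits(subjects))

for train, test in folds:
    print(f"test={test} train={','.join(train)}")
```

Each per-subject column in the table would then correspond to one fold, with the accuracy measured on the held-out subject.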

Key Findings

Methods	S1	S2	S3	S4	S5	S6	S7	S8	S9	Avg Acc	Std	Kappa
ConvNet	64.19	62.9	67.58	72.06	75.87	72.01	81.51	79.02	60.68	70.65	7.33	0.4134
EEGNet	66.15	71.08	72.01	56.48	80.24	78.78	85.03	79.54	71.74	73.45	8.64	0.4684
MSNN	74.72	65.29	57.63	91.21	74.72	85.55	72.91	76.57	76.66	75.02	9.88	-
Hybrid s-CViT	68.47	56.91	50.42	81.08	60.68	61.67	62.22	70.00	68.47	64.44	8.81	-
Hybrid t-CViT	66.39	55.74	52.36	82.7	72.57	63.89	68.89	65.92	72.64	66.79	9.12	-
MSHCNN	76.80	66.32	57.36	91.75	79.59	82.63	74.16	80.13	75.55	76.03	9.79	-
Conformer	65.89	64.43	67.45	84.45	72.24	76.56	77.86	69.23	74.87	76.4	6.51	0.4521
EEGCCT	68.75	59.6	59.9	89.21	73.44	75.39	76.3	75.76	77.73	73.26	9.21	0.4587
Hybrid EEGNet	71.53	65.00	58.75	84.86	78.78	77.50	77.92	73.68	75.41	73.72	7.82	-
CTNet	76.25	71.03	66.39	81.76	83.11	77.22	79.17	73.56	77.92	76.27	5.26	0.5252
EEGPT	72.22	69.71	61.53	78.78	81.08	70.42	83.89	83.82	70.83	74.70	7.61	0.4936
SCNN	-	-	-	-	-	-	-	-	-	-	-	-
MSCFormer	76.11	71.18	62.36	81.35	81.08	74.72	78.89	76.18	75.42	75.25	5.80	0.5051
ConTraNet	72.92	72.94	63.75	83.51	82.70	80.69	84.44	77.37	70.83	76.57	6.97	-
HCFT	78.62	73.23	67.71	93.92	82.72	82.68	86.17	84.47	77.94	80.83	7.61	0.6165
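As a quick consistency check, the Avg Acc and Std columns can be recomputed from the per-subject values. The sketch below does this for the HCFT row. The Std column does not say whether it is a sample or a population standard deviation, so the sample form (n - 1 denominator) is an assumption.

```python
from statistics import mean, stdev

# Per-subject accuracies (S1..S9) copied from the HCFT row of the table.
hcft_acc = [78.62, 73.23, 67.71, 93.92, 82.72, 82.68, 86.17, 84.47, 77.94]

avg = mean(hcft_acc)   # arithmetic mean over the nine subjects
std = stdev(hcft_acc)  # sample standard deviation (n - 1 denominator)

print(f"{avg:.2f} {std:.2f}")  # 80.83 7.61
```

Both values match the reported 80.83 and 7.61. The population form would give about 7.17, which suggests the sample form was used for this row.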

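Under leave-one-subject-out (LOSO) evaluation on BCI IV-2b (MI classification), HCFT reached 80.83% average accuracy and a Cohen's kappa of 0.6165, the highest average accuracy among the methods in the table above.
On CHB-MIT seizure prediction, HCFT reached 99.10% sensitivity, 0.0236 false positives per hour, and 98.82% specificity.
Ablation studies indicate that cross-attention, self-attention, stage-wise concatenation, and the final MHSA all contribute to performance.
DyT normalization outperforms LayerNorm on the MI task, whereas LayerNorm performs better on seizure prediction; DyT also reduces model size and FLOPs.
An embedding dimension of D=32 with H=2 attention heads balances accuracy and efficiency, and a deeper Stage 3 improves performance.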
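The metrics quoted above can all be derived from confusion counts. The sketch below uses the standard binary definitions of sensitivity, specificity, and Cohen's kappa, plus a simple false-positives-per-hour ratio. The counts are hypothetical, and the paper may define these quantities differently (seizure-prediction sensitivity, for instance, is often counted per event rather than per sample).

```python
def binary_metrics(tp, fn, fp, tn):
    """Sensitivity, specificity, accuracy, and Cohen's kappa from confusion counts."""
    n = tp + fn + fp + tn
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / n
    # Chance agreement computed from the row and column marginals.
    p_e = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / (n * n)
    kappa = (accuracy - p_e) / (1 - p_e)
    return {"sensitivity": sensitivity, "specificity": specificity,
            "accuracy": accuracy, "kappa": kappa}

def false_positives_per_hour(false_alarms, interictal_hours):
    """False alarms divided by the hours of interictal recording monitored."""
    return false_alarms / interictal_hours

# Hypothetical counts for illustration only; not taken from the paper.
m = binary_metrics(tp=40, fn=10, fp=10, tn=40)
print(m)
print(false_positives_per_hour(false_alarms=3, interictal_hours=100))
```

For a balanced two-class task the chance agreement is 0.5, so kappa reduces to 2 * accuracy - 1. Plugging in 0.8083 gives about 0.617, close to the reported 0.6165. The small gap may reflect per-subject averaging or slight class imbalance.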

This review was generated by AI and checked by a human editor.