QUICK REVIEW

[Paper Review] HCFT: Hierarchical Convolutional Fusion Transformer for EEG Decoding

Haodong Zhang, Jiapeng Zhu | arXiv (Cornell University) | 2026-01-18

EEG and Brain-Computer Interfaces | Citations: 0

One-Line Summary

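HCFT introduces a lightweight dual-branch convolutional encoder with cross-attention and hierarchical Transformer fusion for EEG decoding, and reports state-of-the-art results on motor imagery (MI) classification (BCI Competition IV-2b) and seizure prediction (CHB-MIT).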

ABSTRACT

Electroencephalography (EEG) decoding requires models that can effectively extract and integrate complex temporal, spectral, and spatial features from multichannel signals. To address this challenge, we propose a lightweight and generalizable decoding framework named Hierarchical Convolutional Fusion Transformer (HCFT), which combines dual-branch convolutional encoders and hierarchical Transformer blocks for multi-scale EEG representation learning. Specifically, the model first captures local temporal and spatiotemporal dynamics through time-domain and time-space convolutional branches, and then aligns these features via a cross-attention mechanism that enables interaction between branches at each stage. Subsequently, a hierarchical Transformer fusion structure is employed to encode global dependencies across all feature stages, while a customized Dynamic Tanh normalization module is introduced to replace traditional Layer Normalization in order to enhance training stability and reduce redundancy. Extensive experiments are conducted on two representative benchmark datasets, BCI Competition IV-2b and CHB-MIT, covering both event-related cross-subject classification and continuous seizure prediction tasks. Results show that HCFT achieves 80.83% average accuracy and a Cohen's kappa of 0.6165 on BCI IV-2b, as well as 99.10% sensitivity, 0.0236 false positives per hour, and 98.82% specificity on CHB-MIT, consistently outperforming over ten state-of-the-art baseline methods. Ablation studies confirm that each core component of the proposed framework contributes significantly to the overall decoding performance, demonstrating HCFT's effectiveness in capturing EEG dynamics and its potential for real-world BCI applications.

Motivation and Objectives

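Enable robust EEG decoding that captures fine-grained temporal rhythms, spatial electrode patterns, and multi-scale global dependencies.
Propose HCFT, which fuses dual-branch CNN encoders with hierarchical Transformer blocks.
Improve training stability through Dynamic Tanh (DyT) normalization and cross-attention-based feature alignment.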

Proposed Method

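A dual-branch depthwise separable convolutional encoder extracts temporal and spatiotemporal features.
A cross-attention mechanism aligns the temporal and spatiotemporal features at each stage.
Hierarchical Convolutional Fusion Transformer blocks fuse features across multiple scales.
Dynamic Tanh (DyT) normalization is used in place of LayerNorm to stabilize training.
A pyramid-style multi-stage encoder applies stage-wise pooling, followed by a final global attention layer before classification.
Classification proceeds through a final multi-head attention layer, LayerNorm or DyT, global average pooling, and a fully connected head.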
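
The cross-attention alignment and DyT normalization steps listed above can be illustrated with a minimal pure-Python sketch. It rests on several assumptions: the function names `cross_attention` and `dyt` are hypothetical, the learned query/key/value projections and multi-head splitting are omitted, and `dyt` uses the commonly published form gamma * tanh(alpha * x) + beta rather than the paper's customized variant, which this review does not specify.

```python
import math

def dyt(x, alpha, gamma, beta):
    """Dynamic Tanh on one token: gamma * tanh(alpha * x) + beta, element-wise.

    alpha is a scalar; gamma and beta are per-channel lists.
    Assumption: generic DyT form, not the paper's customized module.
    """
    return [g * math.tanh(alpha * v) + b for v, g, b in zip(x, gamma, beta)]

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attention(queries, keys, values):
    """Scaled dot-product attention with queries from one branch and
    keys/values from the other branch (no learned projections here)."""
    d = len(queries[0])
    out = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out

# Toy features: 3 temporal tokens and 2 spatiotemporal tokens, both of dimension 2.
temporal = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
spatiotemporal = [[0.5, 0.5], [2.0, -1.0]]

# Temporal tokens attend to the spatiotemporal branch, then DyT squashes the result.
aligned = cross_attention(temporal, spatiotemporal, spatiotemporal)
normed = [dyt(tok, alpha=0.5, gamma=[1.0, 1.0], beta=[0.0, 0.0]) for tok in aligned]
```

Unlike LayerNorm, the DyT step computes no per-token mean or variance; it is a purely element-wise operation.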

Experimental Results

Research Questions

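RQ1: How can temporal and spatiotemporal EEG features be effectively aligned and fused across multiple scales?
RQ2: Can a lightweight dual-branch CNN fused with a Transformer achieve strong cross-subject generalization on MI and robust seizure prediction?
RQ3: Does Dynamic Tanh normalization improve training stability and generalization on EEG tasks?
RQ4: How much does each core component of HCFT contribute to decoding performance?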

Key Results

Methods	S1	S2	S3	S4	S5	S6	S7	S8	S9	Avg Acc	Std	Kappa
ConvNet	64.19	62.9	67.58	72.06	75.87	72.01	81.51	79.02	60.68	70.65	7.33	0.4134
EEGNet	66.15	71.08	72.01	56.48	80.24	78.78	85.03	79.54	71.74	73.45	8.64	0.4684
MSNN	74.72	65.29	57.63	91.21	74.72	85.55	72.91	76.57	76.66	75.02	9.88	-
Hybrid s-CViT	68.47	56.91	50.42	81.08	60.68	61.67	62.22	70.00	68.47	64.44	8.81	-
Hybrid t-CViT	66.39	55.74	52.36	82.7	72.57	63.89	68.89	65.92	72.64	66.79	9.12	-
MSHCNN	76.80	66.32	57.36	91.75	79.59	82.63	74.16	80.13	75.55	76.03	9.79	-
Conformer	65.89	64.43	67.45	84.45	72.24	76.56	77.86	69.23	74.87	72.55	6.51	0.4521
EEGCCT	68.75	59.6	59.9	89.21	73.44	75.39	76.3	75.76	77.73	73.26	9.21	0.4587
Hybrid EEGNet	71.53	65.00	58.75	84.86	78.78	77.50	77.92	73.68	75.41	73.72	7.82	-
CTNet	76.25	71.03	66.39	81.76	83.11	77.22	79.17	73.56	77.92	76.27	5.26	0.5252
EEGPT	72.22	69.71	61.53	78.78	81.08	70.42	83.89	83.82	70.83	74.70	7.61	0.4936
SCNN	-	-	-	-	-	-	-	-	-	-	-	-
MSCFormer	76.11	71.18	62.36	81.35	81.08	74.72	78.89	76.18	75.42	75.25	5.80	0.5051
ConTraNet	72.92	72.94	63.75	83.51	82.70	80.69	84.44	77.37	70.83	76.57	6.97	-
HCFT	78.62	73.23	67.71	93.92	82.72	82.68	86.17	84.47	77.94	80.83	7.61	0.6165
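
The summary columns of the table can be reproduced from the per-subject accuracies. The sketch below does this for the HCFT row under two assumptions that the review does not state: that Std is the sample standard deviation (n - 1 denominator), and that Cohen's kappa can be approximated from the average accuracy using a balanced two-class chance level of 0.5. The helper name `kappa_from_accuracy` is hypothetical.

```python
import statistics

# Per-subject accuracies (%) for HCFT, copied from the table (S1..S9).
hcft_acc = [78.62, 73.23, 67.71, 93.92, 82.72, 82.68, 86.17, 84.47, 77.94]

avg_acc = statistics.mean(hcft_acc)    # average accuracy over the nine subjects
std_acc = statistics.stdev(hcft_acc)   # sample standard deviation (n - 1)

def kappa_from_accuracy(acc_percent, n_classes=2):
    """Cohen's kappa assuming balanced classes: (p_o - p_e) / (1 - p_e)."""
    p_o = acc_percent / 100.0
    p_e = 1.0 / n_classes
    return (p_o - p_e) / (1.0 - p_e)

kappa = kappa_from_accuracy(avg_acc)

print(round(avg_acc, 2))  # 80.83, matching the Avg Acc column
print(round(std_acc, 2))  # 7.61, matching the Std column
print(round(kappa, 4))    # 0.6166, close to the reported 0.6165
```

The small gap in kappa is expected if the reported value was averaged per subject or computed from confusion matrices rather than derived from the mean accuracy.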

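Under leave-one-subject-out (LOSO) evaluation on BCI IV-2b (MI classification), HCFT achieves 80.83% average accuracy and a Cohen's kappa of 0.6165, outperforming every baseline in the comparison table.
On CHB-MIT seizure prediction, HCFT achieves 99.10% sensitivity, 0.0236 false positives per hour, and 98.82% specificity.
Ablation studies show that cross-attention, self-attention, stage-wise concatenation, and the final MHSA each contribute to performance.
DyT normalization outperforms LayerNorm on the MI task, whereas LayerNorm performs better for seizure prediction; DyT also yields a smaller model size and fewer FLOPs.
An embedding dimension of D=32 with H=2 attention heads balances accuracy and efficiency; a deeper Stage 3 improves performance.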
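
For readers unfamiliar with the seizure prediction metrics reported for CHB-MIT, the sketch below shows how sensitivity, specificity, and false positives per hour are conventionally defined. The function name `seizure_metrics` and all counts are hypothetical illustration values, not data from the paper, and the paper's exact evaluation protocol (for example segment-based versus event-based counting) is not described in this review.

```python
def seizure_metrics(tp, fn, tn, fp, false_alarms, interictal_hours):
    """Conventional seizure-prediction metrics from raw counts.

    tp/fn: preictal cases predicted / missed
    tn/fp: interictal cases correctly / wrongly classified
    false_alarms: number of false alarms raised during interictal recording
    interictal_hours: total interictal recording time in hours
    """
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "fpr_per_hour": false_alarms / interictal_hours,
    }

# Hypothetical counts chosen only to illustrate the arithmetic.
metrics = seizure_metrics(tp=95, fn=5, tn=980, fp=20,
                          false_alarms=3, interictal_hours=120.0)
print(metrics)
```

With these illustrative counts, sensitivity is 95 / 100, specificity is 980 / 1000, and the false positive rate is 3 alarms over 120 hours.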

This review was generated by AI and reviewed by a human editor.