QUICK REVIEW

[Paper Review] ENIGMA: EEG-to-Image in 15 Minutes Using Less Than 1% of the Parameters

Reese Kneeland, Wangshu Jiang|arXiv (Cornell University)|Feb 10, 2026

EEG and Brain-Computer Interfaces | Cited by: 0

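One-Line Summary

ENIGMA is a multi-subject EEG-to-image model that fine-tunes to a new subject with as little as 15 minutes of data, uses less than 1% of the trainable parameters of previous approaches, achieves state-of-the-art reconstruction on both THINGS-EEG2 and Alljoined-1.6M, and remains robust on consumer-grade hardware.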

ABSTRACT

To be practical for real-life applications, models for brain-computer interfaces must be easily and quickly deployable on new subjects, effective on affordable scanning hardware, and small enough to run locally on accessible computing resources. To directly address these current limitations, we introduce ENIGMA, a multi-subject electroencephalography (EEG)-to-Image decoding model that reconstructs seen images from EEG recordings and achieves state-of-the-art (SOTA) performance on the research-grade THINGS-EEG2 and consumer-grade AllJoined-1.6M benchmarks, while fine-tuning effectively on new subjects with as little as 15 minutes of data. ENIGMA boasts a simpler architecture and requires less than 1% of the trainable parameters necessary for previous approaches. Our approach integrates a subject-unified spatio-temporal backbone along with a set of multi-subject latent alignment layers and an MLP projector to map raw EEG signals to a rich visual latent space. We evaluate our approach using a broad suite of image reconstruction metrics that have been standardized in the adjacent field of fMRI-to-Image research, and we describe the first EEG-to-Image study to conduct extensive behavioral evaluations of our reconstructions using human raters. Our simple and robust architecture provides a significant performance boost across both research-grade and consumer-grade EEG hardware, and a substantial improvement in fine-tuning efficiency and inference cost. Finally, we provide extensive ablations to determine the architectural choices most responsible for our performance gains in both single and multi-subject cases across multiple benchmark datasets. Collectively, our work provides a substantial step towards the development of practical brain-computer interface applications.

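Motivation and Objectives

Address practical EEG-to-image decoding for real-world BCIs by enabling fast fine-tuning on new subjects
Achieve robust performance on both research-grade and consumer-grade EEG hardware
Reduce model size by sharing parameters across subjects while maintaining decoding quality
Provide a comprehensive evaluation that includes human behavioral ratings and ablations
Demonstrate broad applicability and efficiency for edge deployment and clinical use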

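Proposed Method

Proposes ENIGMA, a multi-subject EEG-to-image model with a spatio-temporal backbone, subject-specific latent alignment layers, and an MLP projection into the CLIP embedding space
Uses a unified multi-subject architecture in which most parameters are shared and only the lightweight alignment layers are subject-specific
Maps EEG embeddings into the CLIP ViT-H/14 latent space and reconstructs images with Stable Diffusion XL Turbo equipped with an IP-Adapter
Trains with a composite loss that combines an MSE term and an InfoNCE contrastive term between EEG embeddings and image CLIP embeddings
Supports three operating modes: single-subject, multi-subject, and fine-tuning to adapt to a new subject
Shows that 15-minute calibration is feasible and that edge-device deployment is plausible, and reports training efficiency (e.g., 5.5 hours for 30 subjects)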
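
The composite loss in the method list can be illustrated with a small NumPy sketch. This is only one plausible reading, not the authors' implementation: the symmetric form of InfoNCE, the `temperature` value, and the `alpha` weighting between the two terms are all assumptions, since the review does not give them.

```python
import numpy as np

def l2_normalize(x, eps=1e-8):
    # Scale each row to unit length so dot products are cosine similarities.
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)

def info_nce(eeg, img, temperature=0.07):
    # Symmetric InfoNCE over a batch: pair (i, i) is the positive,
    # every other pairing in the batch acts as a negative.
    logits = l2_normalize(eeg) @ l2_normalize(img).T / temperature  # (B, B)

    def cross_entropy_on_diagonal(z):
        z = z - z.max(axis=1, keepdims=True)  # numerical stability
        log_prob = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(log_prob))

    return 0.5 * (cross_entropy_on_diagonal(logits)
                  + cross_entropy_on_diagonal(logits.T))

def combined_loss(eeg, img, alpha=0.5, temperature=0.07):
    # alpha is a hypothetical mixing weight, not a value from the paper.
    mse = np.mean((eeg - img) ** 2)
    return alpha * mse + (1.0 - alpha) * info_nce(eeg, img, temperature)

# Toy usage: predicted EEG embeddings close to their image embeddings.
rng = np.random.default_rng(0)
image_emb = rng.normal(size=(8, 16))
eeg_emb = image_emb + 0.01 * rng.normal(size=(8, 16))
loss = combined_loss(eeg_emb, image_emb)
```

The MSE term pulls each predicted embedding toward its target in absolute terms, while the contrastive term only asks that the matching image be the most similar one in the batch. Combining them is a common choice when the embedding is later fed to a generative model.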

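Experimental Results

Research Questions

RQ1: Can a unified multi-subject EEG-to-image model achieve state-of-the-art reconstruction on both high-quality (research-grade) and consumer-grade EEG hardware?
RQ2: Does a lightweight, parameter-sharing architecture with subject-specific latent alignment enable fast adaptation to new subjects with minimal data?
RQ3: How does ENIGMA perform against existing EEG-to-image baselines on the standard benchmarks (THINGS-EEG2 and Alljoined-1.6M) under both automated and human evaluation?
RQ4: How do the architectural components (latent alignment, spatio-temporal backbone, diffusion prior) affect cross-subject generalization and robustness to hardware quality?
RQ5: Does ENIGMA scale to many subjects, and is it more parameter-efficient than single-subject models?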
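
RQ2 and RQ5 both turn on how parameters are split between shared and subject-specific parts. The toy sketch below shows the general idea under loud assumptions: single linear layers stand in for the spatio-temporal backbone and the MLP projector, the alignment layer is placed after the backbone, and all sizes are hypothetical. None of these details come from the review, so treat it as an illustration of parameter sharing rather than the ENIGMA architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes, chosen only to keep the example small.
N_CHANNELS, N_TIMES, LATENT, CLIP_DIM = 8, 20, 16, 32

def init_linear(n_in, n_out):
    return {"W": rng.normal(0.0, 0.1, (n_in, n_out)), "b": np.zeros(n_out)}

def linear(p, x):
    return x @ p["W"] + p["b"]

def n_params(p):
    return p["W"].size + p["b"].size

class SharedPlusAlignment:
    """Shared backbone and projector, one small alignment layer per subject."""

    def __init__(self, subjects):
        # Stand-ins: a single linear layer each (assumption, not the paper's design).
        self.backbone = init_linear(N_CHANNELS * N_TIMES, LATENT)
        self.projector = init_linear(LATENT, CLIP_DIM)
        self.align = {s: init_linear(LATENT, LATENT) for s in subjects}

    def add_subject(self, subject):
        # Adapting to a new subject adds only one alignment layer.
        self.align[subject] = init_linear(LATENT, LATENT)

    def forward(self, eeg, subject):
        # eeg has shape (batch, channels, time); flatten for the toy backbone.
        h = np.tanh(linear(self.backbone, eeg.reshape(len(eeg), -1)))
        h = linear(self.align[subject], h)
        return linear(self.projector, h)

    def shared_params(self):
        return n_params(self.backbone) + n_params(self.projector)

    def per_subject_params(self):
        return LATENT * LATENT + LATENT

model = SharedPlusAlignment(["sub-01", "sub-02"])
model.add_subject("sub-03")
total = model.shared_params() + len(model.align) * model.per_subject_params()
```

In this layout the shared parameter count stays fixed as subjects are added, and the total grows only by one alignment layer per subject. Training a separate full model per subject would instead multiply the whole count by the number of subjects.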

Key Findings

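Column groups: Model Properties (# of Parameters, Inference GFLOPS), Low-Level, High-Level, Retrieval (three columns), Human Raters (one column). The source header names no metric for the column after Eff or for the individual Retrieval columns.
Method	# of Parameters	Inference GFLOPS	PixCorr	SSIM	Alex(2)	Alex(5)	Incep	CLIP	Eff	(unnamed in source)	Retrieval (1)	Retrieval (2)	Retrieval (3)	Human Raters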
ENIGMA (Multi-Subject)	2,376,842	294.4	0.1668	0.4264	82.99%	89.12%	76.54%	80.33%	0.8577	0.5399	22.55%	50.75%	64.05%	86.04%
ATM-S (Multi-Subject)	12,815,311	3,858.6	0.072	0.403	57.09%	58.99%	52.86%	55.04%	0.963	0.663	16.20%	45.10%	62.20%	56.82%
ENIGMA (Single-Subject)	13,896,820	294.4	0.1718	0.4233	83.64%	89.49%	77.65%	81.48%	0.8547	0.5403	27.60%	59.35%	71.15%	86.82%
ATM-S (Single-Subject)	128,153,110	3,858.6	0.136	0.392	73.85%	80.83%	67.56%	71.28%	0.909	0.601	30.15%	60.15%	73.60%	77.14%
Perceptogram (Single-Subject)	4,731,924,800	2,807.8	0.247	0.431	85.46%	88.03%	70.40%	71.98%	0.902	0.581	–	–	–	79.17%
Alljoined-1.6M (Multi-Subject)	2,376,842	588.8	0.0852	0.4175	68.33%	73.40%	63.14%	66.38%	0.9259	0.6127	6.00%	18.85%	28.80%	70.74%

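ENIGMA achieves SOTA on THINGS-EEG2 and Alljoined-1.6M across multiple metrics and shows cross-subject generalization through latent alignment
The model uses less than 1% of the trainable parameters that prior methods require, achieves roughly a 165x parameter reduction in multi-subject deployment, and scales to 30 subjects
It fine-tunes to a new subject with as little as about 15 minutes of data and outperforms non-pretrained baselines in the low-data regime
In human behavioral evaluations, raters identify the ground-truth image more easily from ENIGMA's reconstructions than from baseline reconstructions
Ablations show that latent alignment and the spatio-temporal backbone are essential for multi-subject performance, and that certain components of the diffusion prior can hurt performance on consumer-grade hardware
Across benchmarks, ENIGMA maintains robust performance on consumer-grade EEG hardware and reduces the fragility observed in more complex architectures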
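
The parameter and compute comparisons can be explored directly from the table rows. The sketch below only divides the listed values by those of the ENIGMA (Multi-Subject) row. The review does not say which comparison underlies the "less than 1%" and "165x" figures (they may refer to a 30-subject setting that the table does not list), so these ratios are not expected to reproduce them.

```python
# (parameter count, inference GFLOPS), copied from the results table.
rows = {
    "ENIGMA (Multi-Subject)": (2_376_842, 294.4),
    "ATM-S (Multi-Subject)": (12_815_311, 3_858.6),
    "ENIGMA (Single-Subject)": (13_896_820, 294.4),
    "ATM-S (Single-Subject)": (128_153_110, 3_858.6),
    "Perceptogram (Single-Subject)": (4_731_924_800, 2_807.8),
}

def ratios(baseline, reference="ENIGMA (Multi-Subject)"):
    # How many times larger the baseline is than the reference row.
    ref_params, ref_gflops = rows[reference]
    params, gflops = rows[baseline]
    return params / ref_params, gflops / ref_gflops

for name in rows:
    param_ratio, gflops_ratio = ratios(name)
    print(f"{name}: {param_ratio:,.1f}x parameters, "
          f"{gflops_ratio:.1f}x GFLOPS vs ENIGMA (Multi-Subject)")
```

Passing a different `reference` row, for example `"ENIGMA (Single-Subject)"`, gives the like-for-like single-subject comparison.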

This review was generated by AI and checked by a human editor.