QUICK REVIEW

[Paper Review] Toward High-Fidelity Visual Reconstruction: From EEG-Based Conditioned Generation to Joint-Modal Guided Rebuilding

Zijun Gong, Tianren Yao|arXiv (Cornell University)|Mar 20, 2026

EEG and Brain-Computer Interfaces | Cited by 0

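One-Sentence Summary

JMVR introduces a joint-modal framework that treats EEG and text as independent modalities to reconstruct high-fidelity visual stimuli from EEG signals, achieving state-of-the-art results on THINGS-EEG.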

ABSTRACT

Human visual reconstruction aims to reconstruct fine-grained visual stimuli based on subject-provided descriptions and corresponding neural signals. As a widely adopted modality, Electroencephalography (EEG) captures rich visual cognition information, encompassing complex spatial relationships and chromatic details within scenes. However, current approaches are deeply coupled with an alignment framework that forces EEG features to align with text or image semantic representation. The dependency may condense the rich spatial and chromatic details in EEG that achieved mere conditioned image generation rather than high-fidelity visual reconstruction. To address this limitation, we propose a novel Joint-Modal Visual Reconstruction (JMVR) framework. It treats EEG and text as independent modalities for joint learning to preserve EEG-specific information for reconstruction. It further employs a multi-scale EEG encoding strategy to capture both fine- and coarse-grained features, alongside image augmentation to enhance the recovery of perceptual details. Extensive experiments on the THINGS-EEG dataset demonstrate that JMVR achieves SOTA performance against six baseline methods, specifically exhibiting superior capabilities in modeling spatial structure and chromatic fidelity.

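Research Motivation and Objectives

Move EEG-based visual reconstruction beyond text-aligned conditioned generation toward high-fidelity reconstruction.
Decouple EEG representations from abstract text/image semantics to preserve perceptual detail.
Develop a multi-scale EEG encoder and image augmentation to enrich the joint latent space.
Propose a joint-modal attention mechanism that enables cross-modal interaction without forcing EEG into the text space.
Introduce diffusion step gating to balance semantic and perceptual information across diffusion steps.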

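Proposed Method

A multi-scale EEG encoder with spatiotemporal and pyramid-pooling branches that captures both fine- and coarse-grained EEG features.
Image augmentation that combines edge maps, depth maps (via Depth-Anything-v2), and HSV saturation to enrich visual attributes.
Joint-modal attention that concatenates image, text, and EEG tokens and applies a single joint self-attention, with modality-specific projections followed by a per-modality MLP residual.
Diffusion step gating that modulates the information flow from the text and EEG priors across diffusion timesteps (text: sin schedule; EEG: 1 - sin schedule) to align coarse-grained semantics with fine-grained perceptual cues.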
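
The joint-modal attention item above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the single attention head, the ReLU MLP, the random weights, and the names `init_params` and `joint_modal_attention` are all assumptions made for illustration. It only shows the general pattern the summary describes: modality-specific projections, one self-attention over the concatenated token sequence, and a per-modality MLP residual.

```python
import numpy as np

def softmax(x):
    """Row-wise softmax with max-subtraction for numerical stability."""
    shifted = x - x.max(axis=-1, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=-1, keepdims=True)

def init_params(dim, modalities, rng):
    """Random per-modality weights (hypothetical; real weights would be learned)."""
    def w(rows, cols):
        return rng.normal(0.0, rows ** -0.5, size=(rows, cols))
    return {
        m: {"q": w(dim, dim), "k": w(dim, dim), "v": w(dim, dim),
            "mlp_in": w(dim, 2 * dim), "mlp_out": w(2 * dim, dim)}
        for m in modalities
    }

def joint_modal_attention(tokens, params):
    """tokens maps a modality name to an (n_tokens, dim) array.

    Returns the updated tokens per modality and the joint attention matrix.
    """
    names = list(tokens)
    # Modality-specific projections, then concatenation along the token axis.
    q = np.concatenate([tokens[m] @ params[m]["q"] for m in names], axis=0)
    k = np.concatenate([tokens[m] @ params[m]["k"] for m in names], axis=0)
    v = np.concatenate([tokens[m] @ params[m]["v"] for m in names], axis=0)
    # One self-attention over all tokens: every token can attend to every modality.
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]))
    mixed = attn @ v
    out, start = {}, 0
    for m in names:
        n = tokens[m].shape[0]
        h = tokens[m] + mixed[start:start + n]              # attention residual
        hidden = np.maximum(h @ params[m]["mlp_in"], 0.0)   # ReLU (assumed)
        out[m] = h + hidden @ params[m]["mlp_out"]          # per-modality MLP residual
        start += n
    return out, attn

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    dim = 16
    tokens = {
        "image": rng.normal(size=(8, dim)),
        "text": rng.normal(size=(4, dim)),
        "eeg": rng.normal(size=(6, dim)),
    }
    out, attn = joint_modal_attention(tokens, init_params(dim, tokens, rng))
    print({m: x.shape for m, x in out.items()}, attn.shape)
```

Because the EEG tokens keep their own projection and MLP weights, they are never mapped into the text embedding space; the modalities interact only through the shared attention matrix.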

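Experimental Results

Research Questions

RQ1: Can decoupling EEG from text/image semantics yield higher-fidelity EEG-based visual reconstruction?
RQ2: How do multi-scale EEG representations and image augmentation affect reconstruction quality?
RQ3: Does joint-modal attention provide richer cross-modal interaction than conventional cross-attention with pre-aligned EEG?
RQ4: How does diffusion step gating affect the balance between semantic guidance and perceptual EEG information during generation?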

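Key Findings

On THINGS-EEG, JMVR achieves state-of-the-art performance on multiple metrics against six baselines.
Ablation studies show that multi-scale EEG encoding and diffusion step gating are critical to performance.
Image augmentation improves fine-grained fidelity; removing this module weakens color and depth attributes.
Joint-modal attention preserves EEG specificity and enables rich cross-modal interaction without forcing EEG into a text-aligned space.
Temporal analysis shows that EEG contributes to depth and spatial structure in later stages, whereas text controls coarse structure early in diffusion.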
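
The complementary gating schedule (text: sin; EEG: 1 - sin) can be sketched in a few lines of Python. The summary does not state the argument of the sine, so this sketch assumes a quarter-period sine of the normalized noise timestep t/T, with sampling running from t = T down to t = 0. That choice is made only because it matches the finding that text dominates early and EEG dominates late. The names `gate_weights` and `gated_condition`, and the idea of scaling feature vectors directly by the gates, are hypothetical.

```python
import math

def gate_weights(t: int, num_steps: int) -> tuple[float, float]:
    """Return (text_gate, eeg_gate) for noise timestep t in [0, num_steps]."""
    if not 0 <= t <= num_steps:
        raise ValueError("t must lie in [0, num_steps]")
    # Assumption: quarter-period sine of the normalized noise level t / T.
    text_gate = math.sin(0.5 * math.pi * t / num_steps)
    eeg_gate = 1.0 - text_gate  # complementary schedule from the summary
    return text_gate, eeg_gate

def gated_condition(text_feat, eeg_feat, t, num_steps):
    """Scale each modality's features by its gate (one hypothetical way to apply it)."""
    g_text, g_eeg = gate_weights(t, num_steps)
    return [g_text * x for x in text_feat], [g_eeg * x for x in eeg_feat]

if __name__ == "__main__":
    T = 1000
    # Sampling starts at t = T (pure noise) and ends at t = 0 (clean image).
    for t in (T, T // 2, 0):
        g_text, g_eeg = gate_weights(t, T)
        print(f"t={t:4d}  text={g_text:.3f}  eeg={g_eeg:.3f}")
```

Under this assumption the text gate is close to 1 at the start of sampling and falls to 0 at the end, while the EEG gate rises from 0 to 1, so the two weights always sum to 1.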

This review was generated by AI and reviewed by a human editor.