QUICK REVIEW

[Paper Review] Brain decoding: toward real-time reconstruction of visual perception

Yohann Benchetrit, Hubert Banville|arXiv (Cornell University)|Oct 18, 2023

Functional Brain Connectivity Studies | Cited by 27

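One-sentence summary

This paper proposes an MEG-based pipeline, aimed at real-time use, that decodes and generates visual imagery from brain activity by aligning MEG signals with pretrained image embeddings and conditioning a diffusion-based generator on the result.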

ABSTRACT

In the past five years, the use of generative and foundational AI systems has greatly improved the decoding of brain activity. Visual perception, in particular, can now be decoded from functional Magnetic Resonance Imaging (fMRI) with remarkable fidelity. This neuroimaging technique, however, suffers from a limited temporal resolution ($\approx$0.5 Hz) and thus fundamentally constrains its real-time usage. Here, we propose an alternative approach based on magnetoencephalography (MEG), a neuroimaging device capable of measuring brain activity with high temporal resolution ($\approx$5,000 Hz). For this, we develop an MEG decoding model trained with both contrastive and regression objectives and consisting of three modules: i) pretrained embeddings obtained from the image, ii) an MEG module trained end-to-end and iii) a pretrained image generator. Our results are threefold: Firstly, our MEG decoder shows a 7X improvement of image-retrieval over classic linear decoders. Second, late brain responses to images are best decoded with DINOv2, a recent foundational image model. Third, image retrievals and generations both suggest that high-level visual features can be decoded from MEG signals, although the same approach applied to 7T fMRI also recovers better low-level features. Overall, these results, while preliminary, provide an important step towards the decoding -- in real-time -- of the visual processes continuously unfolding within the human brain.

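Motivation and objectives

Investigate whether temporally rich MEG data can support real-time decoding of visual perception.
Use pretrained image embeddings to map MEG signals onto visual representations.
Develop a three-module pipeline that enables image retrieval and image generation from MEG.
Compare MEG decoding performance against an fMRI baseline and assess the nature of the features represented.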

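Proposed method

Train a brain module f_theta that maps MEG windows to a latent image representation z.
Optimize retrieval with a CLIP loss, and use an MSE loss to support image generation from the latent representation.
Aggregate the temporal MEG outputs via pooling, an affine projection, or attention to produce a fixed-size latent representation.
Condition a pretrained diffusion-based image generator on the MEG-derived embeddings to reconstruct images.
Evaluate with retrieval metrics (top-5 accuracy, relative median rank) and generation metrics (PixCorr, SSIM, SwAV, CLIP, etc.).
Train across participants on the THINGS-MEG dataset and compare test results across evaluation criteria.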
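
The two training objectives listed above can be illustrated with a small sketch. It is written in plain Python so that it runs without dependencies; a real implementation would use a tensor library. Several details are assumptions made for illustration, not taken from the paper summary: the contrastive term is one-directional (each predicted latent must pick out its own image embedding within the batch), and the names `lam` and `temperature`, along with their default values, are hypothetical.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def l2_normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(dot(v, v)) or 1.0
    return [a / norm for a in v]


def clip_loss(z_hat, z, temperature=0.1):
    """Contrastive loss: mean cross-entropy of matching each predicted
    latent z_hat[i] to its own image embedding z[i] among all z in the batch.
    Assumption: one-directional (brain-to-image) variant."""
    z_hat = [l2_normalize(v) for v in z_hat]
    z = [l2_normalize(v) for v in z]
    total = 0.0
    for i, pred in enumerate(z_hat):
        logits = [dot(pred, target) / temperature for target in z]
        m = max(logits)  # subtract the max for numerical stability
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
        total += log_sum_exp - logits[i]
    return total / len(z_hat)


def mse_loss(z_hat, z):
    """Mean squared error over every coordinate of every latent."""
    n = sum(len(v) for v in z)
    return sum((a - b) ** 2 for p, t in zip(z_hat, z) for a, b in zip(p, t)) / n


def combined_loss(z_hat, z, lam=0.5, temperature=0.1):
    """Hypothetical weighting: lam * CLIP loss + (1 - lam) * MSE loss."""
    return lam * clip_loss(z_hat, z, temperature) + (1 - lam) * mse_loss(z_hat, z)
```

With two orthogonal image embeddings, predictions that match their targets give a loss near zero, while predictions assigned to the wrong image are penalized by both terms.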

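Experimental results

Research questions

RQ1: Can MEG signals be decoded in real time to retrieve or generate open-set images using pretrained visual embeddings?
RQ2: Which pretrained image representation (supervised, text-aligned, or self-supervised) aligns best with MEG activity for retrieval?
RQ3: To what extent do MEG signals preserve high-level semantics versus low-level visual features during decoding?
RQ4: How do MEG-based reconstructions compare with fMRI-based reconstructions in fidelity and granularity?
RQ5: What are the temporal dynamics of decoding performance around image onset and offset?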

Key findings

Dataset	PixCorr	SSIM	AlexNet(2)	AlexNet(5)	Inception	CLIP	SwAV
NSD (fMRI)	0.305	0.366	0.962	0.977	0.910	0.917	0.410
THINGS-MEG (per-trial average)	0.079	0.329	0.718	0.823	0.674	0.765	0.595
THINGS-MEG (per-subject average)	0.088	0.333	0.747	0.855	0.712	0.804	0.576
THINGS-MEG (no average)	0.069	0.308	0.668	0.733	0.613	0.668	0.636
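
As a reading aid for the PixCorr column, the sketch below computes a pixel-wise Pearson correlation between a reference image and a reconstruction. This is the common definition of the metric. The summary does not describe the paper's preprocessing (resizing, color handling), so none is reproduced here, and the function name `pixcorr` and the flat-list image format are assumptions made for illustration.

```python
import math


def pixcorr(image_a, image_b):
    """Pearson correlation between two images given as flat lists of
    pixel intensities of equal length. Returns 0.0 if either image is
    constant, since the correlation is undefined in that case."""
    if len(image_a) != len(image_b):
        raise ValueError("images must have the same number of pixels")
    n = len(image_a)
    mean_a = sum(image_a) / n
    mean_b = sum(image_b) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(image_a, image_b))
    var_a = sum((a - mean_a) ** 2 for a in image_a)
    var_b = sum((b - mean_b) ** 2 for b in image_b)
    denom = math.sqrt(var_a * var_b)
    return cov / denom if denom else 0.0
```

Because the correlation is invariant to brightness and contrast scaling, a reconstruction that is a rescaled copy of the reference scores close to 1, and an intensity-inverted copy scores close to -1.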

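Deep MEG decoding improves image retrieval by up to about 7x over a linear baseline.
With VGG-19, CLIP-Vision, and DINOv2 embeddings, top-5 retrieval accuracy on the small test set is roughly 70%.
Moving from retrieval to generation, the generated images capture category-level semantics but only limited low-level detail, suggesting that MEG carries high-level features more strongly than fine-grained detail.
Time-windowed analyses show that retrieval performance peaks around image onset and image offset; the offset response is especially pronounced with DINOv2 embeddings.
Compared with 7T fMRI, MEG recovers low-level features less well, suggesting that although MEG has high temporal resolution, its limited spatial precision constrains low-level reconstruction.
The approach points to a path toward real-time, open-set visual decoding from brain activity.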
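
The retrieval figures above rest on two metrics, top-k accuracy and relative median rank, which can be sketched as follows. The definitions here are assumptions made for illustration: candidates are ranked by cosine similarity to the predicted latent, rank 0 means the true image scored highest, and the relative median rank is the median rank divided by the number of candidates. The names `rank_of_target` and `retrieval_metrics` are hypothetical.

```python
import math
from statistics import median


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    num = sum(a * b for a, b in zip(u, v))
    den = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return num / den if den else 0.0


def rank_of_target(pred, candidates, target_index):
    """0-based rank of the true image: the number of candidates whose
    similarity to the prediction is strictly higher than the target's."""
    sims = [cosine(pred, c) for c in candidates]
    return sum(s > sims[target_index] for s in sims)


def retrieval_metrics(preds, candidates, k=5):
    """Assumes preds[i] is the decoded latent for candidates[i].
    Returns (top-k accuracy, relative median rank)."""
    ranks = [rank_of_target(p, candidates, i) for i, p in enumerate(preds)]
    top_k = sum(r < k for r in ranks) / len(ranks)
    rel_median_rank = median(ranks) / len(candidates)
    return top_k, rel_median_rank
```

A usage example with three candidate embeddings, where the third prediction is closer to the first candidate than to its own target:

```python
candidates = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
preds = [[0.9, 0.1], [0.1, 0.9], [1.0, 0.0]]
top1, rel_rank = retrieval_metrics(preds, candidates, k=1)
```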

This review was generated by AI and reviewed by a human editor.