QUICK REVIEW

[Paper Review] Scaling laws for language encoding models in fMRI

Richard Antonello, Aditya R. Vaidya|PubMed|May 19, 2023

Machine Learning in Materials Science | Cited by 28

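One-sentence summary

This paper tests whether larger open-source language models and acoustic models better predict fMRI responses to natural language, and finds that encoding performance grows logarithmically with model size and training data, approaching the noise ceiling in some brain regions.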

ABSTRACT

Representations from transformer-based unidirectional language models are known to be effective at predicting brain responses to natural language. However, most studies comparing language models to brains have used GPT-2 or similarly sized language models. Here we tested whether larger open-source models such as those from the OPT and LLaMA families are better at predicting brain responses recorded using fMRI. Mirroring scaling results from other contexts, we found that brain prediction performance scales logarithmically with model size from 125M to 30B parameter models, with ~15% increased encoding performance as measured by correlation with a held-out test set across 3 subjects. Similar logarithmic behavior was observed when scaling the size of the fMRI training set. We also characterized scaling for acoustic encoding models that use HuBERT, WavLM, and Whisper, and we found comparable improvements with model size. A noise ceiling analysis of these large, high-performance encoding models showed that performance is nearing the theoretical maximum for brain areas such as the precuneus and higher auditory cortex. These results suggest that increasing scale in both models and data will yield incredibly effective models of language processing in the brain, enabling better scientific understanding as well as applications such as decoding.

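Research motivation and goals

Investigate whether language-model scaling laws extend to encoding models of brain responses measured with fMRI.
Evaluate the encoding performance of OPT and LLaMA language models across a range of sizes.
Evaluate how acoustic models (HuBERT, WavLM, Whisper) scale when predicting fMRI responses.
Evaluate how the amount of training data (number of stories) affects encoding performance.
Characterize how closely encoding performance approaches the noise ceiling in different brain regions.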

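Proposed method

Extract contextual embeddings from decoder-only transformer language models (OPT and LLaMA) at multiple sizes.
Extract embeddings from audio models (Whisper, HuBERT, WavLM) for acoustic encoding.
Fit linear ridge regression with time delays to map word-level model activations onto fMRI BOLD responses, accounting for the hemodynamic response.
Use a dynamic context window to compute hidden states efficiently for long inputs.
Apply stacked regression to combine semantic and acoustic features for improved encoding.
Compute a noise ceiling to quantify explainable variance and normalize correlations.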
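
The time-delayed ridge regression step above can be illustrated with a minimal sketch. This is not the authors' code: the delay set, the regularization strength `alpha`, and the function names are all assumptions chosen for illustration, and the random arrays stand in for model embeddings (assumed already resampled to the fMRI sampling rate) and BOLD responses.

```python
import numpy as np

def make_delayed(features, delays=(1, 2, 3, 4)):
    """Stack time-shifted copies of a (time, dims) matrix column-wise.

    The delay values are an assumption for illustration only.
    """
    n_time, n_dims = features.shape
    delayed = np.zeros((n_time, n_dims * len(delays)))
    for i, d in enumerate(delays):
        # Row t of block i holds the features from d time steps earlier.
        delayed[d:, i * n_dims:(i + 1) * n_dims] = features[:n_time - d]
    return delayed

def fit_ridge(X, Y, alpha=1.0):
    """Closed-form ridge solution: W = (X^T X + alpha * I)^-1 X^T Y."""
    n_feat = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_feat), X.T @ Y)

def voxelwise_correlation(Y_true, Y_pred):
    """Pearson correlation between true and predicted responses, per column."""
    yt = Y_true - Y_true.mean(axis=0)
    yp = Y_pred - Y_pred.mean(axis=0)
    denom = np.linalg.norm(yt, axis=0) * np.linalg.norm(yp, axis=0)
    return (yt * yp).sum(axis=0) / denom

# Synthetic stand-ins (not real data): 200 time points, 3 feature dims, 5 voxels.
rng = np.random.default_rng(0)
features = rng.standard_normal((200, 3))
X = make_delayed(features)
true_weights = rng.standard_normal((X.shape[1], 5))
Y = X @ true_weights + 0.1 * rng.standard_normal((200, 5))

# Fit on the first 150 time points, score on the held-out remainder.
weights = fit_ridge(X[:150], Y[:150], alpha=1.0)
test_corr = voxelwise_correlation(Y[150:], X[150:] @ weights)
```

Delaying the features lets a purely linear model absorb the lag of the hemodynamic response, since each voxel can weight the stimulus from several earlier time points. In practice `alpha` would be chosen by cross-validation rather than fixed.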

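Experimental results

Research questions

RQ1: Do larger language models from the OPT and LLaMA families yield better fMRI encoding performance than smaller ones?
RQ2: For language and acoustic models, how does encoding performance scale with the amount of training data (number of stories)?
RQ3: In brain encoding, do audio models (HuBERT, WavLM, Whisper) follow scaling laws similar to those of language models?
RQ4: How closely do encoding models approach the noise ceiling in different brain regions?
RQ5: Does stacking semantic and acoustic features improve encoding in auditory cortex and related regions?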

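Main findings

For OPT and LLaMA language models, encoding performance grows roughly logarithmically with model size, improving by about 15% from the smallest to the largest model tested (125M to 30B parameters).
Encoding performance also grows logarithmically with the number of training stories; for OPT-125M, it increases by roughly 122% for each order-of-magnitude increase in data.
Auditory cortex and higher auditory regions show marked improvement with larger acoustic models, and the Whisper results indicate strong layer-dependent performance gains.
LLaMA models encode slightly better than OPT models; LLaMA's peak performance occurs in earlier layers, whereas OPT's occurs in later layers.
The noise ceiling analysis shows that some regions (the precuneus and higher auditory cortex) are near optimal, while others, such as the angular gyrus and parts of prefrontal cortex, still have room for improvement.
Stacked regression combining Whisper with LLaMA further improves encoding in auditory cortex relative to semantic-only modeling.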
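
To make "grows logarithmically" concrete, the sketch below fits score = a + b * log10(size) by ordinary least squares. The parameter counts are sizes mentioned in the text; the scores are synthetic values generated from a made-up law (`TRUE_INTERCEPT`, `TRUE_SLOPE`), not results from the paper, and the function name is hypothetical.

```python
import math

def fit_log_linear(sizes, scores):
    """Least-squares fit of score = a + b * log10(size); returns (a, b)."""
    xs = [math.log10(s) for s in sizes]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(scores) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, scores))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b

# Synthetic illustration only: these constants are invented, not measured.
TRUE_INTERCEPT = 0.10
TRUE_SLOPE = 0.006

sizes = [125e6, 1.3e9, 13e9, 30e9]  # parameter counts
scores = [TRUE_INTERCEPT + TRUE_SLOPE * math.log10(s) for s in sizes]

intercept, slope = fit_log_linear(sizes, scores)

# Under a log-linear law, every tenfold increase in size adds the same
# absolute increment (the slope) to the score.
gain_per_decade = slope
```

Because the gain per tenfold increase is constant, each further improvement of the same size requires ten times more parameters or data, which is why the scaling curves look like straight lines on a logarithmic x-axis.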

This review was generated by AI and reviewed by human editors.