QUICK REVIEW

[Paper Review] Interpreting and improving natural-language processing (in machines) with natural language-processing (in the brain)

Mariya Toneva, Leila Wehbe|arXiv (Cornell University)|May 28, 2019

Topic Modeling | References: 45 | Cited by: 142

Summary in Brief

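This paper proposes aligning neural network word representations with human brain activity in order to interpret NLP models (ELMo, USE, BERT, Transformer-XL), and shows that brain-aligned modifications can improve syntactic understanding. It analyzes how context length, layer depth, and attention affect brain predictability, and demonstrates that brain-guided changes transfer to NLP tasks.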

ABSTRACT

Neural networks models for NLP are typically implemented without the explicit encoding of language rules and yet they are able to break one performance record after another. This has generated a lot of research interest in interpreting the representations learned by these networks. We propose here a novel interpretation approach that relies on the only processing system we have that does understand language: the human brain. We use brain imaging recordings of subjects reading complex natural text to interpret word and sequence embeddings from 4 recent NLP models - ELMo, USE, BERT and Transformer-XL. We study how their representations differ across layer depth, context length, and attention type. Our results reveal differences in the context-related representations across these models. Further, in the transformer models, we find an interaction between layer depth and context length, and between layer depth and attention type. We finally hypothesize that altering BERT to better align with brain recordings would enable it to also better understand language. Probing the altered BERT using syntactic NLP tasks reveals that the model with increased brain-alignment outperforms the original model. Cognitive neuroscientists have already begun using NLP networks to study the brain, and this work closes the loop to allow the interaction between NLP and cognitive neuroscience to be a true cross-pollination.

Motivation and Objectives

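Ground the interpretation of neural NLP representations in human brain activity recorded during naturalistic reading.
Develop a data-driven method that aligns network representations with fMRI/MEG data to assess what the models encode.
Compare, from a neuroscience perspective, the single-word and longer-context representations of four models: ELMo, USE, BERT, and Transformer-XL (T-XL).
Identify how context length, layer depth, and attention type affect brain predictability across models.
Show that brain-aligned modifications to BERT transfer to improved performance on syntactic tasks.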

Proposed Method

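Extract intermediate-layer representations x_{l,k} from the four NLP models (ELMo, BERT, USE, T-XL) over the same text and word windows, where l indexes the layer and k the context length.
Fit linear encoding models with ridge regularization to predict MEG/fMRI activity from x_{l,k}, and evaluate prediction accuracy.
Use four-fold cross-validation with held-out test data, and assess brain predictability through a classification task over sets of words, across voxels and sensors.
Following prior literature, divide the brain's language network into two groups (group 1 and group 2) to interpret where the representations align.
Study the effect of context length by comparing 1-word embeddings with multi-word representations (for example, 10 words), and analyze layer-wise effects.
Alter BERT's attention pattern (uniform attention at a given layer) to assess the change in brain predictability and the transfer to syntactic NLP tasks.
Evaluate the altered BERT on the Marvin & Linzen syntactic tasks to test syntactic understanding without fine-tuning.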
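
The encoding-model step above can be illustrated with a minimal Python sketch. It assumes NumPy, synthetic stand-in data instead of real model features and brain recordings, a fixed ridge penalty `lam`, and contiguous folds. It also scores predictions with held-out Pearson correlation, which is a simpler substitute for the classification-based evaluation described in the summary. All function names here are hypothetical and not taken from the paper.

```python
import numpy as np

def fit_ridge(X, Y, lam):
    """Closed-form ridge regression: W = (X^T X + lam * I)^-1 X^T Y."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ Y)

def cross_validated_correlation(X, Y, lam=1.0, n_folds=4):
    """Mean held-out Pearson correlation for each target (voxel or sensor)."""
    n = X.shape[0]
    # Contiguous folds (an assumption), since words arrive in reading order.
    folds = np.array_split(np.arange(n), n_folds)
    scores = np.zeros((n_folds, Y.shape[1]))
    for i, test_idx in enumerate(folds):
        train_idx = np.setdiff1d(np.arange(n), test_idx)
        W = fit_ridge(X[train_idx], Y[train_idx], lam)
        pred = X[test_idx] @ W
        # Pearson correlation between prediction and recording, per column.
        p = pred - pred.mean(axis=0)
        y = Y[test_idx] - Y[test_idx].mean(axis=0)
        denom = np.linalg.norm(p, axis=0) * np.linalg.norm(y, axis=0)
        scores[i] = (p * y).sum(axis=0) / denom
    return scores.mean(axis=0)

# Synthetic stand-ins: 200 words, 16 features, 5 targets (zero-mean by design,
# so no intercept term is fitted in this sketch).
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 16))            # plays the role of x_{l,k}
W_true = rng.standard_normal((16, 5))
Y = X @ W_true + 0.1 * rng.standard_normal((200, 5))

scores = cross_validated_correlation(X, Y)
print(scores.shape)
```

In practice the penalty would be chosen by nested cross-validation, and one such model would be fitted per layer l and context length k so that the resulting scores can be compared across both.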

Experimental Results

Research Questions

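RQ1: How do the intermediate representations of ELMo, BERT, USE, and Transformer-XL align with brain activity during naturalistic reading?
RQ2: How do layer depth, context length, and attention type affect the brain predictability of these models?
RQ3: Can brain-aligned modifications to BERT improve its syntactic understanding on probing tasks without additional training?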

Key Findings

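Across models, the middle transformer layers predict brain activity better than the other layers.
Unlike the other models, Transformer-XL does not degrade in performance with longer context.
Uniform attention in shallow BERT layers improves brain predictability for contexts of up to 25 words; deeper layers are more strongly affected by this change.
Modifying BERT by removing the pretrained attention in shallow layers improves agreement with brain data and yields better performance on syntactic probing tasks.
Long-range representations from ELMo, BERT, and T-XL predict activity in both group 1 and group 2 brain regions, whereas USE mainly captures long-range information and predicts fewer group 1 regions.
In all models, the middle layers integrate context beyond 15 words most effectively; BERT's layer 1 combines token embeddings differently, which affects how context is retained.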
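
The uniform-attention alteration mentioned in the findings above can be sketched as follows. This is a single-head toy in NumPy, not BERT's actual multi-head implementation, and it rests on the assumption that "uniform attention" means replacing the learned softmax weights with equal weights over all tokens in the window. The function names are hypothetical.

```python
import numpy as np

def softmax(z, axis=-1):
    """Numerically stable softmax along the given axis."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(Q, K, V, uniform=False):
    """Single-head scaled dot-product attention.

    With uniform=True the learned weights are discarded and every token
    attends equally to all n tokens (assumed reading of the alteration).
    """
    n, d = Q.shape
    if uniform:
        weights = np.full((n, n), 1.0 / n)
    else:
        weights = softmax(Q @ K.T / np.sqrt(d))
    return weights @ V, weights

# Toy inputs: 6 tokens with 8-dimensional queries, keys, and values.
rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((6, 8)) for _ in range(3))

learned_out, learned_w = self_attention(Q, K, V)
uniform_out, uniform_w = self_attention(Q, K, V, uniform=True)
```

Under uniform weights every output row equals the mean of the value vectors, so the layer passes on an unweighted average of its context rather than a learned selection from it.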

This review was generated by AI and checked by human editors.