QUICK REVIEW

[Paper Review] Sharp Nearby, Fuzzy Far Away: How Neural Language Models Use Context

Urvashi Khandelwal, He He|arXiv (Cornell University)|May 12, 2018

Topic Modeling | References: 15 | Cited by: 62

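One-Sentence Summary

This paper analyzes how LSTM language models use context. It finds that the effective context is about 200 tokens, that the model is sensitive to word order only in nearby context (within the most recent sentence), that distant context is modeled as a rough semantic field, and that a neural cache helps the model copy words from that distant context.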

ABSTRACT

We know very little about how neural language models (LM) use prior linguistic context. In this paper, we investigate the role of context in an LSTM LM, through ablation studies. Specifically, we analyze the increase in perplexity when prior context words are shuffled, replaced, or dropped. On two standard datasets, Penn Treebank and WikiText-2, we find that the model is capable of using about 200 tokens of context on average, but sharply distinguishes nearby context (recent 50 tokens) from the distant history. The model is highly sensitive to the order of words within the most recent sentence, but ignores word order in the long-range context (beyond 50 tokens), suggesting the distant past is modeled only as a rough semantic field or topic. We further find that the neural caching model (Grave et al., 2017b) especially helps the LSTM to copy words from within this distant context. Overall, our analysis not only provides a better understanding of how neural LMs use their context, but also sheds light on recent success from cache-based models.

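Motivation and Goals

Determine how many tokens of prior context LSTM LMs use effectively.
Distinguish how nearby and long-range context are represented in LSTM LMs.
Assess how word order and part of speech in different regions of the context affect predictions.
Assess how the neural cache's copy mechanism helps the model use distant context.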

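Proposed Method

Run ablation tests by perturbing the prior context at test time (truncating, shuffling, replacing, or dropping tokens).
Use a standard LSTM LM trained on PTB and WikiText-2, compared with and without a neural cache.
Compare perplexity/NLL under each perturbation.
Analyze word types (content vs. function words) and part-of-speech categories to see how each depends on context.
Add a neural cache to measure its effect on copying from nearby and distant context.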
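
The perturbations listed above can be illustrated with a minimal sketch over a plain list of tokens. This is not the authors' code: the function names, the `near`/`far` distance convention (distance 1 is the token immediately before the target), and the uniform sampling of replacement tokens are all assumptions made for illustration.

```python
import math
import random


def truncate_context(context, n):
    """Keep only the n most recent tokens of the context."""
    return context[-n:] if n > 0 else []


def _span_bounds(context, near, far):
    # Hypothetical convention: the span covers tokens whose distance from
    # the target is greater than `near` and at most `far`.
    size = len(context)
    return max(0, size - far), max(0, size - near)


def shuffle_span(context, near, far, seed=0):
    """Shuffle the tokens inside the span, leaving the rest in place."""
    lo, hi = _span_bounds(context, near, far)
    span = context[lo:hi]
    random.Random(seed).shuffle(span)
    return context[:lo] + span + context[hi:]


def drop_span(context, near, far):
    """Delete the tokens inside the span."""
    lo, hi = _span_bounds(context, near, far)
    return context[:lo] + context[hi:]


def replace_span(context, near, far, vocab, seed=0):
    """Replace each token in the span with a token sampled from vocab.

    Uniform sampling is an assumption of this sketch.
    """
    lo, hi = _span_bounds(context, near, far)
    rng = random.Random(seed)
    replacement = [rng.choice(vocab) for _ in range(hi - lo)]
    return context[:lo] + replacement + context[hi:]


def perplexity(nlls):
    """Perplexity from per-token negative log-likelihoods (natural log)."""
    return math.exp(sum(nlls) / len(nlls))
```

In an experiment of this kind, one would score the same target tokens with the original and the perturbed context and compare the two perplexities; the scoring model itself is left out of the sketch.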

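Experimental Results

Research Questions

RQ1: How many tokens of prior context do neural language models use?
RQ2: Do nearby and long-range context contribute differently to the predictions of LSTMs?
RQ3: How does word order affect predictions in nearby versus distant context?
RQ4: Does a copy mechanism (the neural cache) help the model use distant context more effectively?

Key Findings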

Dataset	# Tokens (Dev)	# Tokens (Test)	Avg. Sent. Len (Dev)	Avg. Sent. Len (Test)	Perplexity, no cache (Dev)	Perplexity, no cache (Test)
PTB	73,760	82,430	20.9	20.9	59.07	56.89
Wiki	217,646	245,569	23.7	22.6	67.29	64.51

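On average, LSTMs effectively use about 200 tokens of context (PTB and WikiText-2).
Local word order matters only within the most recent ~20 tokens; the effect of global word order disappears beyond about 50 tokens, suggesting that distant words are represented only as a rough semantic field.
Content words need more context than function words; rare words need more context than frequent words.
The neural cache substantially improves copying from distant context, especially for words that can only be copied from distant context, but it sometimes hurts words that do not appear in the history.
For words that can be copied from nearby context, replacing the target word in the context with other tokens hurts more than dropping it, indicating that copying from nearby context depends on the word's exact occurrence.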
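
To make the cache findings concrete, here is a minimal sketch of a neural cache in the style of Grave et al.: past hidden states act as keys, the word that followed each state is the value, and the cache distribution is mixed with the LM distribution. The function names, the dict-based distributions, and the values of `theta` and `lam` are assumptions of this sketch, not details taken from the paper.

```python
import math


def cache_distribution(query, keys, next_words, theta=1.0):
    """Cache probabilities over words seen in the history.

    query: current hidden state (list of floats).
    keys: past hidden states, one per history position.
    next_words: the word that followed each past hidden state.
    """
    if not keys:
        return {}
    scores = [theta * sum(q * k for q, k in zip(query, key)) for key in keys]
    top = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    probs = {}
    for word, weight in zip(next_words, weights):
        probs[word] = probs.get(word, 0.0) + weight / total
    return probs


def interpolate(p_lm, p_cache, lam=0.1):
    """Linear mix: (1 - lam) * p_lm + lam * p_cache."""
    if not p_cache:
        return dict(p_lm)
    words = set(p_lm) | set(p_cache)
    return {
        w: (1 - lam) * p_lm.get(w, 0.0) + lam * p_cache.get(w, 0.0)
        for w in words
    }
```

Because the cache places all of its mass on words already in the history, any word absent from the history loses a fraction `lam` of its LM probability after mixing, which is consistent with the finding that the cache can hurt such words.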

This review was generated by AI and reviewed by a human editor.