QUICK REVIEW

[Paper Review] Sharp Nearby, Fuzzy Far Away: How Neural Language Models Use Context

Urvashi Khandelwal, He He | arXiv (Cornell University) | 2018-05-12

Topic Modeling | References: 15 | Citations: 62

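One-Line Summary

This paper analyzes how LSTM language models use prior context and finds an effective context of about 200 tokens: nearby context is sensitive to word order only within roughly the most recent sentence, while distant context is modeled as a rough semantic field, and a neural cache helps the model copy words from that distant context.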

ABSTRACT

We know very little about how neural language models (LM) use prior linguistic context. In this paper, we investigate the role of context in an LSTM LM, through ablation studies. Specifically, we analyze the increase in perplexity when prior context words are shuffled, replaced, or dropped. On two standard datasets, Penn Treebank and WikiText-2, we find that the model is capable of using about 200 tokens of context on average, but sharply distinguishes nearby context (recent 50 tokens) from the distant history. The model is highly sensitive to the order of words within the most recent sentence, but ignores word order in the long-range context (beyond 50 tokens), suggesting the distant past is modeled only as a rough semantic field or topic. We further find that the neural caching model (Grave et al., 2017b) especially helps the LSTM to copy words from within this distant context. Overall, our analysis not only provides a better understanding of how neural LMs use their context, but also sheds light on recent success from cache-based models.

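Motivation and Objectives

Determine how many tokens of prior context LSTM LMs use effectively.
Distinguish how nearby context and long-range context are represented in LSTM LMs.
Evaluate how word order and word identity affect predictions in different regions of the context.
Evaluate how the copy mechanism of the neural cache helps the model exploit distant context.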

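Proposed Method

Run ablation experiments at test time that truncate, shuffle, replace, or drop the prior context.
Use standard LSTM LMs trained on PTB and WikiText-2, both with and without a neural cache.
Compare the change in perplexity (equivalently, negative log-likelihood) caused by each perturbation.
Analyze word types (content words vs. function words) and part-of-speech categories to see how context dependence varies.
Add a neural cache to measure its effect on copying from nearby versus distant context.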
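
The perturbations listed above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function names, the `near`/`far` distance convention (tokens more than `near` and at most `far` positions before the target), and the uniform sampling of replacement tokens from a vocabulary are all assumptions made for this example. The perplexity helper assumes per-token negative log-likelihoods in nats.

```python
import math
import random

def _span(context, near, far):
    """Index range of tokens more than `near` and at most `far` positions back."""
    n = len(context)
    return max(0, n - far), max(0, n - near)

def truncate(context, n):
    """Keep only the n most recent tokens of the context."""
    return list(context[-n:]) if n > 0 else []

def shuffle_span(context, near, far, seed=0):
    """Shuffle word order inside the span, leaving all other tokens in place."""
    lo, hi = _span(context, near, far)
    out = list(context)
    segment = out[lo:hi]
    random.Random(seed).shuffle(segment)
    out[lo:hi] = segment
    return out

def replace_span(context, near, far, vocab, seed=0):
    """Replace tokens in the span with tokens sampled from `vocab`
    (uniform sampling is an assumption of this sketch)."""
    lo, hi = _span(context, near, far)
    rng = random.Random(seed)
    out = list(context)
    out[lo:hi] = [rng.choice(vocab) for _ in range(hi - lo)]
    return out

def drop_span(context, near, far):
    """Remove the tokens in the span entirely, shortening the context."""
    lo, hi = _span(context, near, far)
    return list(context[:lo]) + list(context[hi:])

def perplexity(nlls):
    """Perplexity from per-token negative log-likelihoods (natural log)."""
    return math.exp(sum(nlls) / len(nlls))
```

In use, one would score the same target tokens with the original and the perturbed context and compare the two perplexities; a perturbation that leaves perplexity unchanged indicates that the model does not rely on the information it destroyed.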

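Experimental Results

Research Questions

RQ1: How much prior context, measured in tokens, does a neural LM use effectively?
RQ2: Do nearby context and long-range context contribute differently to the LSTM's predictions?
RQ3: How does word order in nearby and distant context affect predictions?
RQ4: Does a copy mechanism (the neural cache) help the model use distant context more effectively?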

Key Results

Dataset	# Tokens (Dev)	# Tokens (Test)	Avg. Sentence Length (Dev)	Avg. Sentence Length (Test)	Perplexity, no cache (Dev)	Perplexity, no cache (Test)
PTB	73,760	82,430	20.9	20.9	59.07	56.89
Wiki	217,646	245,569	23.7	22.6	67.29	64.51

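LSTMs effectively use about 200 tokens of context on average (PTB and WikiText-2).
Word order matters only within roughly the most recent 20 tokens; beyond about 50 tokens the effect of global word order disappears, so distant words are represented only as a rough semantic field.
Content words need more context than function words, and infrequent words need more context than frequent ones.
The neural cache substantially improves copying, especially from distant context, and helps even for words that can only be copied from far away; it can sometimes hurt performance on words that do not appear in the history.
For words that can be copied from nearby context, replacing the target word with a different token hurts more than dropping it, which suggests that copying from nearby context relies on the exact occurrence of the word.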
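
To make the cache-related findings concrete, the following is a small sketch of a neural cache in the style of Grave et al. (2017b): past hidden states are stored as keys together with the word that followed each of them, the current hidden state is compared against those keys, and the resulting cache distribution is interpolated with the model's own distribution. It is a simplified, standard-library illustration under stated assumptions; the function names and the values of `theta` (score sharpness) and `lam` (interpolation weight) are placeholders, not the settings used in the paper.

```python
import math

def cache_distribution(query, keys, next_words, theta=1.0):
    """Cache probability for each stored word: proportional to the sum of
    exp(theta * dot(query, key_i)) over positions i followed by that word."""
    if not keys:
        return {}
    scores = [theta * sum(q * k for q, k in zip(query, key)) for key in keys]
    top = max(scores)  # subtract the max score for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    dist = {}
    for word, weight in zip(next_words, weights):
        dist[word] = dist.get(word, 0.0) + weight / total
    return dist

def interpolate(p_model, p_cache, lam=0.1):
    """Linear interpolation: (1 - lam) * p_model + lam * p_cache."""
    if not p_cache:
        return dict(p_model)  # empty cache: fall back to the model alone
    words = set(p_model) | set(p_cache)
    return {
        w: (1 - lam) * p_model.get(w, 0.0) + lam * p_cache.get(w, 0.0)
        for w in words
    }
```

Because the interpolation moves a share `lam` of the probability mass onto words stored in the cache, any word that is absent from the history can only lose probability, which is consistent with the observation above that the cache sometimes hurts such words.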


This review was generated by AI and reviewed by a human editor.