QUICK REVIEW

[Paper Review] Distance-based Self-Attention Network for Natural Language Inference

Jinbae Im, Sungzoon Cho | arXiv (Cornell University) | 2017-12-06

Topic Modeling · References: 27 · Citations: 69

One-Line Summary

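This paper introduces the Distance-based Self-Attention Network, which adds a distance mask to multi-head attention so that the model captures local dependencies while preserving global context. It achieves state-of-the-art results on SNLI and strong results on MultiNLI.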

ABSTRACT

Attention mechanism has been used as an ancillary means to help RNN or CNN. However, the Transformer (Vaswani et al., 2017) recently recorded the state-of-the-art performance in machine translation with a dramatic reduction in training time by solely using attention. Motivated by the Transformer, Directional Self Attention Network (Shen et al., 2017), a fully attention-based sentence encoder, was proposed. It showed good performance with various data by using forward and backward directional information in a sentence. However, their study did not consider the distance between words at all, even though it is an important feature for learning the local dependency that helps in understanding the context of the input text. We propose Distance-based Self-Attention Network, which considers the word distance by using a simple distance mask in order to model the local dependency without losing attention's inherent ability to model global dependency. Our model shows good performance with NLI data, and it records the new state-of-the-art result with SNLI data. Additionally, we show that our model has a strength in long sentences or documents.
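Read as a formula, a "simple distance mask" can be expressed as an additive bias on the attention logits. The sketch below assumes a linear penalty −|i−j| scaled by a hyperparameter α, with M^dir a 0/−∞ directional mask; this is our reading of the method, not an equation quoted from the paper.

```latex
\mathrm{Attn}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^{\top}}{\sqrt{d_k}} + M^{\mathrm{dir}} + \alpha\, M^{\mathrm{dis}}\right) V,
\qquad M^{\mathrm{dis}}_{ij} = -\lvert i - j \rvert

\frac{\mathrm{weight}_{ij}}{\mathrm{weight}_{ik}}
= \frac{\exp\!\left(s_{ij} - \alpha\lvert i-j\rvert\right)}{\exp\!\left(s_{ik} - \alpha\lvert i-k\rvert\right)}
= e^{\,s_{ij} - s_{ik}}\; e^{-\alpha\left(\lvert i-j\rvert - \lvert i-k\rvert\right)}
```

Here s_ij = q_i·k_j/√d_k, and j, k are two positions that the directional mask allows (so M^dir contributes 0 to both). When the scores are equal, a word farther from position i receives exponentially less weight, yet that weight never reaches zero. This is how such a mask can favor local dependency without cutting off global context.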

Research Motivation and Goals

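Improve sentence encoders for natural language inference by capturing local word dependencies.
Integrate word-distance information into a fully attention-based encoder without losing global context.
Evaluate the proposed distance-based attention on the SNLI and MultiNLI datasets.
Analyze how the distance mask affects attention and how it contributes to performance.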

Proposed Method

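Extends Transformer-style attention with a distance mask that models the distance between words.
Introduces directional masks to encode forward and backward dependencies.
Introduces a fusion gate that combines the projected word embeddings with the masked attention output.
Applies a position-wise feed-forward network with a residual connection after the fusion step.
Obtains the sentence representation by pooling with multi-dimensional self-attention and max pooling.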
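The masking and gating steps listed above can be sketched in NumPy as follows. This is a minimal, single-head illustration, not the authors' implementation (the paper uses multi-head attention). Several points are assumptions: the distance mask is taken to be a linear penalty −|i−j| scaled by a hyperparameter `alpha`; the directional mask keeps the diagonal so that no softmax row is fully masked; the fusion gate follows a DiSAN-style sigmoid gate. All function names (`distance_mask`, `direction_mask`, `masked_self_attention`, `fusion_gate`) are hypothetical. The feed-forward, residual, and pooling steps are omitted.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row max for stability; -inf entries become exactly 0 after exp
    x = x - np.max(x, axis=axis, keepdims=True)
    e = np.exp(x)
    return e / np.sum(e, axis=axis, keepdims=True)

def distance_mask(n):
    # Assumed linear penalty: M_dis[i, j] = -|i - j|
    idx = np.arange(n)
    return -np.abs(idx[:, None] - idx[None, :]).astype(float)

def direction_mask(n, forward=True):
    # 0 where attention is allowed, -inf elsewhere.
    # Assumption: forward lets token i see itself and later tokens (j >= i);
    # the diagonal is kept so every softmax row has at least one finite entry.
    i = np.arange(n)[:, None]
    j = np.arange(n)[None, :]
    allowed = (i <= j) if forward else (i >= j)
    return np.where(allowed, 0.0, -np.inf)

def masked_self_attention(x, wq, wk, wv, alpha=1.0, forward=True):
    # x: (n, d_model); wq, wk, wv: (d_model, d_k) projection matrices
    q, k, v = x @ wq, x @ wk, x @ wv
    n, d_k = q.shape
    logits = q @ k.T / np.sqrt(d_k)
    # Add directional mask and alpha-scaled distance mask to the logits
    logits = logits + direction_mask(n, forward) + alpha * distance_mask(n)
    weights = softmax(logits, axis=-1)
    return weights @ v, weights

def fusion_gate(s, h, w_s, w_h, b):
    # Hypothetical DiSAN-style gate: F = sigmoid(s W_s + h W_h + b)
    f = 1.0 / (1.0 + np.exp(-(s @ w_s + h @ w_h + b)))
    # Element-wise mix of projected embeddings s and attention output h
    return f * s + (1.0 - f) * h

# Usage with random inputs (shapes only; no trained weights)
rng = np.random.default_rng(0)
n, d = 5, 8
x = rng.normal(size=(n, d))
wq, wk, wv = (rng.normal(size=(d, d)) for _ in range(3))
h, w = masked_self_attention(x, wq, wk, wv, alpha=1.5, forward=True)
s = x  # stand-in for the projected word embeddings
u = fusion_gate(s, h, rng.normal(size=(d, d)), rng.normal(size=(d, d)), np.zeros(d))
print(u.shape)  # (5, 8)
```

Adding the masks to the logits, rather than multiplying the weights, keeps the sketch a single softmax. Setting `alpha=0` reduces it to purely directional self-attention, which makes it easy to compare the two variants.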

Experimental Results

Research Questions

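RQ1: Does adding a distance mask to self-attention improve natural language inference performance over previous fully attention-based encoders?
RQ2: How does the distance mask affect attention patterns in long versus short sentences?
RQ3: What effect does the distance mask have on the SNLI and MultiNLI benchmarks?
RQ4: How does the proposed model balance capturing local dependencies with modeling global context?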

Key Findings

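Combining the distance mask with a fully attention-based encoder achieves state-of-the-art results on SNLI.
The distance mask improves performance especially on long sentences, and the benefit grows as the average sentence length increases.
The ablation study confirms that including the distance mask improves accuracy without substantially increasing model size or training time.
On MultiNLI the model is competitive, delivering strong accuracy with a relatively simple inference layer compared with deeper LSTM-based models.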

This review was generated by AI and reviewed by a human editor.