QUICK REVIEW

[Paper Review] Learning General Purpose Distributed Sentence Representations via Large Scale Multi-task Learning

Sandeep Subramanian, Adam Trischler | arXiv (Cornell University) | 2018-03-30

Topic Modeling · Citations: 127

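One-line summary

This paper introduces a multi-task learning framework that shares a single recurrent encoder across diverse sentence-level tasks (multilingual NMT, constituency parsing, skip-thought, and natural language inference), producing general-purpose fixed-length sentence representations that transfer well to new tasks and to low-data settings.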

ABSTRACT

A lot of the recent success in natural language processing (NLP) has been driven by distributed vector representations of words trained on large amounts of text in an unsupervised manner. These representations are typically used as general purpose features for words across a range of NLP problems. However, extending this success to learning representations of sequences of words, such as sentences, remains an open problem. Recent work has explored unsupervised as well as supervised learning techniques with different training objectives to learn general purpose fixed-length sentence representations. In this work, we present a simple, effective multi-task learning framework for sentence representations that combines the inductive biases of diverse training objectives in a single model. We train this model on several data sources with multiple training objectives on over 100 million sentences. Extensive experiments demonstrate that sharing a single recurrent sentence encoder across weakly related tasks leads to consistent improvements over previous methods. We present substantial improvements in the context of transfer learning and low-resource settings using our learned general-purpose representations.

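Research motivation and goals

Motivates the need for general-purpose sentence representations that go beyond word embeddings.
Proposes a simple, scalable multi-task framework that combines diverse sentence-level training objectives.
Shows that sharing an encoder across weakly related tasks improves transfer performance and low-resource learning.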

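Proposed method

Uses a one-to-many sequence-to-sequence model with a shared bidirectional GRU encoder and task-specific decoders.
Trains on diverse objectives: skip-thought vectors, multilingual neural machine translation (NMT), constituency parsing, and natural language inference (NLI).
Conditions each decoder on the encoder representation h_x without attention, which yields a single fixed-length sentence embedding.
Interleaves tasks during training (uniform task sampling, with occasional NLI minibatches) and optimizes with Adam.
To evaluate the learned representations, trains a simple linear classifier on each transfer task without updating the encoder parameters.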
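
The task interleaving described above can be pictured with a small scheduling sketch. This is only an illustration under stated assumptions, not the authors' training code: the task names, the two placeholder NMT entries, and the `nli_every` interval are all hypothetical, since the review says only that tasks are sampled uniformly with occasional NLI minibatches.

```python
import random
from collections import Counter

# Task names mirror the objectives listed above. The two NMT entries are
# illustrative placeholders standing in for "multilingual NMT".
SEQ2SEQ_TASKS = ["skip_thought", "nmt_lang_a", "nmt_lang_b", "parsing"]


def task_schedule(num_steps, nli_every=10, seed=0):
    """Yield one task name per parameter update.

    Most steps sample a sequence-to-sequence task uniformly at random;
    every `nli_every`-th step is an NLI minibatch instead. `nli_every`
    is a hypothetical knob, not a value taken from the paper.
    """
    rng = random.Random(seed)
    for step in range(1, num_steps + 1):
        if step % nli_every == 0:
            yield "nli"
        else:
            yield rng.choice(SEQ2SEQ_TASKS)


schedule = list(task_schedule(1000))
counts = Counter(schedule)
print(counts)
```

In a real training loop, each yielded name would select a minibatch from that task, run it through the shared encoder and that task's own decoder or classifier, and apply one Adam update, so every task's gradient passes through the same encoder parameters.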

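Experimental results

Research questions

RQ1: Does a single shared encoder trained on multiple sentence-level tasks learn more general representations than task-specific or single-objective models?
RQ2: Do the inductive biases gained from diverse tasks improve transfer performance, and do they help especially in low-resource settings?
RQ3: Which tasks contribute most to capturing syntactic, semantic, or other sentence features?
RQ4: How do fixed-length representations compare with attention-based or task-specific representations on transfer tasks?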

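Key results

Representations learned with multi-task learning generalize better across transfer tasks than several existing general-purpose methods.
Adding more tasks and increasing encoder capacity yields consistent transfer gains on sentiment analysis, entailment, and paraphrase tasks.
The multi-task model improves low-resource transfer performance, achieving competitive results on some tasks with only about 6% of the labeled data.
Including constituency parsing and multilingual NMT strengthens the syntactic and related linguistic signals in the embeddings.
The word embeddings learned by the model are competitive with existing embedding methods, even though they were trained from scratch.
Probing results suggest that the multi-task signals contribute to syntactic encoding, whereas NLI mainly aids semantic encoding.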
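
The transfer and low-resource evaluation can be sketched as a frozen encoder feeding a linear classifier trained on a small labelled subset. Everything below is a toy stand-in under loud assumptions: `frozen_encoder` is a hypothetical keyword counter rather than the paper's GRU encoder, the dataset is synthetic, and the 6% fraction simply echoes the figure quoted above, so the resulting accuracy says nothing about the paper's results.

```python
import math


def frozen_encoder(sentence):
    """Hypothetical stand-in for a pretrained sentence encoder.

    Returns a fixed-length 2-d vector and is never updated below,
    mirroring the "encoder parameters are not updated" protocol.
    """
    words = sentence.split()
    pos = sum(w in {"good", "great"} for w in words)
    neg = sum(w in {"bad", "awful"} for w in words)
    return [float(pos), float(neg)]


def train_linear_classifier(features, labels, epochs=200, lr=0.5):
    """Fit logistic regression with per-example gradient steps."""
    w, b = [0.0] * len(features[0]), 0.0
    for _ in range(epochs):
        for x, y in zip(features, labels):
            z = sum(wi * xi for wi, xi in zip(w, x)) + b
            p = 1.0 / (1.0 + math.exp(-z))
            g = p - y  # gradient of the log loss with respect to z
            w = [wi - lr * g * xi for wi, xi in zip(w, x)]
            b -= lr * g
    return w, b


def predict(w, b, x):
    return int(sum(wi * xi for wi, xi in zip(w, x)) + b > 0)


# Synthetic sentiment data: labels alternate, words cycle deterministically.
POS, NEG = ["good", "great"], ["bad", "awful"]
data = []
for i in range(200):
    label = i % 2
    word = (POS if label else NEG)[(i // 2) % 2]
    data.append((f"the movie was {word}", label))

# Low-resource setting: keep only a small fraction of the labelled examples.
fraction = 0.06
n_train = round(len(data) * fraction)
train, test = data[:n_train], data[n_train:]

# Only the classifier weights (w, b) are trained; the encoder stays fixed.
w, b = train_linear_classifier(
    [frozen_encoder(s) for s, _ in train], [y for _, y in train]
)
accuracy = sum(predict(w, b, frozen_encoder(s)) == y for s, y in test) / len(test)
print(n_train, accuracy)
```

The point of the design is that the classifier has very few parameters, so its accuracy on a small labelled subset mostly reflects how much task-relevant information the fixed sentence vectors already carry.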

This review was generated by AI and reviewed by a human editor.