QUICK REVIEW

[Paper Review] Ordered Neurons: Integrating Tree Structures into Recurrent Neural Networks

Yikang Shen, Shawn W. Tan | arXiv (Cornell University) | 2018-10-22

Natural Language Processing Techniques | Citations: 158

One-Line Summary

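ON-LSTM introduces an inductive bias that orders neurons hierarchically through a cumax-based gating mechanism, which lets a recurrent neural network perform tree-like composition. This improves language modeling and unsupervised parsing, and it improves performance on long-range syntactic and logical inference tasks.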

ABSTRACT

Natural language is hierarchically structured: smaller units (e.g., phrases) are nested within larger units (e.g., clauses). When a larger constituent ends, all of the smaller constituents that are nested within it must also be closed. While the standard LSTM architecture allows different neurons to track information at different time scales, it does not have an explicit bias towards modeling a hierarchy of constituents. This paper proposes to add such an inductive bias by ordering the neurons; a vector of master input and forget gates ensures that when a given neuron is updated, all the neurons that follow it in the ordering are also updated. Our novel recurrent architecture, ordered neurons LSTM (ON-LSTM), achieves good performance on four different tasks: language modeling, unsupervised parsing, targeted syntactic evaluation, and logical inference.

Motivation and Objectives

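Establish a bias toward learning the latent hierarchical (constituency) structure of language.
Propose a neural unit that uses ordered gating to allocate neurons to long-term versus short-term information.
Develop ON-LSTM, which uses the cumax activation to enforce monotonic master gates and structured updates.
Demonstrate improvements in language modeling, unsupervised parsing, targeted syntactic evaluation, and logical inference.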

Proposed Method

Introduces cumax(), defined as cumsum(softmax(...)), which acts as a soft binary gate that splits the neurons into contiguous blocks.
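Defines a master forget gate and a master input gate with cumax to control updates at a coarse, high-level granularity.
Computes the updated cell state c_t by combining the master gates with the standard LSTM gates, which enables hierarchical retention of information.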
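To reduce overhead, shrinks the master gates to vectors of dimension D_m = D/C, so that each master-gate value is shared across a chunk of neurons and the parameter count drops.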
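Trains a 3-layer ON-LSTM for language modeling on PTB and evaluates it by perplexity; derives latent trees for unsupervised parsing from the expected split points; and evaluates on syntactic tasks and logical inference datasets.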
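
As a rough illustration of the gating described above, the following NumPy sketch computes cumax and one cell-state update for a single time step. It assumes the master-gate pre-activations and the standard LSTM gates are already computed and passed in; the function names and the argument layout are hypothetical, chosen for this example only, and training, batching, and the output gate are left out.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over the last axis."""
    z = np.exp(x - np.max(x, axis=-1, keepdims=True))
    return z / np.sum(z, axis=-1, keepdims=True)

def cumax(x):
    """cumax(x) = cumsum(softmax(x)): values rise monotonically toward 1."""
    return np.cumsum(softmax(x), axis=-1)

def on_lstm_cell_update(master_f_logits, master_i_logits, f, i, c_hat, c_prev, chunk):
    """One cell-state update (hypothetical helper, single time step).

    master_f_logits, master_i_logits: shape (D_m,), master-gate pre-activations.
    f, i, c_hat, c_prev: shape (D,), with D = D_m * chunk.
    """
    # Master forget gate rises from ~0 to 1; master input gate falls from ~1 to 0.
    mf = cumax(master_f_logits)
    mi = 1.0 - cumax(master_i_logits)
    # Share each master-gate value across a chunk of `chunk` neurons.
    mf = np.repeat(mf, chunk)
    mi = np.repeat(mi, chunk)
    # Where both master gates are open, the standard LSTM gates decide.
    overlap = mf * mi
    f_hat = f * overlap + (mf - overlap)
    i_hat = i * overlap + (mi - overlap)
    return f_hat * c_prev + i_hat * c_hat
```

With both master-gate logits sharply peaked at the last position, the lower-ranked chunks are overwritten by the candidate state c_hat while the highest-ranked chunk keeps c_prev, which is the ordered behavior the method aims for.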

Experimental Results

Research Questions

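RQ1: Can an inductive bias that enforces a hierarchy over neuron updates improve the learning of latent tree-structured representations in RNNs?
RQ2: Does cumax-based structured gating enable better long-term dependency modeling and constituency parsing without supervised trees?
RQ3: How does ON-LSTM perform relative to a standard LSTM on language modeling, unsupervised parsing, targeted syntactic evaluation, and logical inference?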

Key Results

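ON-LSTM achieves better language modeling perplexity than standard LSTMs of similar capacity (PTB, 3-layer model, 25M parameters: validation 58.29±0.10, test 56.17±0.12).
In unsupervised constituency parsing, the second layer of ON-LSTM achieves the highest F1 score on the WSJ test set among the layers reported.
ON-LSTM generalizes better to longer sequences and performs better on long-term dependencies in targeted syntactic evaluation.
ON-LSTM outperforms the standard LSTM on logical inference over longer sequences, which suggests a better ability to handle structured data.
The model's inductive bias is consistent with human syntactic structure and may benefit downstream tasks that require hierarchical representations.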
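
The unsupervised parsing result depends on turning gate activations into a tree. A minimal sketch of one way to do this is given below, assuming the master forget gate values for each token are supplied by the caller: the expected split point is estimated as D_m minus the sum of the gate values, and a binary tree is then built top-down by splitting greedily at the highest-scoring token. The helper names and the tuple representation of the tree are hypothetical choices for this example.

```python
import numpy as np

def expected_split_points(master_forget):
    """Estimate one split score per token from gates of shape (T, D_m).

    Uses d_hat_t = D_m - sum_k master_forget[t, k].
    """
    master_forget = np.asarray(master_forget, dtype=float)
    return master_forget.shape[-1] - master_forget.sum(axis=-1)

def build_tree(tokens, scores):
    """Top-down greedy binary tree as nested tuples (hypothetical helper).

    The token with the highest score opens a new constituent: everything
    before it goes left, the token plus everything after it goes right.
    """
    if len(tokens) == 1:
        return tokens[0]
    # Index of the highest score; ties resolve to the earliest token.
    i = max(range(len(tokens)), key=lambda k: scores[k])
    rest = tokens[i + 1:]
    if len(rest) > 0:
        right = (tokens[i], build_tree(rest, scores[i + 1:]))
    else:
        right = tokens[i]
    if i == 0:
        return right
    return (build_tree(tokens[:i], scores[:i]), right)
```

For example, build_tree(["the", "cat", "sat", "down"], [0.1, 0.2, 0.9, 0.3]) splits at "sat" first and returns (("the", "cat"), ("sat", "down")).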

This review was generated by AI and reviewed by a human editor.