QUICK REVIEW

[Paper Review] Improving Relation Extraction by Pre-trained Language Representations

Christoph Alt, Marc P. Hübner|arXiv (Cornell University)|2019. 06. 07.

Topic Modeling | References: 29 | Citations: 53

One-Line Summary

TRE performs relation extraction with pre-trained language representations inside a Transformer framework, achieving state-of-the-art results on TACRED and SemEval 2010 Task 8 and showing improved sample efficiency.

ABSTRACT

Current state-of-the-art relation extraction methods typically rely on a set of lexical, syntactic, and semantic features, explicitly computed in a pre-processing step. Training feature extraction models requires additional annotated language resources, which severely restricts the applicability and portability of relation extraction to novel languages. Similarly, pre-processing introduces an additional source of error. To address these limitations, we introduce TRE, a Transformer for Relation Extraction, extending the OpenAI Generative Pre-trained Transformer [Radford et al., 2018]. Unlike previous relation extraction models, TRE uses pre-trained deep language representations instead of explicit linguistic features to inform the relation classification and combines it with the self-attentive Transformer architecture to effectively model long-range dependencies between entity mentions. TRE allows us to learn implicit linguistic features solely from plain text corpora by unsupervised pre-training, before fine-tuning the learned language representations on the relation extraction task. TRE obtains a new state-of-the-art result on the TACRED and SemEval 2010 Task 8 datasets, achieving a test F1 of 67.4 and 87.1, respectively. Furthermore, we observe a significant increase in sample efficiency. With only 20% of the training examples, TRE matches the performance of our baselines and our model trained from scratch on 100% of the TACRED dataset. We open-source our trained models, experiments, and source code.

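Motivation and Objectives

Reduce the reliance of relation extraction on explicit linguistic feature engineering, which requires additional annotated resources and makes pre-processing an extra source of error.
Introduce TRE, a Transformer-based model that uses pre-trained language representations for relation classification.
Show that unsupervised pre-training improves both performance and sample efficiency on standard benchmarks.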

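Proposed Method

Uses a decoder-only Transformer architecture to process structured input for relation extraction.
Adopts an input representation of BPE subword tokens plus task-specific delimiters to encode the relation arguments and the sentence.
Pre-trains the model on plain text with a language modeling objective, then fine-tunes it on relation extraction while keeping the LM objective as an auxiliary objective.
During fine-tuning, predicts the relation label with a linear softmax classifier on the final Transformer state; the weight (lambda) of the auxiliary LM objective is tuned as needed.
Experiments with entity masking strategies (UNK, NE, GR, NE+GR) to study generalization and regularization effects.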
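
The fine-tuning objective described above can be written compactly. The following is a sketch rather than the paper's exact formulation: the symbols L_RE, L_LM, h_L^m (final-layer state at the last input token) and W_r (classifier weights) are notation chosen here for illustration, both terms are written as negative log-likelihoods to be minimized, and the LM term conditions on the full prefix instead of a fixed context window.

```latex
\begin{aligned}
P(r \mid x_1, \dots, x_m) &= \operatorname{softmax}\left(h_L^{m} W_r\right) \\
L_{\mathrm{RE}}(D) &= -\sum_{i=1}^{|D|} \log P\left(r_i \mid x_1^{i}, \dots, x_m^{i}\right) \\
L_{\mathrm{LM}}(D) &= -\sum_{i=1}^{|D|} \sum_{t=1}^{m} \log P\left(x_t^{i} \mid x_1^{i}, \dots, x_{t-1}^{i}\right) \\
L(D) &= L_{\mathrm{RE}}(D) + \lambda \, L_{\mathrm{LM}}(D)
\end{aligned}
```

Setting lambda to 0 recovers plain classification fine-tuning; a larger lambda keeps the model closer to its language modeling behavior during fine-tuning.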
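
The input representation and the masking strategies listed above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the special-token strings, the `Entity` class, the `ROLE-TYPE` mask format, and the choice to collapse each mention into a single placeholder are all hypothetical, and BPE subword splitting (which would be applied to the resulting tokens) is omitted.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical special tokens; the actual strings used by TRE may differ.
START, DELIM, CLF, UNK = "<start>", "<delim>", "<clf>", "<unk>"


@dataclass
class Entity:
    span: Tuple[int, int]  # token offsets [start, end) in the sentence
    ne_type: str           # named entity type, e.g. "PERSON"
    role: str              # grammatical role, e.g. "SUBJ" or "OBJ"


def mask_token(entity: Entity, strategy: str) -> str:
    """Return the placeholder for an entity under a masking strategy."""
    if strategy == "UNK":
        return UNK
    if strategy == "NE":
        return entity.ne_type
    if strategy == "GR":
        return entity.role
    if strategy == "NE+GR":
        return f"{entity.role}-{entity.ne_type}"  # assumed format
    raise ValueError(f"unknown masking strategy: {strategy}")


def apply_masking(tokens: List[str], head: Entity, tail: Entity,
                  strategy: Optional[str]) -> List[str]:
    """Replace each mention with one placeholder (None means no masking)."""
    out = list(tokens)  # copy so the caller's list is not modified
    if strategy is None:
        return out
    # Replace the rightmost span first so earlier offsets stay valid.
    for ent in sorted((head, tail), key=lambda e: e.span[0], reverse=True):
        start, end = ent.span
        out[start:end] = [mask_token(ent, strategy)]
    return out


def build_input(tokens: List[str], head: Entity, tail: Entity,
                strategy: Optional[str] = None) -> List[str]:
    """Both arguments first, then the sentence, then a classification token."""
    if strategy is None:
        head_toks = tokens[head.span[0]:head.span[1]]
        tail_toks = tokens[tail.span[0]:tail.span[1]]
    else:
        head_toks = [mask_token(head, strategy)]
        tail_toks = [mask_token(tail, strategy)]
    sentence = apply_masking(tokens, head, tail, strategy)
    return [START, *head_toks, DELIM, *tail_toks, DELIM, *sentence, CLF]


# Illustrative sentence; entity types and roles are made up for the example.
tokens = "Bill Gates founded Microsoft in 1975 .".split()
head = Entity(span=(0, 2), ne_type="PERSON", role="SUBJ")
tail = Entity(span=(3, 4), ne_type="ORGANIZATION", role="OBJ")

print(build_input(tokens, head, tail))
print(build_input(tokens, head, tail, strategy="NE+GR"))
```

Placing the arguments ahead of the sentence lets a left-to-right model attend to both entities while reading the sentence, and the state at the final classification token is what the linear softmax classifier would consume.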

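Experimental Results

Research Questions

RQ1: Do pre-trained language representations improve relation extraction performance even without explicit linguistic features?
RQ2: How does TRE compare with state-of-the-art models on TACRED and SemEval 2010 Task 8?
RQ3: How does entity masking affect generalization and sample efficiency?
RQ4: How sample-efficient is TRE relative to the baselines when training data is limited?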

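Key Findings

TRE achieves state-of-the-art test F1 on TACRED (67.4) and SemEval 2010 Task 8 (87.1).
Pre-trained language representations substantially improve performance, and a regularization benefit appears in particular when entities are not masked.
Entity masking (NE+GR) performs strongly, suggesting that the language representations capture useful features similar to entity type and role information.
TRE is markedly sample-efficient: with only 20% of the TACRED training examples, it matches the performance of the baselines and of the same model trained from scratch on 100% of the data.
Unmasked entities can lead to overfitting, while masking strategies help generalization to unseen entities.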

This review was generated by AI and reviewed by a human editor.