QUICK REVIEW

[Paper Review] Personality Trait Detection Using Bagged SVM over BERT Word Embedding Ensembles

Amirmohammad Kazameini, Samin Fatehi | arXiv (Cornell University) | 2020-10-03

Sentiment Analysis and Opinion Mining | References: 16 | Citations: 52

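One-Line Summary

BB-SVM is a computationally efficient approach that combines BERT-based contextual embeddings with Mairesse features and a Bagging-SVM classifier to predict Big-Five personality traits from essays, improving on the previous state of the art by 1.04 percentage points.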

ABSTRACT

Recently, the automatic prediction of personality traits has received increasing attention and has emerged as a hot topic within the field of affective computing. In this work, we present a novel deep learning-based approach for automated personality detection from text. We leverage state-of-the-art advances in natural language understanding, namely the BERT language model, to extract contextualized word embeddings from textual data for automated author personality detection. Our primary goal is to develop a computationally efficient, high-performance personality prediction model which can be easily used by a large number of people without access to huge computation resources. Our extensive experiments with this ideology in mind led us to develop a novel model which feeds contextualized embeddings along with psycholinguistic features to a Bagged-SVM classifier for personality trait prediction. Our model outperforms the previous state of the art by 1.04% and, at the same time, is significantly more computationally efficient to train. We report our results on the famous gold standard Essays dataset for personality detection.

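Motivation and Objectives

Develop a computationally efficient model for automatic personality detection from text.
Combine BERT contextual embeddings with psycholinguistic features.
Improve predictive performance on the Essays personality dataset while reducing training time.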

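Proposed Method

Split each essay into 200-token sub-documents to fit within BERT's input limit.
Extract contextualized embeddings by concatenating the last four BERT layers and averaging the token representations.
Concatenate the BERT features with 84 Mairesse features to form a 3156-dimensional document feature vector.
Train 10 SVM classifiers in parallel (bagging) and take a majority vote for the final prediction.
Compare Bagging-SVM against single models and various feature configurations.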
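The feature-construction steps above can be sketched as follows. This is only an illustration under stated assumptions, not the authors' code: `fake_token_embedding` is a hypothetical placeholder for a real BERT forward pass, the 3072-dimensional BERT part assumes four concatenated 768-dimensional layers (which is consistent with 3156 = 3072 + 84), and averaging the sub-document vectors into one document vector is an assumed aggregation choice.

```python
import random

MAX_TOKENS = 200      # sub-document length stated in the review
BERT_DIM = 4 * 768    # assumed: last four layers of a 768-dim BERT, concatenated
MAIRESSE_DIM = 84     # number of Mairesse features stated in the review


def split_into_subdocuments(tokens, max_tokens=MAX_TOKENS):
    """Split a token list into consecutive chunks of at most max_tokens."""
    return [tokens[i:i + max_tokens] for i in range(0, len(tokens), max_tokens)]


def fake_token_embedding(token, dim=BERT_DIM):
    """Hypothetical stand-in for BERT: a deterministic pseudo-random vector.

    Unlike real BERT output, this vector does not depend on context.
    """
    rng = random.Random(token)
    return [rng.uniform(-1.0, 1.0) for _ in range(dim)]


def mean_pool(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def document_vector(tokens, mairesse_features):
    """Build one document vector: pooled embedding part plus Mairesse features."""
    chunks = split_into_subdocuments(tokens)
    # Mean over tokens inside each sub-document, then mean over sub-documents
    # (the second averaging step is an assumption of this sketch).
    chunk_vectors = [
        mean_pool([fake_token_embedding(tok) for tok in chunk]) for chunk in chunks
    ]
    embedding_part = mean_pool(chunk_vectors)
    return embedding_part + list(mairesse_features)


essay_tokens = [f"tok{i}" for i in range(450)]   # toy essay of 450 tokens
mairesse = [0.0] * MAIRESSE_DIM                  # placeholder feature values
chunk_lengths = [len(c) for c in split_into_subdocuments(essay_tokens)]
vector = document_vector(essay_tokens, mairesse)
print(chunk_lengths, len(vector))
```

In a real pipeline the placeholder would be replaced by hidden states taken from a BERT model, and the Mairesse values would come from a psycholinguistic feature extractor.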

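Experimental Results

Research Questions

RQ1: Can BERT-based contextual embeddings combined with psycholinguistic features improve personality trait prediction from text over existing methods?
RQ2: Does bagging multiple SVM classifiers offer advantages in performance and training time for this task?
RQ3: How does using the last four layers of BERT representations affect accuracy compared with other configurations?
RQ4: How does BB-SVM compare with the previous state of the art on the Essays dataset?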

Key Results

Model ID	Word embedding	Sentence feature extraction	Document feature extraction	Classifier	Average accuracy (%)
M8	W2V	-	Mean	Bagging-SVM	57.38
BB-SVM	BERT (4 last layers)	-	Mean	Bagging-SVM	59.03

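BB-SVM achieves a higher average accuracy (59.03%) than the previous state of the art (57.99%).
Under the configurations used in the study, BERT (last four layers) with Bagging-SVM outperforms the Word2Vec-based approach (59.03% vs. 57.38%).
Bagging improves classification accuracy for personality detection compared with a single SVM.
Training time drops sharply, to about 7 minutes versus about 50 hours for the previous method.
Concatenating the last four BERT layers with the Mairesse features gives the SVM classifiers a strong feature vector.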
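To make the bagging step concrete, here is a minimal sketch of ten SVMs trained on bootstrap resamples and combined by majority vote. It assumes scikit-learn and NumPy are available; the data is synthetic, the default RBF kernel and the tie-breaking rule are choices made for this sketch, and the models are trained sequentially rather than in parallel. None of these details are confirmed by the review.

```python
import numpy as np
from sklearn.svm import SVC


def fit_bagged_svm(X, y, n_estimators=10, seed=0):
    """Train n_estimators SVMs, each on a bootstrap resample of (X, y)."""
    rng = np.random.default_rng(seed)
    models = []
    for _ in range(n_estimators):
        idx = rng.integers(0, len(X), size=len(X))  # sample rows with replacement
        models.append(SVC().fit(X[idx], y[idx]))    # default RBF kernel (assumed)
    return models


def predict_majority(models, X):
    """Hard majority vote over binary {0, 1} predictions; ties go to class 0."""
    votes = np.stack([m.predict(X) for m in models])  # (n_estimators, n_samples)
    return (votes.mean(axis=0) > 0.5).astype(int)


# Synthetic stand-in data: 200 "documents", 20 features, one binary trait label.
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 20))
y = (X[:, 0] + 0.5 * rng.normal(size=200) > 0).astype(int)

models = fit_bagged_svm(X, y)
preds = predict_majority(models, X)
print("training accuracy of the ensemble:", (preds == y).mean())
```

Since the Big-Five setup predicts five traits, one such ensemble would presumably be trained per trait, each as a separate binary problem.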

This review was generated by AI and reviewed by a human editor.