QUICK REVIEW

[Paper Review] Quantifying Mental Health from Social Media with Neural User Embeddings

Silvio Amir, Glen Coppersmith | arXiv (Cornell University) | April 30, 2017

Mental Health via Writing | References: 27 | Citations: 24

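One-Line Summary

This paper proposes a neural user embedding model that learns mental-health-related representations from Twitter post histories, and shows that these embeddings capture homophily patterns and improve mental health status prediction. By adapting the embeddings through subspace learning, the method separates mental health conditions more cleanly with minimal labeled data and outperforms the baseline models.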

ABSTRACT

Mental illnesses adversely affect a significant proportion of the population worldwide. However, the methods traditionally used for estimating and characterizing the prevalence of mental health conditions are time-consuming and expensive. Consequently, best-available estimates concerning the prevalence of mental health conditions are often years out of date. Automated approaches to supplement these survey methods with broad, aggregated information derived from social media content provides a potential means for near real-time estimates at scale. These may, in turn, provide grist for supporting, evaluating and iteratively improving upon public health programs and interventions. We propose a novel model for automated mental health status quantification that incorporates user embeddings. This builds upon recent work exploring representation learning methods that induce embeddings by leveraging social media post histories. Such embeddings capture latent characteristics of individuals (e.g., political leanings) and encode a soft notion of homophily. In this paper, we investigate whether user embeddings learned from twitter post histories encode information that correlates with mental health statuses. To this end, we estimated user embeddings for a set of users known to be affected by depression and post-traumatic stress disorder (PTSD), and for a set of demographically matched `control' users. We then evaluated these embeddings with respect to: (i) their ability to capture homophilic relations with respect to mental health status; and (ii) the performance of downstream mental health prediction models based on these features. Our experimental results demonstrate that the user embeddings capture similarities between users with respect to mental conditions, and are predictive of mental health.

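Research Motivation and Objectives

To investigate whether user embeddings derived from social media posts contain information related to mental health status.
To evaluate how well user embeddings capture homophilic relations between users with similar mental health statuses.
To assess whether these embeddings improve the performance of mental health prediction models compared with conventional text-based features.
To explore how effectively general-purpose user embeddings can be adapted using a small amount of task-specific labeled data.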

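Proposed Method

User embeddings are learned from each user's historical Twitter posts using a skip-gram model (User2Vec) and paragraph vector variants (PV-dbow, PV-dm).
Word embeddings are initialized from a skip-gram model pre-trained on a large corpus to improve representation quality.
A Non-Linear Subspace Embedding (NLSE) method is proposed to adapt general-purpose user embeddings to the mental health prediction task. It does so by projecting the embeddings into a task-specific subspace.
The NLSE model fine-tunes the embeddings through a linear projection whose matrix is learned from labeled mental health statuses.
Baselines include bag-of-words (BOW), TF-IDF, and hybrid models that combine user embeddings with text features (u2v+bow, u2v+boe).
Models are trained with 10-fold cross-validation, using early stopping and a grid search over regularization and other hyperparameters.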
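The subspace adaptation described above can be sketched as a forward pass. This is a minimal illustration under stated assumptions, not the authors' implementation: the sigmoid after the projection is assumed from the "non-linear" in the model name, and the function names, dimensions (`EMBED_DIM`, `SUBSPACE_DIM`), and random weights are hypothetical. Training of the matrices `S` and `Y` is omitted.

```python
import math
import random

def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, vector)) for row in matrix]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def nlse_forward(user_embedding, S, Y):
    """Project a general-purpose user embedding into a small
    task-specific subspace, then classify from that subspace.

    S: (subspace_dim x embed_dim) projection matrix, learned from labels.
    Y: (n_classes x subspace_dim) output weights.
    The sigmoid non-linearity is an assumption of this sketch.
    """
    subspace = [sigmoid(h) for h in matvec(S, user_embedding)]
    return softmax(matvec(Y, subspace))

# Hypothetical sizes: 50-d user embedding, 5-d subspace, 3 classes
# (control, depression, PTSD). Weights are random, i.e. untrained.
EMBED_DIM, SUBSPACE_DIM, N_CLASSES = 50, 5, 3
rng = random.Random(0)
S = [[rng.gauss(0, 0.1) for _ in range(EMBED_DIM)] for _ in range(SUBSPACE_DIM)]
Y = [[rng.gauss(0, 0.1) for _ in range(SUBSPACE_DIM)] for _ in range(N_CLASSES)]
user = [rng.gauss(0, 1) for _ in range(EMBED_DIM)]

probs = nlse_forward(user, S, Y)  # one probability per class
```

Because the subspace is much smaller than the embedding, only a few parameters per class have to be fit, which is one plausible reason such a model can work with little labeled data.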

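Experimental Results

Research Questions

RQ1 How well do user embeddings learned from social media post histories capture homophilic relations with respect to mental health status?
RQ2 Can user embeddings serve as useful features for classifying users with mental health conditions against demographically matched controls?
RQ3 How much does task-specific adaptation of user embeddings improve mental health prediction compared with general-purpose embeddings?
RQ4 Do neural user embeddings outperform traditional text-based baselines such as BOW?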
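One simple way to probe the homophily question in RQ1 is to check how often a user's nearest neighbor in embedding space shares the same label. The sketch below is an illustrative proxy only, not necessarily the evaluation protocol used in the paper; the function names and the toy 2-d embeddings are hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def neighbor_label_agreement(embeddings, labels):
    """Fraction of users whose most similar other user has the same label."""
    hits = 0
    for i, emb in enumerate(embeddings):
        others = [j for j in range(len(embeddings)) if j != i]
        nearest = max(others, key=lambda j: cosine(emb, embeddings[j]))
        hits += labels[nearest] == labels[i]
    return hits / len(embeddings)

# Toy data (hypothetical): two well-separated groups.
toy_embeddings = [[1.0, 0.1], [0.9, 0.2], [0.1, 1.0], [0.2, 0.9]]
toy_labels = ["control", "control", "depression", "depression"]

agreement = neighbor_label_agreement(toy_embeddings, toy_labels)
```

A value well above the rate expected by chance would indicate that users with the same status sit close together in the embedding space.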

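Key Results

The BOW baseline outperformed most of the other models, suggesting that direct mentions of mental health conditions in social media data are strong predictors.
User2Vec and PV-dm performed similarly, while PV-dbow performed significantly worse. This suggests that predicting all of the words in a post helps produce better representations.
The NLSE model, which adapts general-purpose user embeddings to the task through subspace projection, outperformed all baselines, with particularly clear gains in detecting the minority classes (depression and PTSD).
NLSE's largest gains over the baselines were in binary F1 score, suggesting that it classifies the clinically important cases more reliably.
In t-SNE visualizations, the adapted embeddings separate control users from users with mental health conditions more clearly and form tighter clusters.
The results confirm that even user embeddings induced without supervision capture latent mental-health-related signals, and that fine-tuning with minimal labeled data can substantially improve downstream task performance.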
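To see why binary F1 is a telling metric for the minority classes, the sketch below computes it with the condition treated as the positive class and compares it with accuracy. The labels and predictions are toy values invented for illustration, not results from the paper, and `binary_f1` is a hypothetical helper name.

```python
def binary_f1(y_true, y_pred, positive):
    """F1 score for a single class treated as the positive label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Toy, imbalanced example: 8 control users and 2 users with PTSD.
y_true = ["control"] * 8 + ["ptsd"] * 2
# The classifier finds only one of the two PTSD users.
y_pred = ["control"] * 8 + ["ptsd", "control"]

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
f1_ptsd = binary_f1(y_true, y_pred, positive="ptsd")
```

In this toy case accuracy stays high because control users dominate, while the binary F1 for the PTSD class drops sharply after a single missed case, which is why improvements on this metric reflect better handling of the minority class.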

This review was generated by AI and checked by a human editor.