QUICK REVIEW

[Paper Review] Interpretable Multi-Modal Hate Speech Detection

Prashanth Vijayaraghavan, Hugo Larochelle|arXiv (Cornell University)|2021. 03. 02.

Hate Speech and Cyberbullying Detection · References: 28 · Citations: 24

One-line summary

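This paper proposes a deep multi-modal model that detects hate speech by combining text semantics with socio-cultural context and social graph features. It outperforms text-only baselines while also providing interpretable insights.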

ABSTRACT

With growing role of social media in shaping public opinions and beliefs across the world, there has been an increased attention to identify and counter the problem of hate speech on social media. Hate speech on online spaces has serious manifestations, including social polarization and hate crimes. While prior works have proposed automated techniques to detect hate speech online, these techniques primarily fail to look beyond the textual content. Moreover, few attempts have been made to focus on the aspects of interpretability of such models given the social and legal implications of incorrect predictions. In this work, we propose a deep neural multi-modal model that can: (a) detect hate speech by effectively capturing the semantics of the text along with socio-cultural context in which a particular hate expression is made, and (b) provide interpretable insights into decisions of our model. By performing a thorough evaluation of different modeling techniques, we demonstrate that our model is able to outperform the existing state-of-the-art hate speech classification approaches. Finally, we show the importance of social and cultural context features towards unearthing clusters associated with different categories of hate.

Motivation and goals

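Argues that hate speech detection needs to look beyond the text itself and use socio-cultural context.
Develops a multi-modal neural network that fuses text, demographic features, and social graph features.
Shows that social and cultural context improves hate speech detection performance.
Provides interpretable insights into the model's decisions through attention mechanisms.
Demonstrates that the model's learned embeddings can cluster hate speech into categories.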

Proposed method

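Defines a multi-modal hate speech dataset D(H) consisting of tweets and their authors' attributes.
Encodes the text with character-enhanced word representations and self-attention to produce text features.
Extracts socio-cultural context by representing each author's demographics with a pre-trained demographic classifier.
Builds social context features from the follower graph of hate communities, G^h, and maps them to low-dimensional vectors.
Fuses the text and socio-cultural features with a late-fusion self-attention mechanism to produce the final representation used for classification.
Trains the model with categorical cross-entropy and evaluates it against traditional and deep learning baselines.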
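The fusion and training steps in the list can be illustrated with a small sketch. This is a hypothetical illustration, not the authors' implementation: it assumes each modality (text, socio-cultural, social graph) has already been encoded into a vector of the same size, and it uses attention-weighted pooling over the modality vectors as a stand-in for the paper's late-fusion self-attention. The function names, the query vector, and all weights are made up for illustration.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def softmax(scores: Sequence[float]) -> Vector:
    """Numerically stable softmax."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def attention_fuse(modalities: Sequence[Vector], query: Vector) -> Tuple[Vector, Vector]:
    """Attention-weighted late fusion (hypothetical stand-in for the paper's
    late-fusion self-attention): score each modality vector against a learned
    query, softmax the scores, and return the weighted sum plus the weights."""
    scores = [dot(query, [math.tanh(x) for x in v]) for v in modalities]
    weights = softmax(scores)
    dim = len(modalities[0])
    fused = [sum(w * v[i] for w, v in zip(weights, modalities)) for i in range(dim)]
    return fused, weights


def classify(fused: Vector, weight_rows: Sequence[Vector], bias: Sequence[float]) -> Vector:
    """Linear layer followed by softmax over the output classes."""
    logits = [dot(row, fused) + b for row, b in zip(weight_rows, bias)]
    return softmax(logits)


def categorical_cross_entropy(probs: Sequence[float], label: int) -> float:
    """Negative log-probability assigned to the true class."""
    return -math.log(probs[label])


# Toy, made-up inputs: one already-encoded vector per modality.
text_vec = [0.2, -0.1, 0.4]
socio_cultural_vec = [0.5, 0.3, -0.2]
social_graph_vec = [0.1, 0.0, 0.3]
query = [0.3, 0.1, 0.2]  # hypothetical learned attention query

fused, alphas = attention_fuse([text_vec, socio_cultural_vec, social_graph_vec], query)
# Two hypothetical output classes: 0 = not hate, 1 = hate.
probs = classify(fused, [[0.5, -0.2, 0.1], [-0.4, 0.3, 0.2]], [0.0, 0.0])
loss = categorical_cross_entropy(probs, label=1)
```

Returning `alphas` alongside the fused vector reflects the interpretability goal: the weights show how much each modality contributed to a prediction. In an actual model, the query and classifier weights would be learned by minimizing this loss with gradient descent.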

Experimental results

Research questions

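RQ1: Does adding socio-cultural and social context features improve hate speech detection over text-only models?
RQ2: How do demographic features and social graph features contribute to hate speech detection and to clustering hate categories?
RQ3: Can the model provide interpretable insights into its predictions through attention weights?
RQ4: How large is the relative gain from multi-modal fusion compared with text-only and traditional models?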

Key findings

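The proposed multi-modal model outperforms traditional and text-only deep learning baselines on both F1 (hate) and F1 (overall).
The Text+SC model (text combined with socio-cultural features) outperforms the text-only models (e.g., BiGRU+Char+Attn+FF: F1 Hate 0.784, F1 Overall 0.90).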
Including social and cultural context yields a substantial improvement over the text-only model (e.g., compared with BiGRU+Char+Attn: F1 Hate 0.744, F1 Overall 0.864).
The model learns hate gesture embeddings that can be clustered into categories (Anti-Islam, Anti-Black, Anti-Immigrant, General Hate, Anti-Semitic), with qualitative support from the top-attended words.
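Attention-based interpretations agree with those from shuffling-based and perturbation-based methods, and they highlight the code words and contextual cues that drive predictions.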
Cluster purity scores align more closely with the ground-truth hate categories when socio-cultural context is used (Text+SC: 0.76 vs. Text Only: 0.52).
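Cluster purity, the metric behind the Text+SC vs. Text Only comparison, can be computed as in this sketch. It assumes the standard definition: assign each cluster its majority ground-truth category, then report the fraction of points that match. The review does not state exactly how the paper computes purity, and the toy cluster assignments and labels are hypothetical.

```python
from collections import Counter
from typing import Dict, Hashable, Sequence


def cluster_purity(cluster_ids: Sequence[Hashable], true_labels: Sequence[Hashable]) -> float:
    """Purity = (1/N) * sum over clusters of the count of that cluster's
    most frequent ground-truth label."""
    if len(cluster_ids) != len(true_labels):
        raise ValueError("cluster_ids and true_labels must have the same length")
    if not true_labels:
        raise ValueError("need at least one sample")
    per_cluster: Dict[Hashable, Counter] = {}
    for cluster, label in zip(cluster_ids, true_labels):
        per_cluster.setdefault(cluster, Counter())[label] += 1
    majority_total = sum(c.most_common(1)[0][1] for c in per_cluster.values())
    return majority_total / len(true_labels)


# Hypothetical toy example: cluster assignments for six hateful tweets.
clusters = [0, 0, 0, 1, 1, 1]
labels = ["Anti-Islam", "Anti-Islam", "Anti-Black",
          "Anti-Immigrant", "Anti-Immigrant", "Anti-Immigrant"]
print(cluster_purity(clusters, labels))  # 5/6 ≈ 0.833
```

Purity rises trivially as the number of clusters grows (one point per cluster gives 1.0). A comparison such as Text+SC vs. Text Only is therefore only meaningful when both use the same number of clusters.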


This review was generated by AI and reviewed by a human editor.