QUICK REVIEW

[Paper Review] Automated Hate Speech Detection and the Problem of Offensive Language

Thomas Davidson, Dana Warmsley|arXiv (Cornell University)|2017. 03. 11.

Hate Speech and Cyberbullying Detection | References: 14 | Citations: 275

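One-Line Summary

This study uses a crowd-labeled tweet dataset to train a multi-class classifier that distinguishes hate speech, offensive language, and content that is neither, and it highlights how hard it is to separate hate speech from general offensiveness and how much context matters.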

ABSTRACT

A key challenge for automatic hate-speech detection on social media is the separation of hate speech from other instances of offensive language. Lexical detection methods tend to have low precision because they classify all messages containing particular terms as hate speech and previous work using supervised learning has failed to distinguish between the two categories. We used a crowd-sourced hate speech lexicon to collect tweets containing hate speech keywords. We use crowd-sourcing to label a sample of these tweets into three categories: those containing hate speech, only offensive language, and those with neither. We train a multi-class classifier to distinguish between these different categories. Close analysis of the predictions and the errors shows when we can reliably separate hate speech from other offensive language and when this differentiation is more difficult. We find that racist and homophobic tweets are more likely to be classified as hate speech but that sexist tweets are generally classified as offensive. Tweets without explicit hate keywords are also more difficult to classify.
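To illustrate why the abstract says lexical methods have low precision, here is a minimal sketch of a keyword detector that flags every message containing a lexicon term. The lexicon entries (`term_x`, `term_y`), the example messages, and their labels are hypothetical placeholders invented for illustration; they are not taken from Hatebase or from the paper's data.

```python
def lexicon_flags(text, lexicon):
    """Flag a message if any token matches a lexicon term; context is ignored."""
    tokens = set(text.lower().split())
    return bool(tokens & lexicon)

def precision(flags, is_hate):
    """Share of flagged messages that human labels actually mark as hate speech."""
    flagged = [hate for flag, hate in zip(flags, is_hate) if flag]
    return sum(flagged) / len(flagged) if flagged else 0.0

# Hypothetical lexicon and labeled messages (placeholders, not real data).
lexicon = {"term_x", "term_y"}
messages = [
    ("term_x people do not belong here", True),   # hateful use
    ("my term_x friends are the best", False),    # friendly in-group use
    ("lyrics: term_y term_y all night", False),   # quoted song lyric
    ("what a nice day", False),                   # no lexicon term
]

flags = [lexicon_flags(text, lexicon) for text, _ in messages]
lexicon_precision = precision(flags, [hate for _, hate in messages])
```

In this toy sample three messages are flagged but only one is labeled hate speech, so the detector's precision is one third even though it misses nothing.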

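Motivation and Objectives

Define hate speech and offensive language, and motivate the need to distinguish them.
Build a labeled dataset that separates hate speech, offensive language, and content that is neither.
Evaluate classifier performance and analyze errors to understand how separable the categories are.
Identify the linguistic and contextual factors that affect detection accuracy.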

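Proposed Method

Build a hate speech lexicon from Hatebase.org and sample tweets that contain lexicon terms.
Crowdsource labels in three classes: hate speech, offensive language, or neither.
Extract TF-IDF unigram, bigram, and trigram features, along with POS tags, sentiment, readability, and social features.
Train classifiers with 5-fold cross-validation, comparing logistic regression, Naive Bayes, decision trees, random forests, and linear SVMs.
Use logistic regression with L2 regularization in a one-versus-rest framework as the final model, and evaluate it on held-out data.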
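The steps listed above can be sketched in plain Python. This is an illustrative stand-in, not the authors' implementation: it covers only word n-gram TF-IDF features, a one-versus-rest logistic regression with an L2 penalty trained by simple stochastic gradient descent, and a fold splitter for 5-fold cross-validation. The POS, sentiment, readability, and social features are omitted, the folds are not stratified, and all function names, hyperparameters, and toy tweets (including the placeholder token `slur_a`) are assumptions made for the example.

```python
import math
from collections import Counter

def ngrams(text, n_max=3):
    """Word unigrams, bigrams, and trigrams of a lowercased text."""
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + n])
            for n in range(1, n_max + 1)
            for i in range(len(tokens) - n + 1)]

def fit_idf(docs):
    """Smoothed inverse document frequency for each n-gram in the corpus."""
    df = Counter(term for doc in docs for term in set(ngrams(doc)))
    return {t: math.log((1 + len(docs)) / (1 + c)) + 1.0 for t, c in df.items()}

def tfidf(doc, idf):
    """Sparse, L2-normalised TF-IDF vector; n-grams unseen in training are dropped."""
    counts = Counter(t for t in ngrams(doc) if t in idf)
    vec = {t: c * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
    return {t: v / norm for t, v in vec.items()}

def score(model, x):
    """Linear score (log-odds) of a sparse vector under one binary model."""
    w, b = model
    return b + sum(w.get(t, 0.0) * v for t, v in x.items())

def train_binary(X, y, l2=0.01, lr=0.5, epochs=200):
    """Logistic regression by SGD; the L2 penalty is applied only to the
    weights of features active in each example (a simplification)."""
    w, b = {}, 0.0
    for _ in range(epochs):
        for x, target in zip(X, y):
            err = 1.0 / (1.0 + math.exp(-score((w, b), x))) - target
            for t, v in x.items():
                w[t] = w.get(t, 0.0) - lr * (err * v + l2 * w.get(t, 0.0))
            b -= lr * err
    return w, b

def train_ovr(X, labels):
    """One binary model per class: that class versus the rest."""
    return {c: train_binary(X, [1.0 if l == c else 0.0 for l in labels])
            for c in sorted(set(labels))}

def predict(models, x):
    """Pick the class whose binary model gives the highest score."""
    return max(models, key=lambda c: score(models[c], x))

def k_fold_indices(n, k=5):
    """Yield (train, test) index lists for k interleaved, unstratified folds."""
    for fold in range(k):
        yield ([i for i in range(n) if i % k != fold],
               [i for i in range(n) if i % k == fold])

# Hypothetical toy tweets; "slur_a" is a placeholder token, not real data.
texts = ["slur_a group should vanish", "slur_a people are vermin",
         "that damn game was trash", "damn this trash weather",
         "lovely sunny morning walk", "morning coffee with friends"]
labels = ["hate", "hate", "offensive", "offensive", "neither", "neither"]

idf = fit_idf(texts)
models = train_ovr([tfidf(t, idf) for t in texts], labels)
train_predictions = [predict(models, tfidf(t, idf)) for t in texts]
```

For model comparison, each `(train, test)` pair from `k_fold_indices(len(texts))` would be used to fit the IDF table and the models on the training indices only, then score the test indices, so that no vocabulary statistics leak from the evaluation fold.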

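Experimental Results

Research Questions

RQ1: Can a multi-class model reliably distinguish hate speech from offensive language and neutral content?
RQ2: Which linguistic or contextual features best separate hate speech from offensive language?
RQ3: How well do model predictions agree with human labels, and where are the errors concentrated?
RQ4: Does the presence of explicit hate terms cause misclassification, and can context mitigate it?
RQ5: Which types of hate speech (e.g., racism vs. misogyny) are detected more reliably or less reliably?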

Key Results

The best model achieves overall precision 0.91, recall 0.90, and F1 0.90.
About 40% of true hate speech tweets are misclassified, with hate speech precision 0.44 and recall 0.61.
Hate speech containing strong slurs is easier to detect than hate speech without explicit terms.
Offensive language is often misclassified as hate speech when context is ignored, and sexist tweets tend to be classified as offensive rather than as hate speech.
Only 5% of offensive and 2% of innocuous tweets are labeled as hate by the model, indicating some separation between categories.
The lexicon-based approach has low precision for hate speech, underscoring the value of contextual and multi-class labeling.
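The gap between the overall scores and the hate-class scores reported above is easier to see with a worked example. The sketch below computes per-class precision, recall, and F1 from a confusion matrix and then a support-weighted average. The confusion matrix is made up for illustration and is not the paper's, and treating the overall figures as support-weighted averages is an assumption rather than something the review states.

```python
def per_class_metrics(confusion, labels):
    """confusion[i][j] counts items whose true class is i and predicted class is j."""
    metrics = {}
    for k, label in enumerate(labels):
        true_pos = confusion[k][k]
        predicted = sum(row[k] for row in confusion)  # column sum
        actual = sum(confusion[k])                    # row sum (support)
        precision = true_pos / predicted if predicted else 0.0
        recall = true_pos / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        metrics[label] = {"precision": precision, "recall": recall,
                          "f1": f1, "support": actual}
    return metrics

def weighted_average(metrics, key):
    """Average a per-class metric, weighting each class by its support."""
    total = sum(m["support"] for m in metrics.values())
    return sum(m[key] * m["support"] for m in metrics.values()) / total

# Hypothetical confusion matrix (rows: true class, columns: predicted class).
labels = ["hate", "offensive", "neither"]
confusion = [[60, 35, 5],
             [40, 1900, 60],
             [10, 40, 350]]

metrics = per_class_metrics(confusion, labels)
overall_recall = weighted_average(metrics, "recall")
```

In this invented matrix the hate class has a recall of 0.60 while the weighted recall is 0.924, because the much larger offensive class dominates the average. A high overall score can therefore coexist with weak performance on the rarest class.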

This review was generated by AI and reviewed by a human editor.