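[Paper Walkthrough] Hate Lingo: A Target-based Linguistic Analysis of Hate Speech in Social Media
This paper distinguishes hate speech by its target (Directed vs. Generalized), builds datasets for each type, and analyzes lexical, psycholinguistic, and semantic patterns to improve the understanding and detection of hate speech.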
While social media empowers freedom of expression and individual voices, it also enables anti-social behavior, online harassment, cyberbullying, and hate speech. In this paper, we deepen our understanding of online hate speech by focusing on a largely neglected but crucial aspect of hate speech: its target, either "directed" towards a specific person or entity, or "generalized" towards a group of people sharing a common protected characteristic. We perform the first linguistic and psycholinguistic analysis of these two forms of hate speech and reveal the presence of interesting markers that distinguish these types of hate speech. Our analysis reveals that Directed hate speech, in addition to being more personal and directed, is more informal, angrier, and often explicitly attacks the target (via name calling) with fewer analytic words and more words suggesting authority and influence. Generalized hate speech, on the other hand, is dominated by religious hate and is characterized by the use of lethal words such as murder, exterminate, and kill, and of quantity words such as million and many. Altogether, our work provides a data-driven analysis of the nuances of online hate speech that enables not only a deepened understanding of hate speech and its social implications but also its detection.
Research Motivation and Goals
- Identify and characterize hate speech by its target: Directed (toward a specific person or entity) vs. Generalized (toward a group of people sharing a common protected characteristic).
- Construct high-quality datasets of Directed and Generalized hate tweets for robust analysis.
- Uncover lexical, semantic, and psycholinguistic patterns that differentiate the two hate speech forms.
- Assess implications for detection, policy, and social impact of hate speech.
- Provide data-driven insights to improve hate speech detection while informing discourse policy.
Proposed Method
- Construct multiple hate speech datasets from the 1% Twitter streaming sample (key phrase-based and hashtag-based), plus public datasets and NHSM data.
- Apply Perspective API toxicity and attack_on_commenter models to filter high-quality hate speech candidates and ensure directedness (mentions and second-person pronouns).
- Use human annotation (CrowdFlower) with multiple annotators to label Directed vs. Generalized hate speech and compute inter-annotator reliability.
- Perform lexical analysis with SAGE to extract salient words per category and T-NER for named entity recognition, focusing on domain-specific entities.
- Conduct psycholinguistic analysis with LIWC2015 to measure dimensions like analytical thinking, clout, authenticity, and emotion.
- Use SEMAFOR to annotate frame semantics and compare frames across Directed, Generalized, and Gen-1% tweets.
- Analyze entity distributions, semantic frames, and LIWC metrics to contrast Directed vs Generalized hate speech.
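The candidate-filtering step in the list above (a toxicity score plus a directedness check based on mentions and second-person pronouns) could be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: the pronoun list, the 0.8 threshold, the choice to accept either signal rather than require both, and the names `looks_directed` and `filter_candidates` are all assumptions, and the toxicity scores are taken as precomputed inputs rather than fetched from the Perspective API.

```python
import re

# Assumption: an illustrative pronoun list, not the paper's exact lexicon.
SECOND_PERSON = {"you", "your", "yours", "yourself", "u", "ur"}
# An @handle that is not preceded by a word character (so e-mail addresses are skipped).
MENTION_RE = re.compile(r"(?<!\w)@\w+")
TOKEN_RE = re.compile(r"[a-z]+")


def looks_directed(text):
    """Return True if the text contains an @-mention or a second-person pronoun."""
    if MENTION_RE.search(text):
        return True
    tokens = TOKEN_RE.findall(text.lower())
    return any(tok in SECOND_PERSON for tok in tokens)


def filter_candidates(scored_tweets, threshold=0.8):
    """Keep texts whose precomputed toxicity score meets the threshold and that look directed.

    scored_tweets: iterable of (text, toxicity_score) pairs; the scores are assumed
    to come from an external model and are not computed here.
    """
    return [
        text
        for text, score in scored_tweets
        if score >= threshold and looks_directed(text)
    ]
```

A heuristic like this only narrows the candidate pool; in the workflow described above, the Directed vs. Generalized labels still come from human annotation.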
Experimental Results
Research Questions
- RQ1: What linguistic and psycholinguistic markers distinguish Directed hate speech from Generalized hate speech?
- RQ2: How do lexical, semantic-frame, and entity patterns differ between Directed and Generalized hate speech in social media?
- RQ3: What implications do the target-based distinctions have for hate speech detection and policy?
Key Findings
- Directed hate speech is more personal, informal, angrier, and shows higher clout than Generalized hate speech.
- Generalized hate speech is dominated by religion-related terms and includes lethal language (kill, murder, exterminate) and quantity words (million, many).
- Named entity analysis shows Directed hate contains more person entities, while Generalized hate features more religious and group-related entities.
- LIWC analysis reveals Directed hate has lower analytical thinking and higher informal and social language, with greater anger; Generalized hate shows higher authenticity and emotion, and more religion-focused content.
- Frame-semantic analysis shows Directed hate emphasizes intentional acts and hindering, while Generalized hate emphasizes Killing, Religion, and Quantity frames.
- Salient words learned via SAGE indicate minimal overlap between categories, with distinct topic domains for each hate speech type.
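To make the idea of "salient words per category" concrete, the sketch below ranks words by a smoothed log-odds ratio between a target corpus and a background corpus. This is a deliberately simpler stand-in, not SAGE itself (SAGE fits a sparse, regularized model of deviations from a background distribution). The function name `salient_words`, the whitespace tokenization, and the smoothing constant `alpha` are assumptions made for illustration.

```python
import math
from collections import Counter


def salient_words(target_docs, background_docs, top_k=3, alpha=1.0):
    """Rank words by add-alpha smoothed log-odds of target vs. background usage."""
    target = Counter(w for d in target_docs for w in d.lower().split())
    background = Counter(w for d in background_docs for w in d.lower().split())
    vocab = set(target) | set(background)

    # Smoothed totals so that unseen words get a small, non-zero probability.
    t_total = sum(target.values()) + alpha * len(vocab)
    b_total = sum(background.values()) + alpha * len(vocab)

    scores = {
        w: math.log((target[w] + alpha) / t_total)
        - math.log((background[w] + alpha) / b_total)
        for w in vocab
    }
    # Highest score first; ties are broken alphabetically for determinism.
    return sorted(scores, key=lambda w: (-scores[w], w))[:top_k]
```

Running the function once with Directed tweets as the target and once with Generalized tweets as the target, each against the other as background, would give two ranked lists whose overlap can then be inspected.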
This walkthrough was generated by AI and reviewed by a human editor.