QUICK REVIEW

[Paper Digest] Racial Disparity in Natural Language Processing: A Case Study of Social Media African-American English

Su Lin Blodgett, Brendan O’Connor|arXiv (Cornell University)|Jun 30, 2017

Hate Speech and Cyberbullying Detection | Cited by 60

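One-sentence summary

This paper empirically analyzes racial disparity in language identification for tweets written in African-American English, showing that accuracy gaps persist across several off-the-shelf tools and across message lengths.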

ABSTRACT

We highlight an important frontier in algorithmic fairness: disparity in the quality of natural language processing algorithms when applied to language from authors of different social groups. For example, current systems sometimes analyze the language of females and minorities more poorly than they do of whites and males. We conduct an empirical analysis of racial disparity in language identification for tweets written in African-American English, and discuss implications of disparity in NLP.

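Motivation and goals

Advance fairness in NLP by examining how dialect and race affect language processing performance
Quantify the difference in language identification accuracy between African-American English tweets and white-aligned tweets
Assess whether the disparity persists after controlling for message length and across multiple commercial and open-source tools
Discuss what these disparities imply for downstream NLP tasks and for potential ways to improve fairness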

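Proposed method

Use a large-scale AAE Twitter corpus with mixed-membership demographic labels to identify AA-aligned and white-aligned messages
Evaluate four language identifiers (langid.py, IBM Watson, Microsoft Azure, Twitter metadata) on 20,000 tweets binned by length
Compute the accuracy gap between AA-aligned and white-aligned messages within each length bin
Scale the analysis from 200 to 20,000 tweets to test the robustness of the disparity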
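
The per-bin comparison described above can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the bin edges in `BIN_EDGES`, the whitespace tokenization, the group labels `"AA"` and `"White"`, the sign convention (white-aligned accuracy minus AA-aligned accuracy), and the function names are all assumptions made for the example. Every message is assumed to be English, so a prediction counts as correct only when the classifier returns `"en"`.

```python
from collections import defaultdict

# Hypothetical bin upper bounds, in tokens; the last bin is open-ended.
BIN_EDGES = [5, 10, 15]


def length_bin(n_tokens):
    """Return the index of the length bin for a message with n_tokens tokens."""
    for i, upper in enumerate(BIN_EDGES):
        if n_tokens <= upper:
            return i
    return len(BIN_EDGES)


def accuracy_gap_by_bin(records, classify):
    """Compute the accuracy gap per length bin, in percentage points.

    records:  iterable of (text, group) pairs, group being "AA" or "White";
              every text is assumed to be English.
    classify: callable mapping a text to a predicted language code.
    Returns {bin_index: white_accuracy - aa_accuracy} for every bin that
    contains at least one message from each group.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for text, group in records:
        # Whitespace splitting is a simplification of real tweet tokenization.
        b = length_bin(len(text.split()))
        total[(b, group)] += 1
        if classify(text) == "en":
            correct[(b, group)] += 1

    gaps = {}
    for b in range(len(BIN_EDGES) + 1):
        n_aa, n_wh = total[(b, "AA")], total[(b, "White")]
        if n_aa and n_wh:
            aa_acc = correct[(b, "AA")] / n_aa
            wh_acc = correct[(b, "White")] / n_wh
            gaps[b] = 100.0 * (wh_acc - aa_acc)
    return gaps
```

Any callable that returns a language code can be passed as `classify`. With langid.py, for instance, `langid.classify(text)` returns a `(language, score)` pair, so `lambda t: langid.classify(t)[0]` would fit this interface. Under the assumed sign convention, a positive gap means the AA-aligned messages in that bin were identified as English less often than the white-aligned ones.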

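Experimental results

Research questions

RQ1: Do language identification tools show different accuracy on AA-aligned tweets than on white-aligned tweets?
RQ2: How does message length affect language identification accuracy and disparity across dialects?
RQ3: Is the disparity consistent across open-source and commercial language identifiers?
RQ4: What do these disparities imply for downstream NLP tasks and for fairness?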

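Key findings

All classifiers are more accurate on longer messages, and the gap is largest for short messages (<10 tokens)
The open-source langid.py shows a particularly pronounced disparity, with a gap of up to 19.7 percentage points on short messages
IBM Watson's disparity is largest in the shortest length bin, at 15.1 percentage points
Microsoft Azure generally shows smaller disparities, with gaps of 0.3–6.6 percentage points in the longer-message bins
Twitter's own identifier shows its largest disparity in the shortest bin (19.7 percentage points) and a negative disparity in the longest bin (-3.0 points)
Overall, the disparity persists when scaling from 200 to 20,000 tweets and across different tools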


This digest was generated by AI and reviewed by a human editor.