QUICK REVIEW

[Paper Review] Dialect prejudice predicts AI decisions about people's character, employability, and criminality

Valentin Hofmann, Pratyusha Kalluri|arXiv (Cornell University)|2024. 03. 01.

Computational and Text Analysis Methods|Citations: 30

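One-line summary

The paper develops Matched Guise Probing to reveal covert dialect prejudice against African American English across multiple language models, and shows that this prejudice affects employment and criminality judgments even when race is never explicitly mentioned.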

ABSTRACT

Hundreds of millions of people now interact with language models, with uses ranging from serving as a writing aid to informing hiring decisions. Yet these language models are known to perpetuate systematic racial prejudices, making their judgments biased in problematic ways about groups like African Americans. While prior research has focused on overt racism in language models, social scientists have argued that racism with a more subtle character has developed over time. It is unknown whether this covert racism manifests in language models. Here, we demonstrate that language models embody covert racism in the form of dialect prejudice: we extend research showing that Americans hold raciolinguistic stereotypes about speakers of African American English and find that language models have the same prejudice, exhibiting covert stereotypes that are more negative than any human stereotypes about African Americans ever experimentally recorded, although closest to the ones from before the civil rights movement. By contrast, the language models' overt stereotypes about African Americans are much more positive. We demonstrate that dialect prejudice has the potential for harmful consequences by asking language models to make hypothetical decisions about people, based only on how they speak. Language models are more likely to suggest that speakers of African American English be assigned less prestigious jobs, be convicted of crimes, and be sentenced to death. Finally, we show that existing methods for alleviating racial bias in language models such as human feedback training do not mitigate the dialect prejudice, but can exacerbate the discrepancy between covert and overt stereotypes, by teaching language models to superficially conceal the racism that they maintain on a deeper level. Our findings have far-reaching implications for the fair and safe employment of language technology.

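Research motivation and goals

Investigate whether language models hold covert raciolinguistic stereotypes that are triggered by dialect features rather than by explicit mentions of race.
Develop and apply Matched Guise Probing to detect dialect prejudice across models and settings.
Evaluate how dialect prejudice affects AI decisions in employment and criminal justice contexts.
Evaluate whether common bias mitigation strategies (scaling, human feedback) reduce covert dialect prejudice.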

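Proposed method

Introduce Matched Guise Probing to compare model predictions for texts in African American English (AAE) and Standardized American English (SAE) without any overt mention of race.
Analyze several models (GPT2, RoBERTa, T5, GPT3.5, GPT4) across meaning-matched and non-meaning-matched prompts.
Measure covert stereotypes by ranking the adjectives associated with AAE and comparing them with the human stereotypes from the Princeton Trilogy studies.
Assess employability by having models match occupations to AAE and SAE speakers and examining the correlation with occupational prestige.
Simulate criminality judgments by computing conviction and death-sentence rates for AAE versus SAE utterances.
Examine the effects of scaling and human feedback on overt versus covert stereotypes.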
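
The adjective-ranking step can be illustrated with a minimal sketch. This review does not give the paper's exact scoring formula, so the log-ratio score below is an assumption, and the function names, trait labels, and probabilities are hypothetical placeholders rather than values from the paper. In practice, the probabilities would come from a language model prompted with AAE and SAE versions of the same text.

```python
import math

def association_score(p_aae, p_sae):
    """Log ratio of a trait word's mean probability after AAE vs. SAE texts.

    Assumed scoring rule for illustration: positive values mean the trait
    is more strongly associated with the AAE guise, negative with SAE.
    """
    mean_aae = sum(p_aae) / len(p_aae)
    mean_sae = sum(p_sae) / len(p_sae)
    return math.log(mean_aae / mean_sae)

def rank_traits(probs):
    """Sort traits from most AAE-associated to most SAE-associated."""
    scores = {
        trait: association_score(p["aae"], p["sae"])
        for trait, p in probs.items()
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical model probabilities of each trait word, one value per
# matched AAE/SAE text pair (made-up numbers, not results from the paper).
toy_probs = {
    "trait_a": {"aae": [0.04, 0.06], "sae": [0.02, 0.03]},
    "trait_b": {"aae": [0.01, 0.01], "sae": [0.02, 0.02]},
    "trait_c": {"aae": [0.03, 0.03], "sae": [0.03, 0.03]},
}

ranking = rank_traits(toy_probs)
for trait, score in ranking:
    print(f"{trait}: {score:+.3f}")
```

A log ratio is used here because it is symmetric around zero: a trait that is twice as likely under one guise gets the same magnitude of score as one that is twice as likely under the other.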

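Experimental results

Research questions

RQ1: Do language models show covert dialect prejudice that is triggered by AAE features, independent of explicit racial cues?
RQ2: How do covert stereotypes in language models compare with their overt stereotypes, and how closely do they match historical human stereotypes?
RQ3: Does dialect-based bias affect AI judgments in employment and criminal justice scenarios?
RQ4: Can model scaling or human feedback training mitigate covert dialect prejudice?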

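Key results

The language models' covert stereotypes about AAE speakers are closest to the dated human stereotypes of the 1930s and are more negative than any experimentally recorded human stereotypes about African Americans.
Across several models, overt stereotypes about African Americans are often positive, especially in models trained with human feedback, which creates a mismatch between covert and overt prejudice.
In the employment task, models associate AAE speakers with lower-prestige occupations and SAE speakers with higher-prestige ones.
In the criminality task, models assign higher conviction probabilities and death-sentence rates to AAE utterances than to SAE utterances.
Model scaling increases covert dialect prejudice (despite improved overall understanding) and does not reduce overt prejudice; human feedback training raises overt positivity but does not substantially reduce covert prejudice.
Because human feedback reduces overt stereotypes while covert stereotypes largely persist, it widens the gap between covert and overt stereotypes in some models.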
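
The conviction-rate comparison in the criminality task can be sketched as follows. This is a toy illustration under stated assumptions: the decision labels, the `decision_rate` helper, and the data are hypothetical and do not reproduce the paper's prompts or numbers.

```python
def decision_rate(decisions, target):
    """Fraction of decisions equal to the target label."""
    return sum(d == target for d in decisions) / len(decisions)

# Hypothetical model decisions for matched utterances in each guise
# (made-up labels, not results from the paper).
toy_decisions = {
    "aae": ["convicted", "acquitted", "convicted", "convicted"],
    "sae": ["convicted", "acquitted", "acquitted", "acquitted"],
}

rate_aae = decision_rate(toy_decisions["aae"], "convicted")
rate_sae = decision_rate(toy_decisions["sae"], "convicted")
gap = rate_aae - rate_sae  # positive gap: more convictions for the AAE guise

print(f"AAE: {rate_aae:.2f}, SAE: {rate_sae:.2f}, gap: {gap:+.2f}")
```

Because the two guises are matched, any gap between the rates is attributable to the dialect of the text rather than to what the text says.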

This review was generated by AI and checked by a human editor.