QUICK REVIEW

[Paper Review] Bias of AI-Generated Content: An Examination of News Produced by Large Language Models

Fang Xiao, Shangkun Che|arXiv (Cornell University)|Sep 18, 2023

Computational and Text Analysis Methods | Cited by 16

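One-Sentence Summary

This paper evaluates gender and racial bias in AI-generated news at the word, sentence, and document levels by comparing AIGC with NYT/Reuters articles, and further analyzes the effects of biased prompts and RLHF.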

ABSTRACT

Large language models (LLMs) have the potential to transform our lives and work through the content they generate, known as AI-Generated Content (AIGC). To harness this transformation, we need to understand the limitations of LLMs. Here, we investigate the bias of AIGC produced by seven representative LLMs, including ChatGPT and LLaMA. We collect news articles from The New York Times and Reuters, both known for their dedication to provide unbiased news. We then apply each examined LLM to generate news content with headlines of these news articles as prompts, and evaluate the gender and racial biases of the AIGC produced by the LLM by comparing the AIGC and the original news articles. We further analyze the gender bias of each LLM under biased prompts by adding gender-biased messages to prompts constructed from these news headlines. Our study reveals that the AIGC produced by each examined LLM demonstrates substantial gender and racial biases. Moreover, the AIGC generated by each LLM exhibits notable discrimination against females and individuals of the Black race. Among the LLMs, the AIGC generated by ChatGPT demonstrates the lowest level of bias, and ChatGPT is the sole model capable of declining content generation when provided with biased prompts.

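Research Motivation and Objectives

Use high-quality news articles from The New York Times and Reuters as reference content, serving as a proxy for unbiased content.
Generate AIGC using news headlines as prompts, and compare its word-level, sentence-level, and document-level bias against the reference content.
Analyze bias under biased prompts and assess each model's resistance to such prompts.
Assess how model size and RLHF affect bias toward gender and racial groups.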

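Proposed Method

Collect 8,629 NYT and Reuters news articles published from December 2022 to April 2023 as reference content.
Apply each LLM to generate news content, using the article headlines as prompts.
Measure word-level bias as the Wasserstein distance between the group-word distributions of the AIGC and the reference content.
Evaluate sentence-level bias through sentiment and toxicity analysis of sentences that mention gender or race.
Evaluate document-level bias through semantic and topic analysis of gender- and race-related content.
Examine bias under biased prompts by injecting gender-biased messages into the prompts, and assess each model's resistance to such prompts.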
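The word-level measure can be illustrated with a minimal sketch. It assumes a tiny hypothetical gender lexicon (`GROUP_WORDS`), whitespace tokenization, and groups placed on an ordered, unit-spaced support; none of these details come from the paper, whose actual word lists and preprocessing are not described in this review. The function names `group_distribution` and `wasserstein_1d` are likewise illustrative.

```python
from collections import Counter

# Hypothetical mini-lexicon for illustration only; the paper's actual
# word lists are not given in this review.
GROUP_WORDS = {
    "male": {"he", "him", "his", "man", "men"},
    "female": {"she", "her", "hers", "woman", "women"},
}


def group_distribution(text, groups=GROUP_WORDS):
    """Return each group's share of the group-specific words in `text`."""
    counts = Counter()
    for raw in text.lower().split():
        token = raw.strip(".,;:!?\"'()")  # drop surrounding punctuation
        for name, words in groups.items():
            if token in words:
                counts[name] += 1
    total = sum(counts.values())
    if total == 0:
        return None  # no group words found, so the distribution is undefined
    return [counts[name] / total for name in groups]


def wasserstein_1d(p, q):
    """Wasserstein-1 distance between two distributions that share the
    same ordered, unit-spaced support (sum of absolute CDF differences)."""
    distance, cdf_gap = 0.0, 0.0
    for p_i, q_i in zip(p, q):
        cdf_gap += p_i - q_i
        distance += abs(cdf_gap)
    return distance


# Made-up sentences standing in for a reference article and its AIGC counterpart.
reference = "He said she would meet him and her."
generated = "He told his men that she agreed."

ref_dist = group_distribution(reference)
gen_dist = group_distribution(generated)
print(ref_dist, gen_dist, wasserstein_1d(gen_dist, ref_dist))
```

With only two groups, the distance reduces to the absolute difference in the male-word share, so the ordering assumption has no effect. With three or more groups (for example, racial categories) the result depends on how the groups are ordered and spaced, which this sketch does not attempt to settle.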

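Experimental Results

Research Questions

RQ1: How does AIGC from representative large language models differ from high-quality reference news in its use of gender- and race-related words?
RQ2: What sentence-level and document-level biases does AIGC exhibit with respect to gender and race, including in sentiment and toxicity?
RQ3: How does AIGC respond to biased prompts, and to what extent do models resist or propagate such bias?
RQ4: Does model size or RLHF (as in ChatGPT) mitigate bias at the word, sentence, and document levels?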

Key Findings

Group: White
LLM	Mean	95% CI	Sample size	p-value
Grover	20.07%	[18.79%, 21.35%]	5410	<0.001
GPT-2	3.62%	[2.08%, 5.16%]	4203	<0.001
GPT-3-curie	4.67%	[3.44%, 5.91%]	3848	<0.001
GPT-3-davinci	2.47%	[1.31%, 3.63%]	3854	<0.001
ChatGPT	2.35%	[1.21%, 3.49%]	3738	<0.001
Cohere	2.60%	[1.51%, 3.70%]	4793	<0.001
LLaMA-7B	2.65%	[1.10%, 4.20%]	2764	<0.001
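The intervals in the table above are symmetric about the reported means, which is consistent with a normal-approximation interval of the form mean ± z·SE. The following sketch assumes that form with z = 1.96 (an assumption; this review does not state how the intervals were computed) and recovers the half-width and the implied standard error and standard deviation from the tabulated values. The helper name `implied_stats` is illustrative.

```python
import math

# (mean %, lower %, upper %, sample size), copied from the table above.
ROWS = {
    "Grover": (20.07, 18.79, 21.35, 5410),
    "GPT-2": (3.62, 2.08, 5.16, 4203),
    "GPT-3-curie": (4.67, 3.44, 5.91, 3848),
    "GPT-3-davinci": (2.47, 1.31, 3.63, 3854),
    "ChatGPT": (2.35, 1.21, 3.49, 3738),
    "Cohere": (2.60, 1.51, 3.70, 4793),
    "LLaMA-7B": (2.65, 1.10, 4.20, 2764),
}


def implied_stats(lower, upper, n, z=1.96):
    """Half-width, standard error, and standard deviation implied by a
    symmetric interval, ASSUMING it was built as mean +/- z * SE."""
    half_width = (upper - lower) / 2
    std_error = half_width / z
    std_dev = std_error * math.sqrt(n)
    return half_width, std_error, std_dev


for model, (mean, lower, upper, n) in ROWS.items():
    half_width, std_error, std_dev = implied_stats(lower, upper, n)
    midpoint = (lower + upper) / 2
    print(f"{model:14s} mean={mean:6.2f} midpoint={midpoint:6.3f} "
          f"half-width={half_width:5.2f} SE={std_error:5.3f} SD={std_dev:6.2f}")
```

Every midpoint matches its reported mean to within rounding, and none of the intervals includes zero, which agrees with the reported p-values below 0.001.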

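The AIGC produced by every evaluated LLM shows significant gender and racial bias relative to the NYT/Reuters reference at the word, sentence, and document levels.
ChatGPT generally shows the lowest bias among the tested models, which the analysis attributes to reinforcement learning from human feedback (RLHF).
RLHF helps reduce word-level and document-level bias and enables ChatGPT to decline to generate content under biased prompts, although biased prompts that are not filtered out can still yield highly biased output.
At the word level, bias against the Black group is especially pronounced across models: AIGC uses significantly fewer Black-related words than the reference.
Within the GPT family, bias tends to decrease as model size increases, and RLHF further reduces bias across metrics.
Document-level analysis shows significant gender and racial bias in every model; ChatGPT often performs best, but it is not a complete barrier against biased prompts.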

This review was generated by AI and checked by a human editor.