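[Paper Summary] Stable Bias: Analyzing Societal Representations in Diffusion Models
This paper proposes a method for auditing social biases in text-to-image diffusion systems: it varies the gender and ethnicity markers in prompts and compares the resulting variation with that produced by varying the profession. The method is applied to Stable Diffusion and DALL·E 2, and the authors release open tools and datasets.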
As machine learning-enabled Text-to-Image (TTI) systems are becoming increasingly prevalent and seeing growing adoption as commercial services, characterizing the social biases they exhibit is a necessary first step to lowering their risk of discriminatory outcomes. This evaluation, however, is made more difficult by the synthetic nature of these systems' outputs: common definitions of diversity are grounded in social categories of people living in the world, whereas the artificial depictions of fictive humans created by these systems have no inherent gender or ethnicity. To address this need, we propose a new method for exploring the social biases in TTI systems. Our approach relies on characterizing the variation in generated images triggered by enumerating gender and ethnicity markers in the prompts, and comparing it to the variation engendered by spanning different professions. This allows us to (1) identify specific bias trends, (2) provide targeted scores to directly compare models in terms of diversity and representation, and (3) jointly model interdependent social variables to support a multidimensional analysis. We leverage this method to analyze images generated by 3 popular TTI systems (Dall-E 2, Stable Diffusion v 1.4 and 2) and find that while all of their outputs show correlations with US labor demographics, they also consistently under-represent marginalized identities to different extents. We also release the datasets and low-code interactive bias exploration platforms developed for this work, as well as the necessary tools to similarly evaluate additional TTI systems.
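Research Motivation and Goals
- Define flexible proxy markers for gender and ethnicity in generated images, so that social variation can be studied.
- Develop prompt-based probes to audit how TTI systems represent people across professions.
- Provide quantitative and qualitative analyses that reveal the under-representation of marginalized identities in the outputs.
- Provide datasets and a low-code interactive platform to make broader evaluation of TTI systems easier.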
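Proposed Method
- Generate prompts from a template that combines ethnicity, gender, and profession, covering 146 professions drawn from the U.S. Bureau of Labor Statistics (BLS).
- Assess bias with two analysis modalities: text-based (image captions and VQA vocabulary) and image-based (clustering of image embeddings).
- Cluster the image embeddings into 24 regions to capture the variation associated with the identity phrases in the prompts.
- Aggregate the results by relating the cluster-region distributions to U.S. BLS demographic data (quintile comparison).
- Provide interactive tools (Diffusion Bias Explorer, Average Face Comparison Tool, k-NN Explorer) for qualitative exploration.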
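The prompt enumeration and the per-region tallies described above can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the template wording, the short word lists, and the function names `identity_prompts`, `profession_prompts`, and `region_shares` are all assumptions made for this example, and the region ids are assumed to come from a separate clustering step over image embeddings that is not shown.

```python
from collections import Counter
from itertools import product

# Illustrative word lists (assumptions). The paper covers 146 professions;
# only three are listed here to keep the sketch short.
ETHNICITIES = ["", "Black", "East Asian", "Hispanic", "White"]
GENDERS = ["person", "woman", "man", "non-binary person"]
PROFESSIONS = ["nurse", "plumber", "software developer"]


def article(phrase):
    """Pick 'a' or 'an' from the first letter (a simple heuristic)."""
    return "an" if phrase[0].lower() in "aeiou" else "a"


def identity_prompts(ethnicities, genders):
    """Return (identity_phrase, prompt) for every ethnicity/gender pair."""
    prompts = []
    for ethnicity, gender in product(ethnicities, genders):
        # Skip the empty marker so an unspecified ethnicity leaves no gap.
        phrase = " ".join(part for part in (ethnicity, gender) if part)
        # Hypothetical template wording.
        prompts.append((phrase, f"Photo portrait of {article(phrase)} {phrase} at work"))
    return prompts


def profession_prompts(professions):
    """Return (profession, prompt) pairs with no identity marker."""
    return [(p, f"Photo portrait of {article(p)} {p}") for p in professions]


def region_shares(assignments):
    """Map each region id to the share of each identity phrase inside it.

    `assignments` is an iterable of (identity_phrase, region_id) pairs, one
    per generated image; the region ids are assumed to be precomputed.
    """
    counts = {}
    for phrase, region in assignments:
        counts.setdefault(region, Counter())[phrase] += 1
    return {
        region: {phrase: n / sum(c.values()) for phrase, n in c.items()}
        for region, c in counts.items()
    }
```

For example, `region_shares([("White man", 4), ("White man", 4), ("Black woman", 7)])` attributes region 4 entirely to the phrase "White man". A region dominated by a single identity phrase in this way is the kind of signal the clustering analysis looks for.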
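Experimental Results
Research Questions
- RQ1: How do diffusion-based TTI systems differ in their depiction of gender and ethnicity when given profession-related prompts?
- RQ2: Do TTI outputs reproduce or exacerbate real-world demographic distributions across professions?
- RQ3: Can clustering image embeddings reveal multidimensional biases beyond simple label assignment?
- RQ4: Which interactive tools can support qualitative and scalable auditing of diffusion models?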
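Main Findings
- All three systems show correlations with U.S. labor demographics, but they consistently under-represent marginalized identities to different extents.
- Both captions and VQA outputs surface gender markers, but far less often in VQA outputs than in captions (roughly 97.66% of captions vs. 45.56% of VQA outputs use gendered terms).
- Identity-region clustering identifies regions associated with specific gender and ethnicity prompts; some regions (such as region 4) mainly reflect White men, while others show more mixed associations.
- Across systems, the under-representation of women and Black people is more pronounced in more diverse professions, and DALL·E 2 shows the strongest bias in the quintile analysis.
- The framework generalizes to additional TTI systems, and the results suggest that variation in bias is tied to fine-tuning and not only to shared pre-training.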
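The caption versus VQA comparison in the findings comes down to measuring what share of generated texts contains a gendered term. A minimal sketch of that measurement is given below; the word list `GENDERED_TERMS` and both function names are hypothetical, since this summary does not specify the lexicon the paper uses.

```python
import re

# Hypothetical lexicon; the paper's actual term list is not given here.
GENDERED_TERMS = {
    "woman", "women", "man", "men", "girl", "boy", "lady", "gentleman",
    "female", "male", "she", "he", "her", "his",
}


def uses_gendered_term(text):
    """True if any whole word in `text` is in the gendered lexicon."""
    # Tokenize on letters so that 'human' does not match 'man'.
    tokens = re.findall(r"[a-z]+", text.lower())
    return any(token in GENDERED_TERMS for token in tokens)


def gendered_share(texts):
    """Fraction of texts that contain at least one gendered term."""
    texts = list(texts)
    if not texts:
        return 0.0
    return sum(uses_gendered_term(t) for t in texts) / len(texts)
```

Running `gendered_share` separately over the captions and over the VQA answers for the same images would give two rates that can be compared directly, which is the shape of the comparison reported above.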
This summary was generated by AI and reviewed by human editors.