QUICK REVIEW

[Paper Review] The Algorithmic Gaze of Image Quality Assessment: An Audit and Trace Ethnography of the LAION-Aesthetics Predictor

Jordan Taylor, William Agnew|arXiv (Cornell University)|Jan 14, 2026

Aesthetic Perception and Analysis|Cited by 0

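One-Sentence Summary

An audit of the LAION-Aesthetics Predictor (LAP) across three datasets reveals aesthetic filtering that favors western and male-gaze content; the paper also documents how LAP was created, tracing its training data and development practices.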

ABSTRACT

Visual generative AI models are trained using a one-size-fits-all measure of aesthetic appeal. However, what is deemed "aesthetic" is inextricably linked to personal taste and cultural values, raising the question of whose taste is represented in visual generative AI models. In this work, we study an aesthetic evaluation model--LAION-Aesthetics Predictor (LAP)--that is widely used to curate datasets to train visual generative image models, like Stable Diffusion, and evaluate the quality of AI-generated images. To understand what LAP measures, we audited the model across three datasets. First, we examined the impact of aesthetic filtering on the LAION-Aesthetics Dataset (approximately 1.2B images), which was curated from LAION-5B using LAP. We find that the LAP disproportionally filters in images with captions mentioning women, while filtering out images with captions mentioning men or LGBTQ+ people. Then, we used LAP to score approximately 330k images across two art datasets, finding the model rates realistic images of landscapes, cityscapes, and portraits from western and Japanese artists most highly. In doing so, the algorithmic gaze of this aesthetic evaluation model reinforces the imperial and male gazes found within western art history. In order to understand where these biases may have originated, we performed a digital ethnography of public materials related to the creation of LAP. We find that the development of LAP reflects the biases we found in our audits, such as the aesthetic scores used to train LAP primarily coming from English-speaking photographers and western AI-enthusiasts. In response, we discuss how aesthetic evaluation can perpetuate representational harms and call on AI developers to shift away from prescriptive measures of "aesthetics" toward more pluralistic evaluation.

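Motivation and Goals

Examine what LAP treats as a "high-quality" image and how that judgment affects data filtering.
Check for bias in LAP-based filtering on the LAION-Aesthetics Dataset and on the MET and WikiArt art datasets.
Trace the origins of LAP to understand how its training data and development practices shape its outputs.
Discuss the representational harms caused by prescriptive measures of aesthetics and advocate for pluralistic evaluation.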

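Proposed Method

Audit three datasets: the LAION-Aesthetics Dataset (LAD), images from the Metropolitan Museum of Art (MET), and the WikiArt artwork dataset.
Analyze content and domain bias by comparing images whose LAP scores fall above and below the 6.5 threshold.
Apply term-level content analysis using pointwise mutual information (PMI) to captions that mention identity categories and cultures.
Conduct a trace ethnography of how LAP was created, drawing on public materials and developer disclosures.
Examine the composition and documentation of the AVA, SAC, and LAION-Logos training datasets to understand where the biases come from.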
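
The threshold comparison in the method list can be sketched in a few lines. This is a minimal illustration, not the authors' pipeline: the record layout (image URL plus LAP score), the example URLs, and the function names are all hypothetical, and treating "6.5+" as "score >= 6.5" is an assumption.

```python
from collections import Counter
from urllib.parse import urlparse

THRESHOLD = 6.5  # cutoff named in the review; ">=" is an assumption about "6.5+"


def split_by_threshold(records, threshold=THRESHOLD):
    """Split (url, score) records into a high group (>= threshold) and a low group."""
    high = [r for r in records if r[1] >= threshold]
    low = [r for r in records if r[1] < threshold]
    return high, low


def top_domains(records, n=25):
    """Count the host of each image URL and return the n most common hosts."""
    hosts = Counter(urlparse(url).netloc.lower() for url, _ in records)
    return hosts.most_common(n)


# Hypothetical toy records: (image URL, LAP score)
records = [
    ("https://art.example/a.jpg", 7.1),
    ("https://art.example/b.jpg", 6.8),
    ("https://shop.example/c.jpg", 4.2),
    ("https://photo.example/d.jpg", 6.5),
    ("https://shop.example/e.jpg", 5.9),
]

high, low = split_by_threshold(records)
print(top_domains(high))
```

Running the same count on `low` and comparing the two rankings gives a rough view of which sources the filter favors.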

Figure 1. Top 25 domains of images in the LAION-Aesthetics Dataset rated 6.5+ by the LAION-Aesthetic Prediction model

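Experimental Results

Research Questions

RQ1: What does LAP consider a "high-quality" image across different datasets and cultural contexts?
RQ2: How does LAP filtering affect the representation of gender, religion, race, and culture in the filtered datasets?
RQ3: Where do LAP's aesthetic judgments originate, and what biases exist in its training data and development process?
RQ4: How can aesthetic evaluation practices be adjusted to reduce representational harms and support more pluralistic evaluation?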

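Key Findings

In LAD, LAP disproportionately filters in images whose captions mention women and filters out images whose captions mention men or LGBTQ+ people.
In MET and WikiArt, LAP gives its highest scores to landscapes, cityscapes, and portraits by western and Japanese artists, indicating a bias toward western and Japanese realism.
LAP's training data comes mainly from English-speaking photographers and western AI enthusiasts; contributions to AVA, SAC, and LAION-Logos are unevenly distributed among their creators, and documentation is often limited.
The domains in the LAD 6.5+ subset are dominated by independent visual artists and photographers, suggesting a tilt toward commercially shared or artist-created content.
LAP's architecture is a simple multilayer perceptron on top of CLIP embeddings, trained on three datasets with subjective ratings, so it reflects the subjective, personal taste of its creators rather than a universal aesthetic.
The annotations used to train LAP are inconsistent and were collected with limited consent, raising questions about data provenance and the ethics of aesthetic evaluation.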
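
To make the architecture finding concrete, here is a toy forward pass for a small multilayer perceptron that maps an embedding vector to one scalar score. It is only a sketch of the general idea: the layer sizes, the ReLU activation, the random weights, and the names `mlp_score` and `init_params` are illustrative assumptions, not LAP's actual configuration, and the input is a stand-in vector rather than a real CLIP embedding.

```python
import random


def linear(x, weights, bias):
    """Dense layer: one output value per weight row."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def relu(x):
    """Element-wise max(0, v)."""
    return [max(0.0, v) for v in x]


def mlp_score(embedding, params):
    """Map an embedding vector to a single scalar score (hidden layer + output layer)."""
    hidden = relu(linear(embedding, params["w1"], params["b1"]))
    return linear(hidden, params["w2"], params["b2"])[0]


def init_params(dim_in, dim_hidden, seed=0):
    """Hypothetical untrained weights; a real predictor would learn these from rated images."""
    rng = random.Random(seed)

    def mat(rows, cols):
        return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

    return {
        "w1": mat(dim_hidden, dim_in),
        "b1": [0.0] * dim_hidden,
        "w2": mat(1, dim_hidden),
        "b2": [0.0],
    }


params = init_params(dim_in=8, dim_hidden=4)
embedding = [0.5] * 8  # stand-in for an image embedding
score = mlp_score(embedding, params)
print(score)
```

Because the score is a learned function of the ratings it was fit to, whatever taste those ratings encode is carried into every image the model later scores.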

Figure 2. Pointwise Mutual Information (PMI) between regex and images being included in the 6.5+ subset of the LAION-Aesthetics Dataset. Terms with higher PMI (e.g., wom[ae]n) have higher likelihood of being included.
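
For readers unfamiliar with PMI, the quantity in Figure 2 can be read as log2 of P(term and included) divided by P(term) times P(included): positive values mean captions matching the term appear in the 6.5+ subset more often than chance would predict. The sketch below computes it on hypothetical toy captions; the data, the word-boundary anchors added to the regex, and the function name `pmi` are assumptions for illustration, not the paper's code.

```python
import math
import re


def pmi(captions_with_flags, pattern):
    """PMI (base 2) between 'caption matches pattern' and 'image is in the high subset'."""
    rx = re.compile(pattern, re.IGNORECASE)
    n = len(captions_with_flags)
    n_term = n_in = n_both = 0
    for caption, included in captions_with_flags:
        hit = rx.search(caption) is not None
        n_term += int(hit)
        n_in += int(included)
        n_both += int(hit and included)
    if n_both == 0:
        # The term never co-occurs with inclusion in this sample.
        return float("-inf")
    return math.log2((n_both / n) / ((n_term / n) * (n_in / n)))


# Hypothetical toy data: (caption, included in the 6.5+ subset)
data = [
    ("portrait of a woman", True),
    ("two women in a garden", True),
    ("a man on a bridge", False),
    ("woman reading", False),
    ("mountain landscape", True),
    ("men at work", False),
    ("city street", False),
    ("harbor at dusk", False),
]

print(pmi(data, r"\bwom[ae]n\b"))  # positive: over-represented in the high subset
print(pmi(data, r"\bm[ae]n\b"))    # -inf: no matching caption is in the high subset
```

On real data, terms with very few matches produce unstable PMI values, so a minimum match count is a sensible precaution before ranking terms.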


This review was generated by AI and reviewed by a human editor.