[Paper Review] The Algorithmic Gaze of Image Quality Assessment: An Audit and Trace Ethnography of the LAION-Aesthetics Predictor
This paper audits the LAION-Aesthetics Predictor (LAP) across three datasets, finding that its aesthetic filtering favors Western art traditions and content aligned with the male gaze. It also documents how LAP was created by tracing its training data and development practices.
Visual generative AI models are trained using a one-size-fits-all measure of aesthetic appeal. However, what is deemed "aesthetic" is inextricably linked to personal taste and cultural values, raising the question of whose taste is represented in visual generative AI models. In this work, we study an aesthetic evaluation model--LAION-Aesthetics Predictor (LAP)--that is widely used to curate datasets to train visual generative image models, like Stable Diffusion, and evaluate the quality of AI-generated images. To understand what LAP measures, we audited the model across three datasets. First, we examined the impact of aesthetic filtering on the LAION-Aesthetics Dataset (approximately 1.2B images), which was curated from LAION-5B using LAP. We find that the LAP disproportionally filters in images with captions mentioning women, while filtering out images with captions mentioning men or LGBTQ+ people. Then, we used LAP to score approximately 330k images across two art datasets, finding the model rates realistic images of landscapes, cityscapes, and portraits from western and Japanese artists most highly. In doing so, the algorithmic gaze of this aesthetic evaluation model reinforces the imperial and male gazes found within western art history. In order to understand where these biases may have originated, we performed a digital ethnography of public materials related to the creation of LAP. We find that the development of LAP reflects the biases we found in our audits, such as the aesthetic scores used to train LAP primarily coming from English-speaking photographers and western AI-enthusiasts. In response, we discuss how aesthetic evaluation can perpetuate representational harms and call on AI developers to shift away from prescriptive measures of "aesthetics" toward more pluralistic evaluation.
Research Motivation and Objectives
- Investigate what the LAION-Aesthetics Predictor (LAP) measures as 'high-quality' images and how this affects data curation.
- Examine biases in LAP-driven filtering on the LAION-Aesthetics Dataset and on art datasets from MET and WikiArt.
- Trace the origins of LAP to understand how its training data and development practices influence its outputs.
- Discuss representational harms arising from prescriptive aesthetic measures and advocate for pluralistic evaluation.
Proposed Method
- Audit LAP on three datasets: LAION-Aesthetics Dataset (LAD), Metropolitan Museum of Art (MET) images, and WikiArt artwork dataset.
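- Analyze content and source-domain biases by comparing images that LAP scores at or above 6.5 with images scored below that threshold.
- Use word-level content analysis based on pointwise mutual information (PMI) to examine how captions mentioning identity categories and cultures fare under filtering.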
- Conduct a trace ethnography of LAP’s creation using public materials and developer disclosures.
- Inspect training data composition and documentation across AVA, SAC, and LAION-Logos datasets to understand bias sources.
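The threshold comparison and the PMI analysis listed above can be sketched in Python. This is a minimal illustration, not the authors' code: the function name `pmi_with_inclusion`, the toy captions, and their scores are hypothetical, and only the 6.5 cutoff and the `wom[ae]n` pattern come from this review. PMI is computed here as log2 of P(match, included) / (P(match) * P(included)), so a positive value means captions matching the pattern are over-represented in the 6.5+ subset.

```python
import math
import re


def pmi_with_inclusion(records, pattern, threshold=6.5):
    """PMI in bits between 'caption matches pattern' and 'score >= threshold'.

    records: list of (caption, score) pairs.
    Returns None if either event never occurs, and -inf if they never co-occur.
    """
    regex = re.compile(pattern, re.IGNORECASE)
    n = len(records)
    n_match = sum(1 for caption, _ in records if regex.search(caption))
    n_included = sum(1 for _, score in records if score >= threshold)
    n_both = sum(
        1 for caption, score in records
        if score >= threshold and regex.search(caption)
    )
    if n_match == 0 or n_included == 0:
        return None  # PMI is undefined when a marginal probability is zero
    if n_both == 0:
        return float("-inf")
    p_joint = n_both / n
    p_match = n_match / n
    p_included = n_included / n
    return math.log2(p_joint / (p_match * p_included))


# Hypothetical (caption, score) pairs, invented for illustration only.
records = [
    ("portrait of a woman in a garden", 6.8),
    ("two women on a beach at sunset", 6.6),
    ("a man repairing a bicycle", 5.1),
    ("men playing football", 5.4),
    ("woman reading a book", 5.9),
    ("a man hiking at sunrise", 6.7),
    ("mountain landscape at dawn", 6.9),
    ("city street in the rain", 5.2),
]

pmi_women = pmi_with_inclusion(records, r"\bwom[ae]n\b")
pmi_men = pmi_with_inclusion(records, r"\bm[ae]n\b")
print(round(pmi_women, 3))  # about 0.415 on this toy data
print(round(pmi_men, 3))    # about -0.585 on this toy data
```

On real data, `records` would hold caption and LAP score pairs from the dataset being audited; the signs on this toy sample carry no empirical weight.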

Experimental Results
Research Questions
- RQ1: What does LAP consider a 'high-quality' image across different datasets and cultural contexts?
- RQ2: How does LAP’s filtering affect representation of gender, religion, ethnicity, and cultures in curated datasets?
- RQ3: Where did LAP’s aesthetic judgments originate, and what are the biases in its training data and development process?
- RQ4: How can aesthetic evaluation practices be reframed to reduce representational harms and be more pluralistic?
Key Findings
- LAP disproportionately includes images with captions mentioning women and excludes those mentioning men or LGBTQ+ communities in the LAD.
- Across MET and WikiArt, LAP rates landscapes, cityscapes, and portraits from western and Japanese artists most highly, suggesting a Western/Japanese realist bias.
- The aesthetic ratings used to train LAP come predominantly from English-speaking photographers and Western AI enthusiasts, with the AVA, SAC, and LAION-Logos datasets contributing unevenly and often carrying limited documentation.
- The source domains most represented in the LAD 6.5+ subset belong to independent visual artists and photographers, indicating a tilt toward commercially shared or artist-created content.
- LAP’s architecture is a simple multi-layer perceptron on top of a CLIP embedding, trained on three datasets of subjective ratings, so its scores reflect the individual taste of its creator and raters rather than universal aesthetics.
- There are inconsistencies and limited consent in the annotations used to train LAP, raising questions about data provenance and ethics in aesthetic evaluation.
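The architecture noted in the findings above, a small MLP that maps a CLIP image embedding to a single score, can be illustrated with a pure-Python forward pass. Everything specific in this sketch is an assumption made for illustration: the layer sizes, the ReLU activation, the random weights, and the 8-dimensional vector standing in for an embedding. This review does not state LAP's actual layer sizes or weights, and a real pipeline would obtain the embedding from a CLIP image encoder.

```python
import random


def linear(x, weights, bias):
    """Affine layer: one output per weight row."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def relu(x):
    return [max(0.0, v) for v in x]


def make_layer(n_in, n_out, rng):
    """Randomly initialized layer (hypothetical; stands in for trained weights)."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)]
               for _ in range(n_out)]
    bias = [0.0] * n_out
    return weights, bias


def mlp_score(embedding, layers):
    """Pass an embedding through the layers and return one scalar score."""
    h = embedding
    for i, (weights, bias) in enumerate(layers):
        h = linear(h, weights, bias)
        if i < len(layers) - 1:  # no activation after the output layer
            h = relu(h)
    return h[0]


rng = random.Random(0)
# Stand-in for a CLIP image embedding (real embeddings are far larger).
embedding = [rng.uniform(-1.0, 1.0) for _ in range(8)]
# Assumed toy shape: 8 -> 4 -> 1.
layers = [make_layer(8, 4, rng), make_layer(4, 1, rng)]
score = mlp_score(embedding, layers)
print(score)
```

The sketch shows why the output is only as universal as its training signal: the score is a learned function of whatever ratings the weights were fitted to, so any threshold applied to it inherits the tastes of those raters.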
![Figure 2. Pointwise Mutual Information (PMI) between regex and images being included in the 6.5+ subset of the LAION-Aesthetics Dataset. Terms with higher PMI (e.g., wom[ae]n) have higher likelihood of being included.](https://ar5iv.labs.arxiv.org/html/2601.09896/assets/x2.png)
This review was generated by AI and checked by a human editor.