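[Paper Review] Human Preference Score: Better Aligning Text-to-Image Models with Human Preference
This paper collects a large-scale, human-annotated dataset of preferences over Stable Diffusion outputs, shows that traditional metrics predict human choices poorly, trains a human preference classifier from which the Human Preference Score (HPS) is derived, and uses HPS to adapt Stable Diffusion so that it generates images people are more likely to prefer.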
Recent years have witnessed a rapid growth of deep generative models, with text-to-image models gaining significant attention from the public. However, existing models often generate images that do not align well with human preferences, such as awkward combinations of limbs and facial expressions. To address this issue, we collect a dataset of human choices on generated images from the Stable Foundation Discord channel. Our experiments demonstrate that current evaluation metrics for generative models do not correlate well with human choices. Thus, we train a human preference classifier with the collected dataset and derive a Human Preference Score (HPS) based on the classifier. Using HPS, we propose a simple yet effective method to adapt Stable Diffusion to better align with human preferences. Our experiments show that HPS outperforms CLIP in predicting human choices and has good generalization capability toward images generated from other models. By tuning Stable Diffusion with the guidance of HPS, the adapted model is able to generate images that are more preferred by human users. The project page is available here: https://tgxs002.github.io/align_sd_web/ .
Research Motivation and Goals
- Create a large-scale dataset of human choices on images generated with a common prompt.
- Evaluate how existing metrics (IS, FID, CLIP) correlate with human preferences.
- Develop a human preference classifier fine-tuned on the dataset to derive HPS.
- Demonstrate how HPS can guide adaptation of Stable Diffusion to produce more human-preferred images.
Proposed Method
- Train a human preference classifier by fine-tuning CLIP ViT-L/14 on prompts with multiple generated images and one preferred image.
- Define Human Preference Score (HPS) as 100 times the cosine similarity between the visual and text embeddings from the human preference classifier.
- Construct a training dataset by using the classifier to label the generated images for each prompt as preferred or non-preferred.
- Adapt Stable Diffusion via LoRA on this training set of prompts paired with preferred and non-preferred images, prefixing the captions of non-preferred images with a special identifier so the model learns to associate the identifier with artifacts.
- During inference, use the special identifier as a negative prompt in classifier-free guidance to avoid non-preferred outputs.
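The HPS definition above can be sketched in a few lines. This is a minimal illustration rather than the authors' code: it assumes the image and text embeddings have already been produced by the fine-tuned classifier, it uses toy 3-dimensional vectors in their place, and the function names `hps` and `predict_preferred` are hypothetical.

```python
import numpy as np


def hps(image_emb, text_emb):
    """Return 100 * cosine similarity between one image and one text embedding."""
    image_emb = np.asarray(image_emb, dtype=np.float64)
    text_emb = np.asarray(text_emb, dtype=np.float64)
    cos = image_emb @ text_emb / (
        np.linalg.norm(image_emb) * np.linalg.norm(text_emb)
    )
    return float(100.0 * cos)


def predict_preferred(image_embs, text_emb):
    """Return the index of the candidate image with the highest HPS."""
    scores = [hps(e, text_emb) for e in image_embs]
    return int(np.argmax(scores))


# Toy embeddings standing in for real classifier outputs (assumption).
text = np.array([1.0, 0.0, 0.0])
candidates = np.array([
    [0.0, 1.0, 0.0],  # orthogonal to the text embedding
    [2.0, 0.0, 0.0],  # same direction as the text embedding
    [1.0, 1.0, 0.0],  # 45 degrees away from the text embedding
])
scores = [hps(c, text) for c in candidates]
best = predict_preferred(candidates, text)
print(scores, best)
```

Because cosine similarity ignores vector length, the second candidate scores highest even though its norm differs from the text embedding's norm. Picking the highest-scoring image is one simple way to compare the score against a recorded human choice.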

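The inference step can be illustrated with the standard classifier-free guidance combination, where the noise prediction for the negative prompt takes the place of the unconditional prediction. The review does not spell out this formula, so the sketch below assumes the usual formulation. The function name `guided_noise` and the toy arrays are hypothetical, and the identifier string itself is not reproduced because it is not given here.

```python
import numpy as np


def guided_noise(eps_prompt, eps_negative, guidance_scale):
    """Combine two noise predictions, steering away from the negative branch.

    eps_prompt:   prediction conditioned on the user's prompt
    eps_negative: prediction conditioned on the negative prompt
                  (here, the special identifier for non-preferred images)
    """
    eps_prompt = np.asarray(eps_prompt, dtype=np.float64)
    eps_negative = np.asarray(eps_negative, dtype=np.float64)
    return eps_negative + guidance_scale * (eps_prompt - eps_negative)


# Toy two-element "noise predictions" standing in for U-Net outputs (assumption).
eps_prompt = np.array([1.0, 2.0])
eps_negative = np.array([0.0, 1.0])
guided = guided_noise(eps_prompt, eps_negative, guidance_scale=7.5)
print(guided)
```

With a guidance scale of 1 the result equals the prompt-conditioned prediction. Larger scales push the sample further from whatever the negative branch represents, which is how the identifier can suppress non-preferred outputs.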
Experimental Results
Research Questions
- RQ1: How well do standard metrics (IS, FID, CLIP) correlate with human preferences on generated images?
- RQ2: Can a human preference classifier trained on a large annotated dataset predict human choices more accurately than CLIP alone?
- RQ3: Can we use a human-preference-guided objective to adapt Stable Diffusion to produce more preferred images?
- RQ4: Does HPS generalize across different text-to-image models beyond Stable Diffusion?
Key Results
| Metric | SD 1.4 | Adapted model |
|---|---|---|
| FID | 19.72 | 19.35 |
| Aesthetic Score | 5.90 | 6.06 |
| CLIP Score | 0.2816 | 0.2831 |
| HPS | 0.1898 | 0.1916 |
- IS, FID, and CLIP scores do not fully match human choices on the dataset.
- A fine-tuned CLIP-based human preference classifier yields higher agreement with human choices than CLIP alone.
- HPS correlates with human preferences and can generalize to images from other models (e.g., DALL·E, Stable Diffusion).
- Adapting Stable Diffusion with LoRA guided by HPS yields images with fewer artifacts and higher human preference in user studies.
- Quantitative comparison shows the adapted SD v1.4 model has lower FID and higher Aesthetic Score and CLIP/HPS metrics compared to the baseline (SD v1.4).
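To put the table in perspective, the relative change of each metric can be computed directly from the reported numbers. The snippet below is only an arithmetic aid. The values are copied from the table above, and the helper name `relative_change_pct` is hypothetical.

```python
# Values copied from the table above: (SD 1.4 baseline, adapted model).
results = {
    "FID": (19.72, 19.35),
    "Aesthetic Score": (5.90, 6.06),
    "CLIP Score": (0.2816, 0.2831),
    "HPS": (0.1898, 0.1916),
}


def relative_change_pct(baseline, adapted):
    """Percentage change of the adapted model relative to the baseline."""
    return 100.0 * (adapted - baseline) / baseline


changes = {
    name: relative_change_pct(baseline, adapted)
    for name, (baseline, adapted) in results.items()
}
for name, pct in changes.items():
    print(f"{name}: {pct:+.2f}%")
```

All four changes go in the favorable direction but are small: FID falls by about 1.9%, the Aesthetic Score rises by about 2.7%, and the CLIP Score and HPS each rise by under 1%. Relative change does not depend on scale, so it is the same whether HPS is reported with or without the factor of 100.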

This review was generated by AI and checked by a human editor.