QUICK REVIEW

[Paper Review] Rethinking FID: Towards a Better Evaluation Metric for Image Generation

Sadeep Jayasumana, Srikumar Ramalingam | arXiv (Cornell University) | 2023-11-30

Explainable Artificial Intelligence (XAI) | Citations: 8

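One-line summary

The paper critiques FID as an evaluation metric for image generation and introduces CMMD, a CLIP-based MMD distance that gives a more reliable, sample-efficient evaluation and agrees better with human judgment.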

ABSTRACT

As with many machine learning problems, the progress of image generation methods hinges on good evaluation metrics. One of the most popular is the Frechet Inception Distance (FID). FID estimates the distance between a distribution of Inception-v3 features of real images, and those of images generated by the algorithm. We highlight important drawbacks of FID: Inception's poor representation of the rich and varied content generated by modern text-to-image models, incorrect normality assumptions, and poor sample complexity. We call for a reevaluation of FID's use as the primary quality metric for generated images. We empirically demonstrate that FID contradicts human raters, it does not reflect gradual improvement of iterative text-to-image models, it does not capture distortion levels, and that it produces inconsistent results when varying the sample size. We also propose an alternative new metric, CMMD, based on richer CLIP embeddings and the maximum mean discrepancy distance with the Gaussian RBF kernel. It is an unbiased estimator that does not make any assumptions on the probability distribution of the embeddings and is sample efficient. Through extensive experiments and analysis, we demonstrate that FID-based evaluations of text-to-image models may be unreliable, and that CMMD offers a more robust and reliable assessment of image quality.

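Motivation and objectives

Question the reliability of FID as the primary metric for modern image generation and text-to-image models.
Propose CMMD, built on CLIP embeddings and MMD, as a distribution-free, unbiased, and sample-efficient alternative.
Demonstrate that CMMD agrees with human judgment, tracks increasing distortion, and reflects gradual improvement.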

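Proposed method

Critically analyze the assumptions behind FID's Fréchet distance, highlighting the normality assumption and the sensitivity to sample size.
Adopt CLIP embeddings to capture the rich content of modern images.
Define CMMD as the squared MMD distance, under a Gaussian RBF kernel, between the CLIP embeddings of the real image set and those of the generated image set.
Use the unbiased estimator of MMD with kernel k(x, y) = exp(-||x - y||^2 / (2 * sigma^2)) and fixed sigma = 10, and scale the result by 1000 for readability.
Provide a reference implementation of CMMD.
Compare CMMD and FID across distortion, progressive generation, and sample-size settings, including human evaluation.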
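
The kernel and unbiased estimator described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' reference implementation: the function names (`rbf_kernel`, `mmd2_unbiased`, `cmmd_sketch`) are hypothetical, the embeddings are assumed to be precomputed, and tiny 2-D toy vectors stand in for real CLIP embeddings. Only sigma = 10 and the scaling by 1000 come from the review.

```python
import math
from typing import Sequence

Vector = Sequence[float]

SIGMA = 10.0    # fixed kernel bandwidth stated in the review
SCALE = 1000.0  # readability scaling stated in the review


def rbf_kernel(x: Vector, y: Vector, sigma: float = SIGMA) -> float:
    """Gaussian RBF kernel k(x, y) = exp(-||x - y||^2 / (2 * sigma^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def mmd2_unbiased(xs: Sequence[Vector], ys: Sequence[Vector],
                  sigma: float = SIGMA) -> float:
    """Standard unbiased estimate of squared MMD between two samples.

    Within-set sums skip the i == j terms; the cross term uses all pairs.
    The estimate can be slightly negative when the samples are very similar.
    """
    m, n = len(xs), len(ys)
    if m < 2 or n < 2:
        raise ValueError("each sample needs at least two vectors")
    k_xx = sum(rbf_kernel(xs[i], xs[j], sigma)
               for i in range(m) for j in range(m) if i != j)
    k_yy = sum(rbf_kernel(ys[i], ys[j], sigma)
               for i in range(n) for j in range(n) if i != j)
    k_xy = sum(rbf_kernel(x, y, sigma) for x in xs for y in ys)
    return k_xx / (m * (m - 1)) + k_yy / (n * (n - 1)) - 2.0 * k_xy / (m * n)


def cmmd_sketch(real_emb: Sequence[Vector], gen_emb: Sequence[Vector]) -> float:
    """Scaled squared MMD between real and generated embeddings (sketch)."""
    return SCALE * mmd2_unbiased(real_emb, gen_emb)


# Toy stand-ins for embeddings (assumption: real use would pass CLIP vectors).
real = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
near = [[x + 0.5, y + 0.5] for x, y in real]  # small shift from the real set
far = [[x + 5.0, y + 5.0] for x, y in real]   # large shift from the real set

print("near:", cmmd_sketch(real, near))
print("far: ", cmmd_sketch(real, far))
```

The double loops keep the sketch dependency-free; for realistic sample sizes the same sums would normally be vectorized as kernel matrices. Because the estimator makes no assumption about the shape of the embedding distribution, the only tunable quantity here is the bandwidth.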

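Experimental results

Research questions

RQ1: Does FID reliably reflect image quality for modern text-to-image models and across gradual improvements?
RQ2: Can a CLIP-based MMD metric provide a distribution-free, unbiased, and sample-efficient alternative to FID?
RQ3: Do CMMD scores agree with human judgment under various distortions and iterative generation processes?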

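Key findings

FID can contradict human raters and fails to track gradual improvement in iterative generation models.
CLIP embeddings capture richer content than Inception features, which supports CMMD as a robust metric.
CMMD reflects distortion levels and iterative refinement monotonically, in line with human judgment.
CMMD is more sample efficient than FID and faster to compute, which makes real-time online evaluation feasible.
In the experiments, CMMD agrees with human preferences in cases where FID does not.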
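
One reason FID can miss differences, per the normality critique summarized earlier, is that a Fréchet distance between fitted Gaussians only sees means and covariances. The sketch below illustrates this with the univariate closed form, (mu_x - mu_y)^2 + (sd_x - sd_y)^2. It is a simplified assumption-laden example: real FID is multivariate over Inception-v3 features, the function name `frechet_distance_1d` is hypothetical, population standard deviation is used for simplicity, and the two toy samples are made up for illustration.

```python
import math
import statistics
from typing import Sequence


def frechet_distance_1d(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Squared Fréchet distance between 1-D Gaussians fitted to xs and ys.

    Only the fitted mean and standard deviation of each sample are used,
    so any difference beyond those two moments is invisible to the score.
    """
    mu_x, mu_y = statistics.fmean(xs), statistics.fmean(ys)
    sd_x, sd_y = statistics.pstdev(xs), statistics.pstdev(ys)
    return (mu_x - mu_y) ** 2 + (sd_x - sd_y) ** 2


# Two clearly different toy samples with the same mean (0) and variance (1).
bimodal = [-1.0, 1.0, -1.0, 1.0]
peaked = [-math.sqrt(2.0), 0.0, 0.0, math.sqrt(2.0)]

print(frechet_distance_1d(bimodal, peaked))
```

The two samples have different shapes, yet the Gaussian-based distance between them is zero up to floating-point error. A kernel-based MMD does not fit a Gaussian first, which is the property the review credits to CMMD.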

This review was generated by AI and reviewed by a human editor.