QUICK REVIEW

[Paper Review] Rethinking FID: Towards a Better Evaluation Metric for Image Generation

Sadeep Jayasumana, Srikumar Ramalingam|arXiv (Cornell University)|Nov 30, 2023

Explainable Artificial Intelligence (XAI)|Cited by 8

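One-line Summary

This paper critiques FID as an evaluation metric for image generation and proposes CMMD, a distance based on the maximum mean discrepancy (MMD) between CLIP embeddings. CMMD gives a more reliable and more sample-efficient evaluation that agrees with human judgments.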

ABSTRACT

As with many machine learning problems, the progress of image generation methods hinges on good evaluation metrics. One of the most popular is the Frechet Inception Distance (FID). FID estimates the distance between a distribution of Inception-v3 features of real images, and those of images generated by the algorithm. We highlight important drawbacks of FID: Inception's poor representation of the rich and varied content generated by modern text-to-image models, incorrect normality assumptions, and poor sample complexity. We call for a reevaluation of FID's use as the primary quality metric for generated images. We empirically demonstrate that FID contradicts human raters, it does not reflect gradual improvement of iterative text-to-image models, it does not capture distortion levels, and that it produces inconsistent results when varying the sample size. We also propose an alternative new metric, CMMD, based on richer CLIP embeddings and the maximum mean discrepancy distance with the Gaussian RBF kernel. It is an unbiased estimator that does not make any assumptions on the probability distribution of the embeddings and is sample efficient. Through extensive experiments and analysis, we demonstrate that FID-based evaluations of text-to-image models may be unreliable, and that CMMD offers a more robust and reliable assessment of image quality.

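Motivation and Objectives

Re-examine whether FID is reliable as the primary metric for modern image generation and text-to-image models.
Propose CMMD, an alternative based on CLIP embeddings and MMD that makes no distributional assumptions, is unbiased, and is sample-efficient.
Show that CMMD agrees with human judgments and consistently tracks both image distortions and incremental improvements.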

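Proposed Method

Critically analyze the assumptions behind the Fréchet distance in FID, highlighting the normality assumption and sample-size problems.
Adopt CLIP embeddings to capture the rich content of modern images.
Define CMMD as the squared MMD distance, computed with a Gaussian RBF kernel, between the CLIP embeddings of the real image set and those of the generated image set.
Use the unbiased estimator with kernel k(x,y)=exp(-||x-y||^2/(2*sigma^2)) and sigma=10, and scale the result by 1000 for readability.
Provide a reference implementation of CMMD.
Compare CMMD with FID across distortions, iterative generation, and different sample sizes, including validation against human evaluations.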
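
The estimator described above can be sketched in Python with NumPy. This is a minimal sketch under stated assumptions, not the authors' reference implementation: it assumes the CLIP image embeddings have already been computed as (n, d) arrays, the function names (rbf_kernel, mmd2_unbiased, cmmd) are hypothetical, and preprocessing details of the reference implementation (how embeddings are extracted or normalized) are not reproduced. Only sigma=10 and the x1000 scale come from the review.

```python
import numpy as np

SIGMA = 10.0    # RBF bandwidth stated in the review
SCALE = 1000.0  # reporting scale factor stated in the review


def rbf_kernel(a: np.ndarray, b: np.ndarray, sigma: float = SIGMA) -> np.ndarray:
    """Gaussian RBF kernel matrix: K[i, j] = exp(-||a_i - b_j||^2 / (2 sigma^2))."""
    # Pairwise squared Euclidean distances via ||a||^2 + ||b||^2 - 2 a.b
    sq = (
        np.sum(a * a, axis=1)[:, None]
        + np.sum(b * b, axis=1)[None, :]
        - 2.0 * (a @ b.T)
    )
    sq = np.maximum(sq, 0.0)  # clip tiny negatives caused by rounding
    return np.exp(-sq / (2.0 * sigma**2))


def mmd2_unbiased(x: np.ndarray, y: np.ndarray, sigma: float = SIGMA) -> float:
    """Unbiased estimate of squared MMD between samples x (m, d) and y (n, d)."""
    m, n = len(x), len(y)
    k_xx = rbf_kernel(x, x, sigma)
    k_yy = rbf_kernel(y, y, sigma)
    k_xy = rbf_kernel(x, y, sigma)
    # Within-sample averages exclude the diagonal k(x_i, x_i) terms (unbiasedness)
    term_xx = (k_xx.sum() - np.trace(k_xx)) / (m * (m - 1))
    term_yy = (k_yy.sum() - np.trace(k_yy)) / (n * (n - 1))
    # Cross term averages over all m * n pairs
    term_xy = k_xy.mean()
    return float(term_xx + term_yy - 2.0 * term_xy)


def cmmd(real_emb: np.ndarray, gen_emb: np.ndarray,
         sigma: float = SIGMA, scale: float = SCALE) -> float:
    """Hypothetical CMMD helper: scaled unbiased MMD^2 between embedding sets."""
    return scale * mmd2_unbiased(real_emb, gen_emb, sigma)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    real = rng.normal(size=(500, 64))           # stand-in for real-image CLIP embeddings
    gen = rng.normal(loc=0.1, size=(500, 64))   # stand-in for generated-image embeddings
    print(f"CMMD (x1000): {cmmd(real, gen):.3f}")
```

Because the unbiased estimator drops the diagonal k(x_i, x_i) terms, two samples drawn from the same distribution give values scattered around zero (occasionally slightly negative) rather than a positive floor that depends on sample size.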

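Experimental Results

Research Questions

RQ1: Does FID reliably reflect image quality for modern text-to-image models and across their incremental refinements?
RQ2: Can a CLIP-based MMD metric serve as an alternative to FID that makes no distributional assumptions, is unbiased, and is sample-efficient?
RQ3: Do CMMD scores agree with human judgments under various distortions and iterative generation processes?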

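Key Findings

FID can contradict human ratings and can fail to track the step-by-step improvement of iterative generation models.
CLIP embeddings capture richer content than Inception features, which supports CMMD as a robust metric.
CMMD changes monotonically with distortion level and with incremental improvements, in agreement with human judgments.
CMMD is more sample-efficient than FID and faster to compute, making practical online evaluation feasible.
The experiments show cases where CMMD matches human preferences while FID does not.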
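
For contrast, FID fits a Gaussian to each feature set and computes the Fréchet distance ||mu_x - mu_y||^2 + Tr(C_x + C_y - 2(C_x C_y)^(1/2)). The NumPy sketch below uses hypothetical function names and random vectors as stand-ins for Inception-v3 features. It illustrates the sample-size issue noted above: when both sets come from the same distribution the true distance is zero, yet the plug-in estimate is typically positive and shrinks as n grows.

```python
import numpy as np


def _sqrtm_psd(mat: np.ndarray) -> np.ndarray:
    """Symmetric square root of a symmetric positive semi-definite matrix."""
    w, v = np.linalg.eigh(mat)
    w = np.clip(w, 0.0, None)  # clip tiny negative eigenvalues from rounding
    return (v * np.sqrt(w)) @ v.T


def frechet_distance(x: np.ndarray, y: np.ndarray) -> float:
    """Plug-in Frechet distance between Gaussians fitted to samples x and y."""
    mu_x, mu_y = x.mean(axis=0), y.mean(axis=0)
    cov_x = np.cov(x, rowvar=False)
    cov_y = np.cov(y, rowvar=False)
    # Tr((C_x C_y)^(1/2)) equals Tr((C_x^(1/2) C_y C_x^(1/2))^(1/2)):
    # both matrices share eigenvalues, and the second form stays symmetric.
    root_x = _sqrtm_psd(cov_x)
    inner = root_x @ cov_y @ root_x
    inner = 0.5 * (inner + inner.T)  # remove rounding asymmetry
    cross = _sqrtm_psd(inner)
    diff = mu_x - mu_y
    return float(diff @ diff + np.trace(cov_x) + np.trace(cov_y)
                 - 2.0 * np.trace(cross))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    for n in (50, 200, 2000):
        # Both samples come from the same 8-D standard normal distribution
        x = rng.normal(size=(n, 8))
        y = rng.normal(size=(n, 8))
        print(n, round(frechet_distance(x, y), 4))
```

The matrix square root is taken through symmetric eigendecompositions so the sketch needs only NumPy. Many FID implementations call scipy.linalg.sqrtm on C_x C_y instead.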

This review was generated by AI and checked by a human editor.