QUICK REVIEW

[Paper Review] Generated Faces in the Wild: Quantitative Comparison of Stable Diffusion, Midjourney and DALL-E 2

Ali Borji|arXiv (Cornell University)|Oct 2, 2022

Generative Adversarial Networks and Image Synthesis | Cited by 96

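Summary in brief

This paper uses FID to quantitatively compare Stable Diffusion, Midjourney, and DALL-E 2 on generating photorealistic faces in the wild. By FID score, Stable Diffusion produces the highest-quality faces of the three. The paper also introduces GFW, a new dataset of generated faces for evaluation.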

ABSTRACT

The field of image synthesis has made great strides in the last couple of years. Recent models are capable of generating images with astonishing quality. Fine-grained evaluation of these models on some interesting categories such as faces is still missing. Here, we conduct a quantitative comparison of three popular systems including Stable Diffusion, Midjourney, and DALL-E 2 in their ability to generate photorealistic faces in the wild. We find that Stable Diffusion generates better faces than the other systems, according to the FID score. We also introduce a dataset of generated faces in the wild dubbed GFW, including a total of 15,076 faces. Furthermore, we hope that our study spurs follow-up research in assessing the generative models and improving them. Data and code are available at data and code, respectively.

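Motivation and objectives

Evaluate how well three text-to-image models generate photorealistic faces in crowded scenes.
Provide a fair quantitative comparison, using a controlled face dataset and FID as the evaluation metric.
Create a dataset of generated faces that enables future research on generative model evaluation and bias.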

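Proposed method

COCO captions are used to prompt the three models to generate images that contain faces.
Faces in the generated and real images are detected with the MediaPipe face detector, and false positives are removed.
Generated faces are resized to 100x100 and the Fréchet Inception Distance (FID) is computed against real faces; this is repeated over multiple random samples to estimate the mean and standard deviation.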
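
The last step can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes feature vectors have already been extracted from the face crops (FID conventionally uses Inception-v3 pooling features, which are not computed here), and the function names `frechet_distance` and `repeated_fid` are hypothetical. The distance is the Fréchet distance between two Gaussians fitted to the feature sets: the squared difference of the means plus tr(C_a) + tr(C_b) - 2 tr((C_a C_b)^(1/2)).

```python
import numpy as np


def frechet_distance(feats_a, feats_b):
    """Frechet distance between Gaussians fitted to two feature sets.

    feats_a, feats_b: arrays of shape (n_samples, n_features), n_features >= 2.
    The features are assumed to be precomputed (e.g. by an Inception network).
    """
    mu_a, mu_b = feats_a.mean(axis=0), feats_b.mean(axis=0)
    cov_a = np.cov(feats_a, rowvar=False)
    cov_b = np.cov(feats_b, rowvar=False)
    # tr(sqrt(cov_a @ cov_b)) equals the sum of square roots of the eigenvalues
    # of cov_a @ cov_b, which are real and non-negative for PSD covariances.
    # Clipping guards against tiny negative values from round-off.
    eigvals = np.linalg.eigvals(cov_a @ cov_b)
    trace_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()
    diff = mu_a - mu_b
    return float(diff @ diff + np.trace(cov_a) + np.trace(cov_b) - 2.0 * trace_sqrt)


def repeated_fid(real_feats, gen_feats, sample_size, n_runs, seed=0):
    """Mean and standard deviation of the distance over random subsamples."""
    rng = np.random.default_rng(seed)
    scores = []
    for _ in range(n_runs):
        # Draw equally sized subsets without replacement from both sets.
        real_idx = rng.choice(len(real_feats), size=sample_size, replace=False)
        gen_idx = rng.choice(len(gen_feats), size=sample_size, replace=False)
        scores.append(frechet_distance(real_feats[real_idx], gen_feats[gen_idx]))
    return float(np.mean(scores)), float(np.std(scores))
```

Drawing the same `sample_size` from every model is one way to keep the comparison fair when the face sets differ in size; the subsample size and number of runs are free parameters in this sketch, and the values the paper used are not stated in this review.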

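Experimental results

Research questions

RQ1: Can Stable Diffusion, Midjourney, and DALL-E 2 generate photorealistic faces in crowded scenes, and how do they compare quantitatively?
RQ2: Can FID reliably distinguish face quality across models, even with differing sample sizes and possible memorization or safeguards?
RQ3: What are the practical limitations and failure modes of faces generated in the wild by these systems?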

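Key findings

Stable Diffusion achieves the best FID on generated faces among the three models.
All models struggle with glasses, eyes, occlusion, profile views, and facial symmetry, and real faces remain markedly higher in quality.
DALL-E 2 performs worse than Stable Diffusion; safeguards, a focus on portraits, and the small image set may be contributing factors. Midjourney often generates surreal or anime-style faces.
Larger sample sizes make FID comparisons more stable; the authors built a dataset of 15,076 generated faces (8,050 from Stable Diffusion, 6,350 from Midjourney, 676 from DALL-E 2).
The paper points to possible memorization and watermark issues, and suggests that a broader set of evaluation metrics beyond FID (e.g., SSIM, LPIPS, human judgment) is desirable.
The paper provides access to its data and code to support future benchmarking and deeper analysis of face generation in the wild.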


This review was generated by AI and checked by a human editor.