QUICK REVIEW

[Paper Review] GenImage: A Million-Scale Benchmark for Detecting AI-Generated Image

Mingjian Zhu, Hanting Chen | arXiv (Cornell University) | Jun 14, 2023

Generative Adversarial Networks and Image Synthesis | Cited by 33

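One-Sentence Summary

GenImage introduces a million-scale, general-purpose image dataset for AI-generated image detection and proposes two evaluation tasks (cross-generator and degraded image classification) to measure how well detectors generalize across generators and under image degradation.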

ABSTRACT

The extraordinary ability of generative models to generate photographic images has intensified concerns about the spread of disinformation, thereby leading to the demand for detectors capable of distinguishing between AI-generated fake images and real images. However, the lack of large datasets containing images from the most advanced image generators poses an obstacle to the development of such detectors. In this paper, we introduce the GenImage dataset, which has the following advantages: 1) Plenty of Images, including over one million pairs of AI-generated fake images and collected real images. 2) Rich Image Content, encompassing a broad range of image classes. 3) State-of-the-art Generators, synthesizing images with advanced diffusion models and GANs. The aforementioned advantages allow the detectors trained on GenImage to undergo a thorough evaluation and demonstrate strong applicability to diverse images. We conduct a comprehensive analysis of the dataset and propose two tasks for evaluating the detection method in resembling real-world scenarios. The cross-generator image classification task measures the performance of a detector trained on one generator when tested on the others. The degraded image classification task assesses the capability of the detectors in handling degraded images such as low-resolution, blurred, and compressed images. With the GenImage dataset, researchers can effectively expedite the development and evaluation of superior AI-generated image detectors in comparison to prevailing methodologies.

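Motivation and Objectives

Build a million-scale, general-purpose dataset of AI-generated images aligned with the ImageNet classes.
Enable robust detector training by covering diverse generators (GANs and diffusion models) and a broad range of image content.
Introduce evaluation tasks that reflect real-world scenarios: generalization across generators and robustness to degraded images.
Provide baseline detector analyses on the GenImage benchmark across backbone networks and existing detection methods.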

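Proposed Method

Assemble a dataset of over one million real/fake image pairs by pairing real ImageNet images with synthetic images from eight modern generators (BigGAN, GLIDE, VQDM, Stable Diffusion V1.4, Stable Diffusion V1.5, ADM, Midjourney, Wukong).
Use the 1,000 ImageNet class labels to generate a balanced set of fake images (~1.35M fake images alongside ~1.33M real images).
Evaluate detection using backbone models (ResNet-50, DeiT-S, Swin-T) and existing detectors (CNNSpot, Spec) as baselines.
Propose two tasks: (i) cross-generator image classification, which tests generalization across generators; (ii) degraded image classification, which tests robustness to resolution changes, JPEG compression, and blur.
Analyze dataset properties through frequency-domain analysis and generator correlation to understand artifacts and cross-generator transfer.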
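The degradations named in task (ii) can be illustrated with a minimal sketch. It is not the authors' pipeline: the function names (`downsample`, `gaussian_blur`), the grayscale list-of-lists image type, and the parameter choices are assumptions made here for illustration. JPEG compression is left out because it needs an image codec from a library such as Pillow.

```python
import math
from typing import List

# Assumption: a grayscale image stored as rows of float pixel values.
Image = List[List[float]]


def downsample(img: Image, factor: int) -> Image:
    """Low-resolution degradation: average-pool by an integer factor."""
    h, w = len(img), len(img[0])
    out: Image = []
    for y in range(0, h - h % factor, factor):
        row = []
        for x in range(0, w - w % factor, factor):
            block = [img[y + dy][x + dx]
                     for dy in range(factor) for dx in range(factor)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out


def gaussian_kernel(sigma: float) -> List[float]:
    """Normalized 1-D Gaussian weights covering roughly three sigmas."""
    radius = max(1, math.ceil(3 * sigma))
    weights = [math.exp(-(i * i) / (2 * sigma * sigma))
               for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def blur_1d(values: List[float], kernel: List[float]) -> List[float]:
    """Convolve one row or column, clamping indices at the borders."""
    r = len(kernel) // 2
    n = len(values)
    return [
        sum(k * values[min(max(i + j - r, 0), n - 1)]
            for j, k in enumerate(kernel))
        for i in range(n)
    ]


def gaussian_blur(img: Image, sigma: float) -> Image:
    """Blur degradation: separable Gaussian, rows first and then columns."""
    kernel = gaussian_kernel(sigma)
    rows = [blur_1d(r, kernel) for r in img]
    cols = [blur_1d(list(c), kernel) for c in zip(*rows)]
    return [list(r) for r in zip(*cols)]
```

In a degraded-image evaluation of this kind, the transforms would be applied to test images only, so that a detector trained on clean images is scored on inputs it has not been adapted to.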

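Experimental Results

Research Questions

RQ1: How well does a detector trained on one generator generalize to images produced by other generators?
RQ2: How does detector performance degrade under common image degradations (low resolution, compression, blur)?
RQ3: Which backbone architectures or existing detectors generalize better on GenImage, and how do GAN-generated and diffusion-generated images affect performance?
RQ4: Which dataset and content factors (number of classes, images per class, content diversity) improve cross-generator accuracy and robustness to degraded images?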

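Key Findings

Detectors trained and tested on the same generator reach very high accuracy (up to 99.9%), but cross-generator generalization is markedly weaker (about 66.9% on average across the eight generators).
Swin-T, a Transformer-based backbone, achieves the best cross-generator average in the reported settings, with ResNet-50 and DeiT-S close behind.
CNNSpot and Spec perform strongly on GAN-focused datasets but do worse on GenImage's diffusion-model content, underscoring the need for generator-specific or more generalizable backbones.
Increasing data scale, class diversity, and the number of images per class substantially improves accuracy on both cross-generator and degraded images; the larger settings (≈1.6e5–1.62e6 images) achieve higher performance.
Degraded-image experiments show varying robustness: CNNSpot is relatively robust to JPEG compression and blur thanks to its training-time preprocessing, while standard backbones are notably sensitive to JPEG compression and downsampling.
Generator correlation analysis shows that training on a similar architecture (e.g., Stable Diffusion variants) yields better cross-generator transfer, while Midjourney remains challenging for generalization.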
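The gap between same-generator and cross-generator accuracy in the first finding can be made concrete with a small sketch of how a train-on-one, test-on-all accuracy matrix might be summarized. The `summarize` helper and the `matrix[train_generator][test_generator]` layout are assumptions made here. This summary does not say whether the reported average includes the same-generator case, so the sketch returns both views. The toy values are placeholders, not results from the paper.

```python
from statistics import mean
from typing import Dict

# Assumed layout: matrix[train_generator][test_generator] -> accuracy in [0, 1].
AccuracyMatrix = Dict[str, Dict[str, float]]


def summarize(matrix: AccuracyMatrix) -> Dict[str, float]:
    """Split a cross-generator accuracy matrix into diagonal and off-diagonal means."""
    same = [row[src] for src, row in matrix.items()]
    cross = [acc
             for src, row in matrix.items()
             for tgt, acc in row.items()
             if tgt != src]
    return {
        "same_generator": mean(same),    # train and test on the same generator
        "cross_generator": mean(cross),  # train on one generator, test on another
        "overall": mean(same + cross),   # every train/test pair
    }


# Placeholder accuracies for three hypothetical generators (not paper results).
toy = {
    "A": {"A": 1.0, "B": 0.5, "C": 0.5},
    "B": {"A": 0.5, "B": 1.0, "C": 0.5},
    "C": {"A": 0.5, "B": 0.5, "C": 1.0},
}
print(summarize(toy))
```

Reading a single row of the matrix shows how well one training generator transfers to the others, which is the kind of comparison the generator correlation analysis relies on.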

This review was generated by AI and reviewed by a human editor.