[论文解读] CIFAKE: Image Classification and Explainable Identification of AI-Generated Synthetic Images
本文提出 CIFAKE,是一个包含 Real 与 SDM 生成图像的、约 12万张、与 CIFAR-10 大小相同的数据集;训练 CNN 将 Real 与 Fake 分类,准确率约为 92.98%;并使用 Grad-CAM 进行解释。
Recent technological advances in synthetic data have enabled the generation of images with such high quality that human beings cannot tell the difference between real-life photographs and Artificial Intelligence (AI) generated images. Given the critical necessity of data reliability and authentication, this article proposes to enhance our ability to recognise AI-generated images through computer vision. Initially, a synthetic dataset is generated that mirrors the ten classes of the already available CIFAR-10 dataset with latent diffusion which provides a contrasting set of images for comparison to real photographs. The model is capable of generating complex visual attributes, such as photorealistic reflections in water. The two sets of data present as a binary classification problem with regard to whether the photograph is real or generated by AI. This study then proposes the use of a Convolutional Neural Network (CNN) to classify the images into two categories; Real or Fake. Following hyperparameter tuning and the training of 36 individual network topologies, the optimal approach could correctly classify the images with 92.98% accuracy. Finally, this study implements explainable AI via Gradient Class Activation Mapping to explore which features within the images are useful for classification. Interpretation reveals interesting concepts within the image, in particular, noting that the actual entity itself does not hold useful information for classification; instead, the model focuses on small visual imperfections in the background of the images. The complete dataset engineered for this study, referred to as the CIFAKE dataset, is made publicly available to the research community for future work.
研究动机与目标
- 动机:需要检测 AI 生成的图像,以确保数据的真实性和可信度。
- 创建一个合成数据集 (CIFAKE),其结构与 CIFAR-10 相似,包含真实和 AI 生成的图像。
- 开发基于 CNN 的分类器,用以区分 Real 与 Fake 图像。
- 结合可解释性 AI(Grad-CAM)来解释模型在图像特征上的决策。
提出的方法
- 使用 Stable Diffusion 1.4 生成合成 CIFAKe 数据集,包含 10 个 CIFAR-10 类别并通过领域特定提示来丰富图像。
- 通过改变特征提取器滤波器和全连接层大小,训练 36 种 CNN 拓扑结构,以识别最佳的 Real vs Fake 分类器。
- 使用二元分类指标(准确率、精确率、召回率、F1)在 50k/50k 的训练集和 10k/10k 的测试集上评估模型。
- 应用 Grad-CAM 生成空间热图,突出对 Real vs Fake 决策有影响的图像区域。
- 公开发布 CIFAKE 数据集,供社区研究使用。
![Figure 1: Examples of images from the CIFAR-10 image classification dataset [ 24 ] .](https://ar5iv.labs.arxiv.org/html/2303.14126/assets/x1.png)
实验结果
研究问题
- RQ1Can a CNN reliably distinguish high-quality AI-generated images from real CIFAR-10 images?
- RQ2Which CNN topology (feature extractor and dense layers) yields the best binary classification performance for Real vs Fake in CIFAKE?
- RQ3What visual cues, if any, do Grad-CAM explanations reveal as most influential in the classification decision?
主要发现
| 过滤器 | 层数 | 准确率 |
|---|---|---|
| 16 | 1 | 90.06 |
| 16 | 2 | 91.46 |
| 16 | 3 | 91.63 |
| 32 | 1 | 90.38 |
| 32 | 2 | 92.93 |
| 32 | 3 | 92.54 |
| 64 | 1 | 90.94 |
| 64 | 2 | 92.71 |
| 64 | 3 | 92.38 |
| 128 | 1 | 91.39 |
| 128 | 2 | 92.98 |
| 128 | 3 | 92.07 |
- 最佳特征提取器拓扑:两层 128 过滤器在验证集上达到 92.98% 准确率,损失为 0.221。
- 整体平均验证准确率在所有特征提取器中为 91.79%。
- 最高 F1-score 为 0.936,使用单层 64 个神经元的全连接层时达到。
- Grad-CAM 分析表明真实图像依赖于整体的图像区域,而伪造图像依赖于稀疏、局部区域,可能存在可见瑕疵。
- CIFAKE 数据集包含 120,000 张图像(60,000 张来自 CIFAR-10 的真实图像 + 60,000 张合成图像),并公开发布。
- 分类实验在 50k/50k 的真实/合成训练集和 10k/10k 测试集上进行。

更好的研究,从现在开始
从论文设计到论文写作,大幅缩短您的研究时间。
无需绑定信用卡
本解读由 AI 生成,并经人工编辑审核。