QUICK REVIEW

[Paper Review] CIFAKE: Image Classification and Explainable Identification of AI-Generated Synthetic Images

Jordan J. Bird, Ahmad Lotfi | arXiv (Cornell University) | Mar 24, 2023

Explainable Artificial Intelligence (XAI) | Citations: 13

One-Line Summary

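This paper builds CIFAKE, a 120k-image dataset of CIFAR-10-sized (32×32) images that contains both real photographs and images generated with Stable Diffusion (SDM). It then trains a CNN that classifies images as Real or Fake with ~92.98% accuracy, and uses Grad-CAM to explain the classifier's decisions.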

ABSTRACT

Recent technological advances in synthetic data have enabled the generation of images with such high quality that human beings cannot tell the difference between real-life photographs and Artificial Intelligence (AI) generated images. Given the critical necessity of data reliability and authentication, this article proposes to enhance our ability to recognise AI-generated images through computer vision. Initially, a synthetic dataset is generated that mirrors the ten classes of the already available CIFAR-10 dataset with latent diffusion which provides a contrasting set of images for comparison to real photographs. The model is capable of generating complex visual attributes, such as photorealistic reflections in water. The two sets of data present as a binary classification problem with regard to whether the photograph is real or generated by AI. This study then proposes the use of a Convolutional Neural Network (CNN) to classify the images into two categories: Real or Fake. Following hyperparameter tuning and the training of 36 individual network topologies, the optimal approach could correctly classify the images with 92.98% accuracy. Finally, this study implements explainable AI via Gradient Class Activation Mapping to explore which features within the images are useful for classification. Interpretation reveals interesting concepts within the image, in particular, noting that the actual entity itself does not hold useful information for classification; instead, the model focuses on small visual imperfections in the background of the images. The complete dataset engineered for this study, referred to as the CIFAKE dataset, is made publicly available to the research community for future work.

Motivation and Objectives

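Motivate the need to detect AI-generated images in order to safeguard data authenticity and reliability.
Build a dataset (CIFAKE) that mirrors CIFAR-10 and contains both real and AI-generated images.
Develop a CNN-based classifier that distinguishes Real images from Fake ones.
Incorporate explainable AI (Grad-CAM) to interpret the model's decisions in terms of image features.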

Proposed Method

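Generate the CIFAKE dataset with Stable Diffusion 1.4, mirroring the 10 CIFAR-10 classes and using domain-specific prompts to diversify the images.
Train 36 CNN topologies, varying the number of filters in the feature extractor and the size of the fully connected layers, to identify the best Real vs Fake classifier.
Evaluate the models with binary classification metrics (Accuracy, Precision, Recall, F1) on a 50k/50k training split and a 10k/10k test split.
Apply Grad-CAM to produce spatial heatmaps that highlight the image regions driving each Real vs Fake decision.
Release the CIFAKE dataset publicly for community research.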
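The Grad-CAM step above combines the last convolutional layer's activations with the gradients of the target class score. The sketch below shows that weighting in NumPy, following the generic Grad-CAM formulation rather than the authors' code. It assumes the activations and gradients have already been extracted from a trained network (how to obtain them is framework-specific and not shown), and it omits the final upsampling of the heatmap to the 32×32 input size. The function name `grad_cam` and the toy arrays are hypothetical.

```python
import numpy as np


def grad_cam(feature_maps: np.ndarray, gradients: np.ndarray) -> np.ndarray:
    """Hypothetical helper: Grad-CAM heatmap from precomputed arrays.

    feature_maps: activations A^k of the last conv layer, shape (H, W, K).
    gradients:    d(class score)/dA^k for the target class, same shape.
    Returns an (H, W) heatmap scaled to [0, 1].
    """
    if feature_maps.shape != gradients.shape:
        raise ValueError("feature_maps and gradients must have the same shape")
    # Channel weights alpha_k: global-average-pool the gradients over H and W.
    weights = gradients.mean(axis=(0, 1))  # shape (K,)
    # Weighted sum of the feature maps across channels; ReLU keeps positive evidence.
    cam = np.maximum(np.tensordot(feature_maps, weights, axes=([2], [0])), 0.0)
    # Scale to [0, 1] for display; an all-zero map is returned unchanged.
    peak = cam.max()
    return cam / peak if peak > 0 else cam


# Toy example (hypothetical values): 2x2 spatial grid, 2 channels.
A = np.zeros((2, 2, 2))
A[0, 0, 0] = 1.0   # channel 0 is active in the top-left cell
A[1, 1, 1] = 1.0   # channel 1 is active in the bottom-right cell
G = np.zeros((2, 2, 2))
G[..., 0] = 1.0    # channel 0 raises the target-class score
G[..., 1] = -1.0   # channel 1 lowers it
heatmap = grad_cam(A, G)
print(heatmap)  # only the top-left cell is highlighted
```

Each channel is weighted by its average gradient, so channels that push the chosen class score up contribute positively, and the ReLU discards regions that argue against that class.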

Figure 1: Examples of images from the CIFAR-10 image classification dataset [24].

Experimental Results

Research Questions

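RQ1: Can a CNN reliably distinguish high-quality AI-generated images from real CIFAR-10 images?
RQ2: Which CNN topology (feature extractor and fully connected layers) gives the best binary Real vs Fake classification performance on CIFAKE?
RQ3: According to Grad-CAM, which visual cues most influence the classification decisions?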

Key Findings

Filters	Conv layers	Validation accuracy (%)
16	1	90.06
16	2	91.46
16	3	91.63
32	1	90.38
32	2	92.93
32	3	92.54
64	1	90.94
64	2	92.71
64	3	92.38
128	1	91.39
128	2	92.98
128	3	92.07
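As a quick consistency check, the best configuration and the mean accuracy stated in the Key Findings can be recomputed from the table. The dictionary below simply copies the table's twelve rows; the variable names are illustrative.

```python
from statistics import mean

# Validation accuracy (%) per feature-extractor topology, copied from the table.
# Key = (filters, conv layers).
RESULTS = {
    (16, 1): 90.06, (16, 2): 91.46, (16, 3): 91.63,
    (32, 1): 90.38, (32, 2): 92.93, (32, 3): 92.54,
    (64, 1): 90.94, (64, 2): 92.71, (64, 3): 92.38,
    (128, 1): 91.39, (128, 2): 92.98, (128, 3): 92.07,
}

# Pick the entry with the highest accuracy and average over all entries.
best_config, best_acc = max(RESULTS.items(), key=lambda kv: kv[1])
avg_acc = mean(RESULTS.values())

print(f"best: {best_config[0]} filters x {best_config[1]} layers -> {best_acc:.2f}%")
# -> best: 128 filters x 2 layers -> 92.98%
print(f"mean over {len(RESULTS)} topologies: {avg_acc:.2f}%")
# -> mean over 12 topologies: 91.79%
```

The recomputed values agree with the 92.98% best result and the 91.79% average reported for the feature extractors.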

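The best feature-extractor topology was 2 convolutional layers of 128 filters, reaching 92.98% validation accuracy with a loss of 0.221.
The mean validation accuracy across all feature extractors was 91.79%.
The highest F1 score, 0.936, was observed with a single dense layer of 64 neurons.
In the Grad-CAM analysis, the model's decisions on real images draw on broad regions across the whole image, whereas its decisions on fake images rest on sparse, localized regions that may contain visual imperfections.
The CIFAKE dataset contains 120,000 images (60,000 real images from CIFAR-10 plus 60,000 synthetic images) and is publicly available.
Classification experiments used a 50k/50k real/synthetic training split and a 10k/10k test split.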
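The evaluation metrics (Accuracy, Precision, Recall, F1) all derive from the confusion counts of the Real vs Fake predictions. The pure-Python sketch below assumes FAKE is the positive class (label 1) and uses eight hypothetical toy predictions, not the paper's data. In practice, scikit-learn's `sklearn.metrics` module offers equivalent functions such as `accuracy_score` and `f1_score`.

```python
from typing import Dict, Sequence

REAL, FAKE = 0, 1  # assumed label encoding; FAKE is treated as the positive class


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int],
                   positive: int = FAKE) -> Dict[str, float]:
    """Accuracy, precision, recall and F1 for a two-class problem."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)  # fakes caught
    fp = sum(t != positive and p == positive for t, p in pairs)  # reals flagged as fake
    fn = sum(t == positive and p != positive for t, p in pairs)  # fakes missed
    correct = sum(t == p for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": correct / len(pairs), "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical toy predictions on a balanced set (4 fake, 4 real images).
y_true = [FAKE, FAKE, FAKE, FAKE, REAL, REAL, REAL, REAL]
y_pred = [FAKE, FAKE, FAKE, REAL, REAL, REAL, FAKE, FAKE]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

Because the test split is balanced (10k real, 10k fake), accuracy is not skewed by class imbalance, but precision and recall still separate the two kinds of error: real images flagged as fake versus fakes that slip through.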

Figure 2: Examples of AI-generated images within the dataset contributed by this study, selected at random with regard to their real CIFAR-10 equivalent labels.


This review was generated by AI and checked by a human editor.