QUICK REVIEW

[Paper Review] CIFAKE: Image Classification and Explainable Identification of AI-Generated Synthetic Images

Jordan J. Bird, Ahmad Lotfi|arXiv (Cornell University)|2023. 03. 24.

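Explainable Artificial Intelligence (XAI) | Citations: 13

One-Line Summary

This paper builds CIFAKE, a CIFAR-10-sized dataset of 120,000 images containing both real photographs and images generated by a Stable Diffusion Model (SDM), trains a CNN that classifies Real vs Fake with 92.98% accuracy, and uses Grad-CAM to explain the classifier's decisions.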

ABSTRACT

Recent technological advances in synthetic data have enabled the generation of images with such high quality that human beings cannot tell the difference between real-life photographs and Artificial Intelligence (AI) generated images. Given the critical necessity of data reliability and authentication, this article proposes to enhance our ability to recognise AI-generated images through computer vision. Initially, a synthetic dataset is generated that mirrors the ten classes of the already available CIFAR-10 dataset with latent diffusion which provides a contrasting set of images for comparison to real photographs. The model is capable of generating complex visual attributes, such as photorealistic reflections in water. The two sets of data present as a binary classification problem with regard to whether the photograph is real or generated by AI. This study then proposes the use of a Convolutional Neural Network (CNN) to classify the images into two categories; Real or Fake. Following hyperparameter tuning and the training of 36 individual network topologies, the optimal approach could correctly classify the images with 92.98% accuracy. Finally, this study implements explainable AI via Gradient Class Activation Mapping to explore which features within the images are useful for classification. Interpretation reveals interesting concepts within the image, in particular, noting that the actual entity itself does not hold useful information for classification; instead, the model focuses on small visual imperfections in the background of the images. The complete dataset engineered for this study, referred to as the CIFAKE dataset, is made publicly available to the research community for future work.

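Research Motivation and Goals

Argues that AI-generated images must be detectable in order to preserve the authenticity and trustworthiness of data.
Builds a synthetic dataset (CIFAKE) that mirrors CIFAR-10, pairing the real images with AI-generated counterparts.
Develops a CNN-based classifier that separates Real from Fake images.
Applies explainable AI (Grad-CAM) to interpret which image features drive the model's decisions.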

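Proposed Method

Generates the CIFAKE dataset with Stable Diffusion 1.4, covering the ten CIFAR-10 classes and using domain-specific prompts to diversify the images.
Trains 36 CNN topologies, varying the number of feature-extractor filters and the size of the dense layers, to find the best Real vs Fake classifier.
Evaluates the models with binary classification metrics (accuracy, precision, recall, F1) on a 50k/50k training split and a 10k/10k test split.
Applies Grad-CAM to produce spatial heatmaps of the image regions that influence the Real vs Fake decision.
Releases the CIFAKE dataset publicly for community research.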
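
The binary metrics named above can be illustrated with a minimal sketch. It assumes labels encoded as 1 = Fake and 0 = Real with Fake as the positive class; the review does not say which class the paper treats as positive, and the `binary_metrics` helper and toy labels are hypothetical, not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class BinaryMetrics:
    accuracy: float
    precision: float
    recall: float
    f1: float


def binary_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall and F1 for binary labels.

    `positive` marks the class counted as a "hit" (assumed here: 1 = Fake).
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn

    # Guard every ratio against an empty denominator.
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return BinaryMetrics(accuracy, precision, recall, f1)


# Toy labels for illustration only; these are not results from the paper.
toy_true = [1, 1, 1, 1, 0, 0, 0, 0]
toy_pred = [1, 1, 1, 0, 0, 0, 0, 1]
toy = binary_metrics(toy_true, toy_pred)
print(toy)
```

With a balanced test split such as the 10k/10k one described here, accuracy is not skewed by class imbalance, but precision and recall still show whether errors lean toward flagging real images as fake or the reverse.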

Figure 1: Examples of images from the CIFAR-10 image classification dataset [24].

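Experimental Results

Research Questions

RQ1: Can a CNN reliably distinguish high-quality AI-generated images from real CIFAR-10 images?
RQ2: Which CNN topology (feature extractor and dense layers) gives the best Real vs Fake binary classification performance on CIFAKE?
RQ3: Which visual cues do Grad-CAM explanations reveal as most influential in the classification decision?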

Key Results

Filters	Layers	Accuracy (%)
16	1	90.06
16	2	91.46
16	3	91.63
32	1	90.38
32	2	92.93
32	3	92.54
64	1	90.94
64	2	92.71
64	3	92.38
128	1	91.39
128	2	92.98
128	3	92.07
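
As a quick sanity check, the table above can be loaded into a dictionary to recover the best configuration and the mean accuracy. This is only a sketch: the values are copied from the table, while the variable names are illustrative.

```python
# Validation accuracies (%) copied from the table above,
# keyed by (filters, layers).
accuracy = {
    (16, 1): 90.06, (16, 2): 91.46, (16, 3): 91.63,
    (32, 1): 90.38, (32, 2): 92.93, (32, 3): 92.54,
    (64, 1): 90.94, (64, 2): 92.71, (64, 3): 92.38,
    (128, 1): 91.39, (128, 2): 92.98, (128, 3): 92.07,
}

# Configuration with the highest accuracy.
best_config = max(accuracy, key=accuracy.get)

# Unweighted mean over all twelve feature-extractor configurations.
mean_accuracy = sum(accuracy.values()) / len(accuracy)

print(best_config, accuracy[best_config])  # (128, 2) 92.98
print(round(mean_accuracy, 2))             # 91.79
```

Both values agree with the figures reported in the review: two layers of 128 filters at 92.98%, and a mean of 91.79% across feature extractors.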

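Best feature-extractor topology: two layers of 128 filters reached 92.98% validation accuracy with a loss of 0.221.
Mean validation accuracy across all feature extractors was 91.79%.
The highest F1 score, 0.936, was observed with a single dense layer of 64 nodes.
Grad-CAM analysis suggests that predictions for real images draw on the image as a whole, whereas predictions for fake images rely on sparse, localized regions and visual imperfections.
The CIFAKE dataset consists of 120,000 images (60,000 real CIFAR-10 images plus 60,000 synthetic images) and has been released publicly.
Classification experiments used a 50k/50k real/synthetic training split and a 10k/10k test split.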
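
For readers unfamiliar with how the Grad-CAM heatmaps mentioned above are computed, the standard formulation is sketched below. The notation is an assumption, since the review does not give it or say which convolutional layer was used: $A^k$ is the $k$-th feature map of the chosen convolutional layer, $y^c$ is the model's score for class $c$ (Real or Fake), and $Z$ is the number of spatial positions in the feature map.

```latex
\alpha_k^c = \frac{1}{Z} \sum_{i} \sum_{j} \frac{\partial y^c}{\partial A^k_{ij}},
\qquad
L^c_{\text{Grad-CAM}} = \mathrm{ReLU}\!\left( \sum_{k} \alpha_k^c \, A^k \right)
```

The weight $\alpha_k^c$ averages the gradient of the class score over each feature map, and the ReLU keeps only the regions that push the score for class $c$ upward. The resulting map is typically upsampled to the input resolution and overlaid on the image as a heatmap.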

Figure 2: Examples of AI-generated images within the dataset contributed by this study, selected at random with regards to their real CIFAR-10 equivalent labels.

This review was generated by AI and checked by a human editor.