QUICK REVIEW

[Paper Review] Organic or Diffused: Can We Distinguish Human Art from AI-generated Images?

Anna Yoo Jeong Ha, Josephine Passananti|arXiv (Cornell University)|2024. 02. 05.

Aesthetic Perception and Analysis|Citations: 7

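One-line summary

This paper systematically evaluates how well automated detectors and human experts distinguish human-made art from AI-generated images across multiple styles, generative models, and adversarial conditions. Hive and expert artists are the most accurate, but they have complementary weaknesses.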

ABSTRACT

The advent of generative AI images has completely disrupted the art world. Distinguishing AI generated images from human art is a challenging problem whose impact is growing over time. A failure to address this problem allows bad actors to defraud individuals paying a premium for human art and companies whose stated policies forbid AI imagery. It is also critical for content owners to establish copyright, and for model trainers interested in curating training data in order to avoid potential model collapse. There are several different approaches to distinguishing human art from AI images, including classifiers trained by supervised learning, research tools targeting diffusion models, and identification by professional artists using their knowledge of artistic techniques. In this paper, we seek to understand how well these approaches can perform against today's modern generative models in both benign and adversarial settings. We curate real human art across 7 styles, generate matching images from 5 generative models, and apply 8 detectors (5 automated detectors and 3 different human groups including 180 crowdworkers, 4000+ professional artists, and 13 expert artists experienced at detecting AI). Both Hive and expert artists do very well, but make mistakes in different ways (Hive is weaker against adversarial perturbations while Expert artists produce higher false positives). We believe these weaknesses will remain as models continue to evolve, and use our data to demonstrate why a combined team of human and automated detectors provides the best combination of accuracy and robustness.

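Motivation and Objectives

Evaluate how well automated detectors distinguish human art from AI-generated images across multiple AI models and art styles.
Evaluate three deployed detectors (Hive, Optic, Illuminarty) and two research detectors (DIRE, DE-FAKE) on both unperturbed and perturbed images.
Compare three human groups (crowdworkers, professional artists, and expert artists experienced at detecting AI) at identifying AI-generated art.
Analyze how adversarial perturbations affect detector robustness, and identify the complementary strengths of human and automated detection.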

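Proposed Method

Build a dataset of 280 real human artworks across 7 styles and 350 AI images generated by 5 generative models (including hybrid and upscaled variants).
Apply five detectors (Hive, Optic, Illuminarty, DIRE, DE-FAKE) that classify each image as human-made or AI-generated and report a probability score.
Run three human studies on image classification (180 crowdworkers, 4000+ professional artists, and 13 expert artists), using a 5-point Likert-like decision scale.
Introduce perturbations such as JPEG compression, Gaussian noise, CLIP-based perturbations, and Glaze-style perturbations to test detector robustness.
Evaluate the detectors under perturbation, analyze their failure modes, and propose a combined human-ML detection approach.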
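
To make the Gaussian-noise perturbation concrete, here is a minimal sketch in plain Python. It is an illustration only: the function name `add_gaussian_noise`, the toy grayscale image, and the noise level `sigma=8.0` are assumptions, and this review does not state the parameters the paper actually used.

```python
import random


def add_gaussian_noise(pixels, sigma, seed=None):
    """Return a noisy copy of an 8-bit grayscale image given as rows of ints.

    Hypothetical helper for illustration; not the paper's implementation.
    """
    rng = random.Random(seed)
    noisy = []
    for row in pixels:
        noisy_row = []
        for value in row:
            # Add zero-mean Gaussian noise with standard deviation sigma.
            perturbed = value + rng.gauss(0.0, sigma)
            # Round and clamp back into the valid 8-bit range [0, 255].
            noisy_row.append(min(255, max(0, round(perturbed))))
        noisy.append(noisy_row)
    return noisy


# Toy 2x3 image; sigma=8.0 is an assumed noise level.
image = [[0, 64, 128], [192, 255, 32]]
noisy_image = add_gaussian_noise(image, sigma=8.0, seed=0)
```

A robustness test in this style would feed both `image` and `noisy_image` to a detector and compare the two decisions.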

Figure 1. Samples from curated test set. Human artwork and subsequent matching images produced by generative AI models.

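Experimental Results

Research Questions

RQ1: Can current automated detectors and human experts reliably distinguish human art from AI-generated images across diverse art styles?
RQ2: How do adversarial perturbations affect the accuracy of detectors and humans at identifying AI-generated art?
RQ3: What are the relative strengths and weaknesses of automated detectors and human detectors, and does a combined approach improve robustness?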

Key Results

Detector	ACC (%)	FPR (%)	FNR (%)
Hive	98.03	0.00	3.17
Optic	90.67	24.47	1.15
Illuminarty	72.65	67.40	4.69
DE-FAKE	50.32	41.79	56.00
DIRE (a)	55.40	99.29	0.86
DIRE (b)	51.59	25.36	66.86
Ensemble	98.75	0.48	1.71
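
As a reading aid for the ACC, FPR, and FNR columns, the sketch below computes the three metrics from labels and predictions. It assumes "AI-generated" is the positive class, so a false positive is human art flagged as AI and a false negative is an AI image accepted as human. The function name `detection_metrics` and the toy data are hypothetical and do not reproduce the table's numbers.

```python
def detection_metrics(labels, predictions):
    """Compute ACC, FPR, and FNR (in percent) with "ai" as the positive class.

    Assumes both classes appear in labels; otherwise a rate is undefined.
    """
    tp = fp = tn = fn = 0
    for truth, guess in zip(labels, predictions):
        if truth == "ai" and guess == "ai":
            tp += 1  # AI image correctly flagged
        elif truth == "ai":
            fn += 1  # AI image accepted as human art
        elif guess == "ai":
            fp += 1  # human art wrongly flagged as AI
        else:
            tn += 1  # human art correctly accepted
    total = tp + fp + tn + fn
    return {
        "ACC": 100.0 * (tp + tn) / total,
        "FPR": 100.0 * fp / (fp + tn),
        "FNR": 100.0 * fn / (fn + tp),
    }


# Toy example: 4 human artworks and 6 AI images.
labels = ["human"] * 4 + ["ai"] * 6
predictions = ["human", "human", "human", "ai"] + ["ai"] * 5 + ["human"]
metrics = detection_metrics(labels, predictions)
```

Under this convention, a detector can have low FNR and still be unusable for artists if its FPR is high, because it frequently accuses genuine human work of being AI-generated.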

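Hive achieves the highest accuracy of any single detector on unperturbed images: 98.03%, with an FPR of 0% and an FNR of 3.17%.
Expert and professional artists are also highly accurate but produce more false positives: expert artists identify AI images well but may mistake human art for AI.
Optic and Illuminarty trail Hive in accuracy and have much higher FPRs (24.47% and 67.40%, respectively); their FNRs are 1.15% and 4.69%.
DIRE and DE-FAKE perform poorly on art data, with accuracy between 50.32% and 55.40%.
Adversarial perturbations substantially degrade the ML detectors, especially perturbations in feature space; CLIP-based and Glaze perturbations expose different vulnerabilities.
A combined team of human and automated detectors gives the best overall accuracy and robustness.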
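
One way to picture a combined human and automated team is a simple fallback rule: accept the detector's verdict when it is confident, and otherwise defer to human ratings. The sketch below is a hypothetical illustration, not the paper's procedure. The function name `combined_decision`, the 0.9 confidence threshold, and the reading of the 5-point scale (1 = definitely human, 3 = unsure, 5 = definitely AI) are all assumptions.

```python
def combined_decision(detector_ai_prob, human_ratings, confident=0.9):
    """Combine a detector score with human ratings (hypothetical rule).

    detector_ai_prob: detector's probability that the image is AI-generated.
    human_ratings: ints on an assumed 1-5 scale (1 = human, 5 = AI).
    """
    # Trust the detector when its score is far from 0.5 in either direction.
    if detector_ai_prob >= confident:
        return "ai"
    if detector_ai_prob <= 1.0 - confident:
        return "human"
    # Otherwise take a majority vote over humans, ignoring "unsure" (3).
    ai_votes = sum(1 for r in human_ratings if r >= 4)
    human_votes = sum(1 for r in human_ratings if r <= 2)
    if ai_votes > human_votes:
        return "ai"
    if human_votes > ai_votes:
        return "human"
    # Tie between humans: fall back to the detector's weaker signal.
    return "ai" if detector_ai_prob >= 0.5 else "human"


decision = combined_decision(0.6, [5, 4, 2])
```

A rule like this only helps if the two error types are complementary, which is the pattern reported here: the detector is weaker under adversarial perturbation, while expert artists lean toward false positives.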

Figure 2. The confidence score produced by automated detectors on images generated by 5 generators. Detecting images generated by Firefly is the hardest.

This review was generated by AI and reviewed by a human editor.