QUICK REVIEW

[Paper Review] Organic or Diffused: Can We Distinguish Human Art from AI-generated Images?

Anna Yoo Jeong Ha, Josephine Passananti|arXiv (Cornell University)|Feb 5, 2024

Aesthetic Perception and Analysis | Citations: 7

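One-line summary

The paper systematically evaluates automated detectors and human experts, testing how well they distinguish human-made art from AI-generated images across multiple styles, generative models, and adversarial conditions. Hive and expert artists achieve the highest accuracy, but they have complementary weaknesses.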

ABSTRACT

The advent of generative AI images has completely disrupted the art world. Distinguishing AI generated images from human art is a challenging problem whose impact is growing over time. A failure to address this problem allows bad actors to defraud individuals paying a premium for human art and companies whose stated policies forbid AI imagery. It is also critical for content owners to establish copyright, and for model trainers interested in curating training data in order to avoid potential model collapse. There are several different approaches to distinguishing human art from AI images, including classifiers trained by supervised learning, research tools targeting diffusion models, and identification by professional artists using their knowledge of artistic techniques. In this paper, we seek to understand how well these approaches can perform against today's modern generative models in both benign and adversarial settings. We curate real human art across 7 styles, generate matching images from 5 generative models, and apply 8 detectors (5 automated detectors and 3 different human groups including 180 crowdworkers, 4000+ professional artists, and 13 expert artists experienced at detecting AI). Both Hive and expert artists do very well, but make mistakes in different ways (Hive is weaker against adversarial perturbations while Expert artists produce higher false positives). We believe these weaknesses will remain as models continue to evolve, and use our data to demonstrate why a combined team of human and automated detectors provides the best combination of accuracy and robustness.

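Research motivation and objectives

Evaluate how well automated detectors distinguish human art from AI-generated images across multiple AI models and art styles.
Evaluate three deployed detectors (Hive, Optic, Illuminarty) and two research detectors (DIRE, DE-FAKE) on both unperturbed and perturbed images.
Compare the performance of three human groups (crowdworkers, professional artists, and expert artists experienced at detecting AI) in identifying AI-generated art.
Analyze how adversarial perturbations affect detector robustness and identify the complementary strengths of human and automated detection.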

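Proposed method

Build a dataset of 280 real artworks spanning 7 styles and 350 AI-generated images from 5 generative models (including hybrid and upscaled versions).
Apply five automated detectors (Hive, Optic, Illuminarty, DIRE, DE-FAKE), each of which classifies an image as human-made or AI-generated and reports a probability score.
Run three human studies on image classification (180 crowdworkers, more than 4000 professional artists, and 13 expert artists), using a 5-point Likert-style decision framework.
Introduce adversarial perturbations, including JPEG compression, Gaussian noise, CLIP-based perturbations, and Glaze-style perturbations, to test detector robustness.
Evaluate the detectors under these perturbations, analyze their failure modes, and propose a detection approach that combines humans and ML.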
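As a rough illustration of the simplest perturbation in the list above, the sketch below adds Gaussian noise to 8-bit pixel intensities and clips the result to the valid range. The function name `add_gaussian_noise`, the default `sigma`, and the flat list representation of an image are all assumptions made for this example; the review does not state the noise levels or tooling the authors used.

```python
import random


def add_gaussian_noise(pixels, sigma=8.0, seed=None):
    """Return a noisy copy of 8-bit pixel intensities (values in 0..255).

    `sigma` is the noise standard deviation in intensity units. The default
    is an arbitrary illustrative value, not one taken from the paper.
    """
    rng = random.Random(seed)  # local generator so a seed makes runs repeatable
    noisy = []
    for value in pixels:
        perturbed = value + rng.gauss(0.0, sigma)
        # Round to an integer intensity and clip to the valid 8-bit range.
        noisy.append(min(255, max(0, round(perturbed))))
    return noisy


if __name__ == "__main__":
    original = [0, 64, 128, 192, 255]
    print(add_gaussian_noise(original, sigma=8.0, seed=0))
```

A detector's robustness could then be probed by scoring the same image before and after the perturbation and comparing the two outputs.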

Figure 1. Samples from curated test set. Human artwork and subsequent matching images produced by generative AI models.

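Experimental results

Research questions

RQ1: Can current automated detectors and human experts reliably distinguish human art from AI-generated images across diverse art styles?
RQ2: How do adversarial perturbations affect the accuracy of detectors and humans in identifying AI-generated art?
RQ3: What are the relative strengths and weaknesses of automated and human detection, and does a combined approach improve robustness?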

Key findings

Detector	ACC (%)	FPR (%)	FNR (%)
Hive	98.03	0.00	3.17
Optic	90.67	24.47	1.15
Illuminarty	72.65	67.40	4.69
DE-FAKE	50.32	41.79	56.00
DIRE (a)	55.40	99.29	0.86
DIRE (b)	51.59	25.36	66.86
Ensemble	98.75	0.48	1.71
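The three columns in the table can be computed from ground-truth labels and detector decisions as sketched below. The sketch treats "AI-generated" as the positive class, so a false positive is human art flagged as AI, which matches how this review describes false positives. The names `DetectionRates` and `detection_rates` and the string labels `"ai"` and `"human"` are hypothetical choices for this example, and the sample data is made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DetectionRates:
    acc: float  # accuracy, in percent
    fpr: float  # human art misclassified as AI, in percent of human art
    fnr: float  # AI images misclassified as human, in percent of AI images


def detection_rates(labels, predictions):
    """Compute ACC, FPR and FNR (percent) with 'ai' as the positive class."""
    if len(labels) != len(predictions):
        raise ValueError("labels and predictions must have the same length")
    tp = fp = tn = fn = 0
    for truth, guess in zip(labels, predictions):
        if truth == "ai":
            if guess == "ai":
                tp += 1
            else:
                fn += 1
        else:
            if guess == "ai":
                fp += 1
            else:
                tn += 1
    if tp + fn == 0 or fp + tn == 0:
        raise ValueError("need at least one AI image and one human artwork")
    total = tp + fp + tn + fn
    return DetectionRates(
        acc=100.0 * (tp + tn) / total,
        fpr=100.0 * fp / (fp + tn),
        fnr=100.0 * fn / (tp + fn),
    )


if __name__ == "__main__":
    # Made-up toy data: 4 human artworks followed by 6 AI images.
    labels = ["human"] * 4 + ["ai"] * 6
    predictions = ["ai", "human", "human", "human",
                   "ai", "ai", "ai", "ai", "human", "human"]
    print(detection_rates(labels, predictions))
```

Reporting FPR and FNR separately matters here because the two error types have different costs: a false positive accuses a human artist of using AI, while a false negative lets an AI image pass as human work.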

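On unperturbed images, Hive is the most accurate individual detector, with 98.03% accuracy, 0% FPR, and 3.17% FNR.
Expert artists are highly accurate but tend to produce more false positives: they detect AI images reliably, yet are more likely to misclassify human art as AI.
Optic and Illuminarty perform worse than Hive, with high FPRs of 24.47% and 67.40% respectively; their FNRs are 1.15% and 4.69%.
DIRE and DE-FAKE perform poorly on art-specific data, with accuracies of roughly 50% to 55%.
Adversarial perturbations substantially degrade the ML detectors, most notably feature-space perturbations. CLIP-based and Glaze perturbations expose different vulnerabilities.
A combined team of human and automated detectors achieves the best overall accuracy and robustness.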
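To make the idea of a combined human and automated team concrete, here is a minimal majority-vote sketch. It is only one hypothetical way to merge decisions and is not the combination procedure used in the paper, which this review does not describe. The function name `majority_vote`, the source names in the usage example, and the tie-breaking rule (ties resolve to "human", to limit false accusations against human artists) are all assumptions.

```python
def majority_vote(votes):
    """Combine per-source decisions into one label.

    `votes` maps a source name (a detector or a human rater) to either
    "ai" or "human". Ties resolve to "human"; that rule is an assumption
    chosen here to favor fewer false positives.
    """
    if not votes:
        raise ValueError("at least one vote is required")
    for source, label in votes.items():
        if label not in ("ai", "human"):
            raise ValueError(f"unknown label from {source!r}: {label!r}")
    ai_votes = sum(1 for label in votes.values() if label == "ai")
    human_votes = len(votes) - ai_votes
    return "ai" if ai_votes > human_votes else "human"


if __name__ == "__main__":
    # Hypothetical decisions for a single image.
    print(majority_vote({"detector_a": "ai", "detector_b": "ai", "expert": "human"}))
```

Because the review reports that automated detectors and expert artists fail in different ways, a vote across both kinds of sources can only help when their errors do not coincide on the same image; a weighted scheme would be a natural refinement.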

Figure 2. The confidence score produced by automated detectors on images generated by 5 generators. Detecting images generated by Firefly is the hardest.


This review was generated by AI and checked by a human editor.