QUICK REVIEW

[Paper Review] DeepfakeBench: A Comprehensive Benchmark of Deepfake Detection

Zhiyuan Yan, Yong Zhang | arXiv (Cornell University) | 2023. 07. 04.

Anomaly Detection Techniques and Applications | Citations: 19

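One-Line Summary

DeepfakeBench presents a modular, unified deepfake detection benchmark with standardized data processing, 15 detectors, 9 datasets, and comprehensive evaluation protocols, improving reproducibility and fair comparison.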

ABSTRACT

A critical yet frequently overlooked challenge in the field of deepfake detection is the lack of a standardized, unified, comprehensive benchmark. This issue leads to unfair performance comparisons and potentially misleading results. Specifically, there is a lack of uniformity in data processing pipelines, resulting in inconsistent data inputs for detection models. Additionally, there are noticeable differences in experimental settings, and evaluation strategies and metrics lack standardization. To fill this gap, we present the first comprehensive benchmark for deepfake detection, called DeepfakeBench, which offers three key contributions: 1) a unified data management system to ensure consistent input across all detectors, 2) an integrated framework for state-of-the-art methods implementation, and 3) standardized evaluation metrics and protocols to promote transparency and reproducibility. Featuring an extensible, modular-based codebase, DeepfakeBench contains 15 state-of-the-art detection methods, 9 deepfake datasets, a series of deepfake detection evaluation protocols and analysis tools, as well as comprehensive evaluations. Moreover, we provide new insights based on extensive analysis of these evaluations from various perspectives (e.g., data augmentations, backbones). We hope that our efforts could facilitate future research and foster innovation in this increasingly critical domain. All codes, evaluations, and analyses of our benchmark are publicly available at https://github.com/SCLBD/DeepfakeBench.

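Motivation and Objectives

Address the need for a standardized, unified benchmark in deepfake detection so that methods can be compared fairly.
Provide a modular, extensible framework for data processing, detector implementation, and evaluation.
Deliver a comprehensive evaluation across many detectors and datasets to surface insights and generalization trends.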

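Proposed Method

Develop a codebase of three modules (Data Processing, Training, and Evaluation/Analysis) to ensure consistency and reproducibility.
Standardize data inputs with a unified preprocessing pipeline that includes frame extraction, face cropping and alignment, and mask handling.
Integrate 15 state-of-the-art detectors (naive, spatial, and frequency) and 9 datasets into a common training and evaluation framework.
Adopt frame-level evaluation metrics (ACC, AUC, AP, EER) and provide visualization tools (ROC curves, radar charts, histograms) and interpretability analyses (Grad-CAM, t-SNE).
Run extensive within-domain, cross-domain, and cross-manipulation evaluations to assess generalization and robustness under consistent conditions.
Analyze factors such as data augmentation, backbone, pre-training, and number of frames to derive new insights.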
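To make the four frame-level metrics concrete, here is a minimal pure-Python sketch of how ACC, AUC, AP, and EER can be computed from per-frame scores. It is an illustration, not DeepfakeBench's own implementation. The function names, the label convention (1 = fake, 0 = real), and the 0.5 accuracy threshold are all assumptions made for this example.

```python
from typing import Dict, List, Sequence, Tuple


def sweep_thresholds(labels: Sequence[int], scores: Sequence[float]) -> List[Tuple[int, int]]:
    """Cumulative (false positives, true positives) after each distinct threshold,
    sweeping from the highest score to the lowest. Tied scores move together."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    counts: List[Tuple[int, int]] = []
    fp = tp = 0
    i = 0
    while i < len(order):
        current = scores[order[i]]
        while i < len(order) and scores[order[i]] == current:
            if labels[order[i]] == 1:  # assumed convention: 1 = fake
                tp += 1
            else:
                fp += 1
            i += 1
        counts.append((fp, tp))
    return counts


def frame_level_metrics(labels: Sequence[int], scores: Sequence[float],
                        threshold: float = 0.5) -> Dict[str, float]:
    """Hypothetical helper: ACC, AUC, AP, and EER over individual frames."""
    pos = sum(1 for y in labels if y == 1)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need at least one real and one fake frame")

    # ACC: fraction of frames classified correctly at a fixed threshold.
    acc = sum((s >= threshold) == (y == 1) for y, s in zip(labels, scores)) / len(labels)

    counts = sweep_thresholds(labels, scores)
    roc = [(0.0, 0.0)] + [(fp / neg, tp / pos) for fp, tp in counts]

    # AUC: trapezoidal area under the ROC curve.
    auc = sum((f1 - f0) * (t0 + t1) / 2 for (f0, t0), (f1, t1) in zip(roc, roc[1:]))

    # AP: precision weighted by the recall gained at each threshold.
    ap = 0.0
    prev_recall = 0.0
    for fp, tp in counts:
        recall = tp / pos
        ap += (recall - prev_recall) * (tp / (tp + fp))
        prev_recall = recall

    # EER: the point where FPR equals FNR (= 1 - TPR), linearly interpolated.
    eer = 0.0
    for (f0, t0), (f1, t1) in zip(roc, roc[1:]):
        d0, d1 = f0 + t0 - 1.0, f1 + t1 - 1.0
        if d1 >= 0.0:
            alpha = -d0 / (d1 - d0)
            eer = f0 + alpha * (f1 - f0)
            break

    return {"acc": acc, "auc": auc, "ap": ap, "eer": eer}


# Toy scores for four frames; these numbers are made up for illustration.
metrics = frame_level_metrics([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

ACC depends on the chosen threshold, whereas AUC, AP, and EER summarize behavior across all thresholds. That is one reason threshold-free metrics are often preferred when comparing detectors.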

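Experimental Results

Research Questions

RQ1: How can a unified benchmark improve the fairness and reproducibility of deepfake detection evaluation?
RQ2: How do different detectors perform across multiple datasets when evaluated under a single data processing pipeline and protocol?
RQ3: How do data augmentation, backbone architecture, pre-training, and frame sampling affect detection performance and cross-domain generalization?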

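Key Findings

DeepfakeBench evaluates 15 detectors across 9 datasets under a standardized protocol, enabling fair comparison between methods.
Within-domain results show high AUC for several detectors, including UCF, Xception, EfficientB4, and F3Net, with average scores reaching the mid-90s in some cases.
Naive detectors (such as Xception and EfficientB4 variants) achieve competitive AUC, suggesting that data processing and training settings strongly influence performance.
Cross-manipulation analysis reveals a substantial generalization gap when detectors trained on one forgery type face unseen forgery types.
Backbone choice and architectural features (e.g., depthwise separable convolutions) materially affect detection performance across datasets.
Pre-training generally improves performance, most notably for Xception and EfficientNetB4, highlighting the value of transferred low-level features.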
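The generalization gap mentioned in the findings can be summarized from a train/test grid of scores. The sketch below assumes a hypothetical table keyed by (training set, test set). The function name and the placeholder dataset names "A" and "B" are invented for illustration, and the toy AUC values are not results from the paper.

```python
from statistics import mean
from typing import Dict, Tuple


def generalization_gap(auc_table: Dict[Tuple[str, str], float]) -> Dict[str, float]:
    """Hypothetical helper: compare within-domain and cross-domain AUC.

    Keys are (train_set, test_set). An entry counts as within-domain when
    both names match and as cross-domain otherwise.
    """
    within = [v for (train, test), v in auc_table.items() if train == test]
    cross = [v for (train, test), v in auc_table.items() if train != test]
    if not within or not cross:
        raise ValueError("need at least one within-domain and one cross-domain entry")
    within_mean, cross_mean = mean(within), mean(cross)
    return {"within": within_mean, "cross": cross_mean, "gap": within_mean - cross_mean}


# Placeholder values only; they do not come from DeepfakeBench.
toy_table = {
    ("A", "A"): 0.9,
    ("A", "B"): 0.7,
    ("B", "B"): 0.8,
    ("B", "A"): 0.6,
}
summary = generalization_gap(toy_table)
```

A larger gap means the detector relies more on cues specific to its training data. The same table layout works for cross-manipulation analysis if the keys name forgery types instead of datasets.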

This review was generated by AI and reviewed by a human editor.