QUICK REVIEW

[Paper Review] Exposing DeepFake Videos By Detecting Face Warping Artifacts

Yuezun Li, Siwei Lyu | arXiv (Cornell University) | 2018. 11. 01.

Digital Media Forensic Detection | References: 36 | Citations: 571

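One-Line Summary

This paper detects DeepFake videos by exploiting face-warping artifacts with a CNN-based method that is trained on synthetic negative examples produced through image processing, so no DeepFake-generated images are needed for training.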

ABSTRACT

In this work, we describe a new deep learning based method that can effectively distinguish AI-generated fake videos (referred to as DeepFake videos hereafter) from real videos. Our method is based on the observations that current DeepFake algorithm can only generate images of limited resolutions, which need to be further warped to match the original faces in the source video. Such transforms leave distinctive artifacts in the resulting DeepFake videos, and we show that they can be effectively captured by convolutional neural networks (CNNs). Compared to previous methods which use a large amount of real and DeepFake generated images to train CNN classifier, our method does not need DeepFake generated images as negative training examples since we target the artifacts in affine face warping as the distinctive feature to distinguish real and fake images. The advantages of our method are two-fold: (1) Such artifacts can be simulated directly using simple image processing operations on a image to make it as negative example. Since training a DeepFake model to generate negative examples is time-consuming and resource-demanding, our method saves a plenty of time and resources in training data collection; (2) Since such artifacts are general existed in DeepFake videos from different sources, our method is more robust compared to others. Our method is evaluated on two sets of DeepFake video datasets for its effectiveness in practice.

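Motivation and Goals

Motivates robust DeepFake detection by focusing on artifacts introduced by the face synthesis pipeline.
Leverages the insight that DeepFake face synthesis produces a fixed-size image that must be warped to fit the target face.
Removes the need for real DeepFake negatives by simulating the warping artifacts through image processing, yielding synthetic, non-DeepFake negative examples.
Targets general warping artifacts, which supports robustness across different DeepFake sources.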

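Proposed Method

Detects faces and extracts the face region from facial landmarks to determine the affine transformation matrix.
Aligns the face at multiple scales, applies Gaussian blur, and affine-warps it back to the original size to simulate negative examples.
Augments the data with varied color, brightness, contrast, distortion, and polygon-based face shapes to increase realism.
Crops a region of interest that includes the area around the face, rescales it to 224x224, and trains CNNs (VGG16, ResNet50/101/152).
At inference, samples the ROI 10 times per image and averages the CNN outputs to obtain the final fake probability.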
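
The negative-example simulation and the averaged inference described above can be sketched roughly as follows. This is a minimal NumPy sketch under stated assumptions, not the authors' implementation: it works on a grayscale image with an axis-aligned face box instead of landmark-based alignment, uses plain down-scaling and up-scaling as the affine warp, and the names `simulate_negative`, `average_fake_probability`, `scale`, `sigma`, `max_margin`, and the `model` callable are all hypothetical.

```python
import numpy as np


def gaussian_blur(img: np.ndarray, sigma: float = 1.0) -> np.ndarray:
    """Separable Gaussian blur on a 2-D image, using edge padding."""
    radius = max(1, int(3 * sigma))
    x = np.arange(-radius, radius + 1, dtype=np.float64)
    kernel = np.exp(-(x ** 2) / (2.0 * sigma ** 2))
    kernel /= kernel.sum()
    padded = np.pad(img, radius, mode="edge")
    # Blur along rows first, then along columns.
    rows = np.apply_along_axis(lambda r: np.convolve(r, kernel, mode="valid"), 1, padded)
    return np.apply_along_axis(lambda c: np.convolve(c, kernel, mode="valid"), 0, rows)


def resize_bilinear(img: np.ndarray, out_h: int, out_w: int) -> np.ndarray:
    """Resize a 2-D image with bilinear interpolation (a scale-only affine warp)."""
    in_h, in_w = img.shape
    # Map output pixel centers back into input coordinates.
    ys = np.clip((np.arange(out_h) + 0.5) * in_h / out_h - 0.5, 0, in_h - 1)
    xs = np.clip((np.arange(out_w) + 0.5) * in_w / out_w - 0.5, 0, in_w - 1)
    y0, x0 = np.floor(ys).astype(int), np.floor(xs).astype(int)
    y1, x1 = np.minimum(y0 + 1, in_h - 1), np.minimum(x0 + 1, in_w - 1)
    wy, wx = (ys - y0)[:, None], (xs - x0)[None, :]
    top = img[y0][:, x0] * (1 - wx) + img[y0][:, x1] * wx
    bottom = img[y1][:, x0] * (1 - wx) + img[y1][:, x1] * wx
    return top * (1 - wy) + bottom * wy


def simulate_negative(image, box, scale=0.25, sigma=1.0):
    """Shrink the face region, blur it, and warp it back to its original size.

    `box` is (top, left, height, width); an assumed stand-in for the
    landmark-based face region used in the paper.
    """
    top, left, h, w = box
    out = image.astype(np.float64).copy()
    face = out[top:top + h, left:left + w]
    small = resize_bilinear(face, max(2, int(h * scale)), max(2, int(w * scale)))
    small = gaussian_blur(small, sigma)
    out[top:top + h, left:left + w] = resize_bilinear(small, h, w)
    return out


def average_fake_probability(image, box, model, n_samples=10, max_margin=8, seed=0):
    """Average model outputs over randomly enlarged ROIs around the face box.

    The random-margin rule is an assumption made for illustration only.
    `model` is any callable mapping a 224x224 array to a fake probability.
    """
    rng = np.random.default_rng(seed)
    top, left, h, w = box
    probs = []
    for _ in range(n_samples):
        mt, ml, mb, mr = rng.integers(0, max_margin + 1, size=4)
        y0, x0 = max(0, top - mt), max(0, left - ml)
        y1 = min(image.shape[0], top + h + mb)
        x1 = min(image.shape[1], left + w + mr)
        roi = resize_bilinear(image[y0:y1, x0:x1].astype(np.float64), 224, 224)
        probs.append(float(model(roi)))
    return float(np.mean(probs))
```

In practice a library routine such as an affine warp on color images would replace the hand-written resize, and `scale` would be drawn from several values to cover the multiple scales mentioned above.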

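Experimental Results

Research Questions

RQ1: Can the artifacts left by affine face warping in the DeepFake pipeline be reliably detected by a CNN?
RQ2: Is synthetic (non-DeepFake) generation of negative samples sufficient to train a robust detector?
RQ3: Which CNN architecture best exploits warping-artifact cues to achieve high detection performance on public DeepFake datasets?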

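Key Results

ResNet50 achieved the best results on UADFV, with an image-based AUC of 97.4% and a video-based AUC of 98.7%.
ResNet101 and ResNet152 also performed well on UADFV, with AUCs of roughly 95–99% on images and 97–99% on the video test.
On DeepfakeTIMIT LQ, ResNet50 achieved an image-based AUC of 99.9%, outperforming the other methods by a clear margin.
On DeepfakeTIMIT HQ, ResNet152 achieved an AUC of 91.2% and ResNet50 achieved 93.2%, compared with ResNet50's 99.9% on LQ (performance holds up across quality settings).
The method outperformed Two-stream NN, the MesoNet variants, and HeadPose on both datasets, which highlights its robustness to DeepFake variants.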
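
For readers unfamiliar with the metric, the image-based and video-based AUC figures above can be illustrated with a small sketch. It computes AUC as the probability that a randomly chosen fake sample scores higher than a randomly chosen real one. The per-video aggregation shown here (mean of frame scores) is an assumption for illustration, since the summary does not state how the paper aggregates frames, and the function names `roc_auc` and `video_level_scores` are hypothetical.

```python
from collections import defaultdict


def roc_auc(labels, scores):
    """AUC via pairwise comparison: P(score of a fake > score of a real).

    `labels` uses 1 for fake and 0 for real; ties count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one fake and one real sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def video_level_scores(video_ids, frame_scores):
    """Average frame scores per video (assumed aggregation rule)."""
    grouped = defaultdict(list)
    for vid, score in zip(video_ids, frame_scores):
        grouped[vid].append(score)
    return {vid: sum(vals) / len(vals) for vid, vals in grouped.items()}
```

Image-based AUC would call `roc_auc` on per-frame scores directly, while video-based AUC would first reduce each video to one score with `video_level_scores`.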

This review was generated by AI and checked by a human editor.