QUICK REVIEW

[Paper Review] DeepFakes: a New Threat to Face Recognition? Assessment and Detection

Pavel Korshunov, Sébastien Marcel | arXiv (Cornell University) | December 20, 2018

Digital Media Forensic Detection | References: 24 | Citations: 490

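One-line summary

This paper builds a public Deepfake dataset from VidTIMIT videos using GAN-based face swapping, shows that VGG- and Facenet-based face recognition is vulnerable to the swapped faces, and evaluates baseline detectors: IQM+SVM performs best among them, while the lip-sync approach fails.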

ABSTRACT

It is becoming increasingly easy to automatically replace a face of one person in a video with the face of another person by using a pre-trained generative adversarial network (GAN). Recent public scandals, e.g., the faces of celebrities being swapped onto pornographic videos, call for automated ways to detect these Deepfake videos. To help developing such methods, in this paper, we present the first publicly available set of Deepfake videos generated from videos of VidTIMIT database. We used open source software based on GANs to create the Deepfakes, and we emphasize that training and blending parameters can significantly impact the quality of the resulted videos. To demonstrate this impact, we generated videos with low and high visual quality (320 videos each) using differently tuned parameter sets. We showed that the state of the art face recognition systems based on VGG and Facenet neural networks are vulnerable to Deepfake videos, with 85.62% and 95.00% false acceptance rates respectively, which means methods for detecting Deepfake videos are necessary. By considering several baseline approaches, we found that audio-visual approach based on lip-sync inconsistency detection was not able to distinguish Deepfake videos. The best performing method, which is based on visual quality metrics and is often used in presentation attack detection domain, resulted in 8.97% equal error rate on high quality Deepfakes. Our experiments demonstrate that GAN-generated Deepfake videos are challenging for both face recognition systems and existing detection methods, and the further development of face swapping technology will make it even more so.

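Motivation and objectives

Provide a publicly available set of Deepfake videos created with GAN-based face swapping.
Assess how vulnerable state-of-the-art face recognition systems are to Deepfakes.
Evaluate baseline Deepfake detection methods and identify their strengths and limitations.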

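Proposed method

Generate Deepfakes from VidTIMIT videos with GAN-based face swapping, producing low-quality (LQ, 64x64) and high-quality (HQ, 128x128) sets of 320 videos each.
Evaluate VGG- and Facenet-based face recognition on the original and the Deepfake videos.
Test lip-sync inconsistency detection as an audio-visual baseline, and evaluate several image-based baselines that combine PCA/LDA, image quality metrics (IQM), and SVM classifiers.
Release the dataset and the implementation as open source for reproducibility.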
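
The idea behind the IQM baseline, describing each face image by quality statistics before classification, can be sketched with a toy example. This is only an illustration under stated assumptions: the two features below (Laplacian variance and the error against a blurred copy) and the function names are hypothetical stand-ins, not the feature set used in the paper, and the SVM step is left out.

```python
from statistics import mean, pvariance


def laplacian_variance(img):
    """Sharpness proxy: variance of a 4-neighbour Laplacian over interior pixels."""
    h, w = len(img), len(img[0])
    responses = [
        img[y - 1][x] + img[y + 1][x] + img[y][x - 1] + img[y][x + 1] - 4 * img[y][x]
        for y in range(1, h - 1)
        for x in range(1, w - 1)
    ]
    return pvariance(responses)


def box_blur(img):
    """3x3 mean filter on interior pixels; border pixels are copied unchanged."""
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            out[y][x] = mean(
                img[y + dy][x + dx] for dy in (-1, 0, 1) for dx in (-1, 0, 1)
            )
    return out


def blur_mse(img):
    """Mean squared error between an image and its blurred copy.

    An already-smooth image changes little when blurred, so the value is low.
    """
    blurred = box_blur(img)
    return mean((a - b) ** 2 for ra, rb in zip(img, blurred) for a, b in zip(ra, rb))


def quality_features(img):
    """Hypothetical two-dimensional quality feature vector for one grayscale image."""
    return [laplacian_variance(img), blur_mse(img)]


# Toy grayscale images as nested lists: a sharp checkerboard and a flat patch.
sharp = [[255 * ((x + y) % 2) for x in range(8)] for y in range(8)]
flat = [[128] * 8 for _ in range(8)]

print(quality_features(sharp))
print(quality_features(flat))
```

In a pipeline of this kind, one such vector per face image would be passed to a classifier such as an SVM, trained on vectors from real and Deepfake frames.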

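Experimental results

Research questions

RQ1: How vulnerable are current face recognition systems (VGG and Facenet) to GAN-based Deepfake face swaps?
RQ2: Can existing detection approaches reliably distinguish Deepfake videos from genuine ones, and which features are most effective?
RQ3: Does lip-sync-based detection outperform image quality measures at detecting Deepfakes?
RQ4: How does video quality (LQ vs. HQ) affect recognition vulnerability and detection accuracy?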

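Key results

VGG- and Facenet-based face recognition is highly vulnerable to Deepfake videos: at the EER threshold set on licit data, the false acceptance rate is 85.62% (HQ) and 88.75% (LQ) for VGG, and 95.00% (HQ) and 94.38% (LQ) for Facenet.
Lip-sync-based detection fails to distinguish Deepfakes from the original videos, which points to the limits of audio-visual inconsistency approaches.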
The best baseline, IQM features with an SVM classifier, reaches an EER of 8.97% and an FRR@FAR10% of 9.05% on HQ Deepfakes; detection is harder on HQ videos than on LQ ones.
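The IQM-based methods vary in effectiveness; for example, IQM+PCA+LDA reaches an EER of 20.52% and an FRR@FAR10% of 66.67% on LQ videos.
Overall, the baseline detectors are less effective on HQ Deepfakes, which underlines the need for stronger detection methods and datasets.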
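
The metrics quoted above can be made concrete with a small sketch. It assumes that higher scores mean "more likely genuine" and uses made-up toy scores; the function names and data are illustrative and are not taken from the paper's released code. For the vulnerability test, the EER threshold is chosen on licit data (genuine vs. zero-effort impostor scores) and the FAR is then measured on Deepfake scores at that fixed threshold. For detection, FRR@FAR10% is the rate of rejected genuine samples at a threshold that accepts at most 10% of the fakes.

```python
def far(impostor_scores, threshold):
    """False acceptance rate: fraction of impostor/attack scores >= threshold."""
    return sum(s >= threshold for s in impostor_scores) / len(impostor_scores)


def frr(genuine_scores, threshold):
    """False rejection rate: fraction of genuine scores < threshold."""
    return sum(s < threshold for s in genuine_scores) / len(genuine_scores)


def eer_threshold(genuine, impostor):
    """Candidate threshold at which FAR and FRR are closest to each other."""
    candidates = sorted(set(genuine) | set(impostor))
    return min(candidates, key=lambda t: abs(far(impostor, t) - frr(genuine, t)))


def frr_at_far(genuine, impostor, target_far=0.10):
    """FRR at the lowest candidate threshold whose FAR does not exceed target_far."""
    for t in sorted(set(genuine) | set(impostor)):
        if far(impostor, t) <= target_far:
            return frr(genuine, t)
    return 1.0  # only a threshold above every score meets the target


# Toy similarity scores (assumed values, for illustration only).
genuine = [0.9, 0.8, 0.85, 0.7, 0.95]      # same-identity comparisons
zero_effort = [0.1, 0.2, 0.3, 0.75, 0.15]  # different-identity comparisons
deepfake = [0.88, 0.6, 0.92, 0.81, 0.9]    # swapped face vs. target identity

threshold = eer_threshold(genuine, zero_effort)
print("EER threshold on licit data:", threshold)
print("FAR on Deepfake attacks:", far(deepfake, threshold))
print("FRR@FAR10% on licit data:", frr_at_far(genuine, zero_effort))
```

On real score distributions the threshold search would usually run over a finer grid, but the logic is the same: the threshold is fixed without seeing any Deepfakes, which is why a high FAR on the attack scores indicates vulnerability.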


This review was generated by AI and checked by a human editor.