QUICK REVIEW

[Paper Review] Progressive Feedback-Enhanced Transformer for Image Forgery Localization

Haochen Zhu, Gang Cao|arXiv (Cornell University)|2023. 11. 15.

Digital Media Forensic Detection | Citations: 13

One-Line Summary

ProFact is a progressive, feedback-driven transformer framework for coarse-to-fine image forgery localization. Trained on realistic MBH-generated data, it reports state-of-the-art results across nine datasets.

ABSTRACT

Blind detection of the forged regions in digital images is an effective authentication means to counter the malicious use of local image editing techniques. Existing encoder-decoder forensic networks overlook the fact that detecting complex and subtle tampered regions typically requires more feedback information. In this paper, we propose a Progressive FeedbACk-enhanced Transformer (ProFact) network to achieve coarse-to-fine image forgery localization. Specifically, the coarse localization map generated by an initial branch network is adaptively fed back to the early transformer encoder layers, which can enhance the representation of positive features while suppressing interference factors. The cascaded transformer network, combined with a contextual spatial pyramid module, is designed to refine discriminative forensic features for improving the forgery localization accuracy and reliability. Furthermore, we present an effective strategy to automatically generate large-scale forged image samples close to real-world forensic scenarios, especially in realistic and coherent processing. Leveraging on such samples, a progressive and cost-effective two-stage training protocol is applied to the ProFact network. The extensive experimental results on nine public forensic datasets show that our proposed localizer greatly outperforms the state-of-the-art on the generalization ability and robustness of image forgery localization. Code will be publicly available at https://github.com/multimediaFor/ProFact.

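Motivation and Goals

Enable robust localization of forged regions in tampered images whose subtle, barely visible traces are hard to detect.
Develop a coarse-to-fine localization framework that uses feedback to refine intermediate representations.
Strengthen feature learning with a Contextual Spatial Pyramid Module that captures multi-scale cues.
Close the training-data gap by generating realistic, large-scale forged images and applying a two-stage progressive training protocol.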

Proposed Method

ProFact uses two cascaded branches, the Coarse Localization Branch (CLB) and the Feedback Enhancement Branch (FEB), linked by a progressive feedback mechanism.
The CLB builds on SegFormer (MiT blocks) to produce the coarse map Mc and integrates a Contextual Spatial Pyramid Module (CSPM) for feature enhancement.
The FEB takes Mc together with the CLB features, applies a holistic attention module (HAM) to refine the representation, and predicts the final map Mp.
The CSPM combines Contextual Transformer (CoT) blocks with a multi-scale dilated convolution pyramid to enrich local and contextual features.
Training data are generated with MBH (Matting, Blending, Harmonization) to produce large-scale, realistic forged images, including MBH-COCO and MBH-RAISE datasets.
A two-stage training protocol first trains on MBH-COCO and then fine-tunes on MBH-RAISE with larger input sizes to improve generalization.
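
The feedback idea can be illustrated with a toy sketch. It assumes the coarse map Mc is a per-pixel probability in [0, 1] and that feedback simply re-weights every feature channel by that map. The function name `feedback_reweight` and the `floor` parameter are hypothetical, and the element-wise product only stands in for the HAM, whose exact form is not described in this review.

```python
def feedback_reweight(features, coarse_map, floor=0.1):
    """Scale each feature channel by a coarse forgery probability map.

    features:   list of channels, each an H x W list of floats
    coarse_map: H x W list of probabilities in [0, 1] (stands in for Mc)
    floor:      hypothetical minimum weight, so regions the coarse map
                missed are dampened rather than erased
    """
    # Map each probability p to a weight in [floor, 1].
    weights = [[floor + (1.0 - floor) * p for p in row] for row in coarse_map]
    # Apply the same spatial weights to every channel.
    return [
        [
            [value * weight for value, weight in zip(f_row, w_row)]
            for f_row, w_row in zip(channel, weights)
        ]
        for channel in features
    ]


# One 2 x 2 feature channel and a coarse map that flags three pixels.
features = [[[1.0, 2.0], [3.0, 4.0]]]
coarse_map = [[1.0, 0.0], [0.5, 1.0]]
enhanced = feedback_reweight(features, coarse_map)
```

Pixels with a high coarse probability keep their feature magnitude, while the pixel with probability 0 is scaled down to the floor weight. This mirrors the stated goal of enhancing positive features while suppressing interference.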

Experimental Results

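Research Questions

RQ1: Can a feedback-enhanced transformer localize forged regions more accurately than conventional encoder-decoder networks?
RQ2: How does a coarse-to-fine strategy with intermediate feature refinement affect detection robustness across forgery types and resolutions?
RQ3: Do realistic forged training samples (MBH) improve cross-dataset generalization and the robustness of forgery localization methods?
RQ4: How do multi-scale contextual features (CSPM) affect the detection of subtle tampering traces?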

Main Results

Dataset	Noiseprint	ManTra-Net	DFCN	MVSS-Net	PSCC-Net	OSN	CAT-Net	ProFact	Average
Columbia	36.4 (7)	35.6 (8)	38.1 (6)	68.4 (4)	61.5 (5)	71.3 (3)	79.3 (2)	84.5 (1)	55.2 (1)
CASIAv1	12.9 (7)	13.0 (6)	8.3 (8)	45.1 (5)	46.3 (4)	50.9 (3)	71.0 (1)	56.4 (2)	54.7 (3)
NIST16	12.2 (6)	9.2 (7)	-	29.4 (4)	18.7 (5)	33.1 (2)	30.2 (3)	43.1 (1)	28.9 (6)
DSO-1	33.9 (6)	33.2 (7)	68.4 (1)	27.1 (8)	41.1 (5)	44.5 (4)	47.9 (2)	46.4 (3)	40.4 (7)
IMD	17.9 (5)	18.3 (4)	17.3 (6)	26.0 (3)	15.8 (7)	49.1 (2)	-	53.8 (1)	25.8 (5)
Korus	14.7 (4)	17.9 (3)	10.8 (5)	9.5 (7)	10.2 (6)	29.9 (2)	6.1 (8)	31.5 (1)	16.2 (6)
Coverage	14.7 (8)	27.5 (5)	-	44.5 (2)	44.4 (3)	26.0 (6)	28.9 (4)	51.1 (1)	25.0 (8)
In the Wild	16.7 (6)	15.6 (7)	-	-	10.8 (8)	50.5 (2)	34.1 (3)	64.5 (1)	25.6 (7)
AutoSplice	33.0 (7)	18.2 (8)	-	64.6 (3)	60.4 (4)	50.9 (5)	86.2 (1)	65.5 (2)	39.0 (5)
Average	21.4 (7)	20.9 (8)	31.2 (6)	34.8 (4)	34.3 (5)	45.1 (3)	48.0 (2)	55.2 (1)
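
The Average row for the two strongest methods can be checked by hand. The sketch below copies the ProFact and CAT-Net columns from the table and assumes that a missing entry ("-") is skipped rather than counted as zero; the helper name `column_average` is hypothetical.

```python
from statistics import mean

# Scores copied from the table, in row order (Columbia ... AutoSplice).
profact = [84.5, 56.4, 43.1, 46.4, 53.8, 31.5, 51.1, 64.5, 65.5]
# None marks the "-" entry for CAT-Net on IMD.
cat_net = [79.3, 71.0, 30.2, 47.9, None, 6.1, 28.9, 34.1, 86.2]


def column_average(scores):
    """Mean over reported scores, rounded to one decimal place.

    Assumption: missing entries are excluded from the mean.
    """
    reported = [s for s in scores if s is not None]
    return round(mean(reported), 1)


profact_avg = column_average(profact)      # 55.2
cat_net_avg = column_average(cat_net)      # 48.0
gap = round(profact_avg - cat_net_avg, 1)  # 7.2
```

Under that assumption the computed values agree with the Average row for both methods (55.2 and 48.0).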

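ProFact achieves the best average localization performance across the nine datasets, improving on the runner-up CAT-Net by 7.2 percentage points in F1 (55.2 vs. 48.0 in the table's Average row) and 5.6 points in IoU.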
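The method ranks in the top two on eight of the nine datasets (third on DSO-1), including high-resolution data and the unseen AutoSplice dataset, which indicates strong generalization.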
Two-stage training on MBH-generated data with larger input sizes improves robustness to scale variation and to realistic forgery boundaries.
On challenging datasets such as DSO-1, ProFact remains competitive, ranking third with 46.4 behind DFCN (68.4) and CAT-Net (47.9).
Qualitative results show that the refined localization map Mp contains fewer false positives after feedback refinement.
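
For reference, the two metrics quoted above can be computed per image from binary masks. This is a minimal sketch, not the paper's evaluation code: the function name `f1_and_iou` is hypothetical, the masks are assumed to be already thresholded to 0/1, and scoring two empty masks as 1.0 is an assumed convention.

```python
def f1_and_iou(pred, truth):
    """Pixel-level F1 and IoU for two binary masks of equal shape.

    pred, truth: H x W lists of 0/1 values (1 = forged pixel).
    """
    tp = fp = fn = 0
    for p_row, t_row in zip(pred, truth):
        for p, t in zip(p_row, t_row):
            if p and t:
                tp += 1      # forged pixel correctly flagged
            elif p:
                fp += 1      # pristine pixel flagged as forged
            elif t:
                fn += 1      # forged pixel that was missed
    if tp + fp + fn == 0:
        # Assumed convention: both masks empty counts as a perfect match.
        return 1.0, 1.0
    f1 = 2 * tp / (2 * tp + fp + fn)
    iou = tp / (tp + fp + fn)
    return f1, iou


pred = [[1, 1, 0], [0, 1, 0]]
truth = [[1, 0, 0], [0, 1, 1]]
f1, iou = f1_and_iou(pred, truth)  # tp=2, fp=1, fn=1
```

With two true positives, one false positive, and one false negative, F1 is 4/6 and IoU is 2/4. F1 is never lower than IoU on the same masks, so the two metrics can show different margins between methods.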

This review was generated by AI and checked by a human editor.