QUICK REVIEW

[Paper Review] Rethinking Image Forgery Detection via Soft Contrastive Learning and Unsupervised Clustering

Haiwei Wu, Yiming Chen | arXiv (Cornell University) | Aug 18, 2023

Digital Media Forensic Detection | Cited by 8

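One-Sentence Summary

FOCAL reframes image forgery detection as pixel-level contrastive learning supervised image by image, followed by unsupervised clustering at test time, and reports large IoU/F1 gains on six public test datasets. Features from different backbones can also be fused by simple concatenation, without retraining, for a further boost.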

ABSTRACT

Image forgery detection aims to detect and locate forged regions in an image. Most existing forgery detection algorithms formulate classification problems to classify pixels into forged or pristine. However, the definition of forged and pristine pixels is only relative within one single image, e.g., a forged region in image A is actually a pristine one in its source image B (splicing forgery). Such a relative definition has been severely overlooked by existing methods, which unnecessarily mix forged (pristine) regions across different images into the same category. To resolve this dilemma, we propose the FOrensic ContrAstive cLustering (FOCAL) method, a novel, simple yet very effective paradigm based on soft contrastive learning and unsupervised clustering for the image forgery detection. Specifically, FOCAL 1) designs a soft contrastive learning (SCL) to supervise the high-level forensic feature extraction in an image-by-image manner, explicitly reflecting the above relative definition; 2) employs an on-the-fly unsupervised clustering algorithm (instead of a trained one) to cluster the learned features into forged/pristine categories, further suppressing the cross-image influence from training data; and 3) allows to further boost the detection performance via simple feature-level concatenation without the need of retraining. Extensive experimental results over six public testing datasets demonstrate that our proposed FOCAL significantly outperforms the state-of-the-art competitors by big margins: +24.8% on Coverage, +18.9% on Columbia, +17.3% on FF++, +15.3% on MISD, +15.0% on CASIA and +10.5% on NIST in terms of IoU (see also Fig. 1). The paradigm of FOCAL could bring fresh insights and serve as a novel benchmark for the image forgery detection task. The code is available at https://github.com/HighwayWu/FOCAL.

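Motivation and Objectives

Rethink the definition of forged and pristine pixels as relative within a single image, and address the cross-image inconsistency that existing forgery detectors introduce by ignoring it.
Develop a pixel-level contrastive learning framework tailored to image forgery detection.
Introduce an on-the-fly unsupervised clustering step at test time that maps features to forged/pristine, so that the final labeling is not shaped by cross-image statistics of the training data.
Boost performance through simple feature-level fusion, without retraining.
Demonstrate robustness and cross-domain generalization on six public test datasets.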

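Proposed Method

Supervise high-level forensic features with pixel-level contrastive learning, image by image, using the ground-truth forgery mask to assign positive and negative labels.
Use an improved InfoNCE loss (InfoNCE++) that averages over all positive keys within each image for stable optimization.
At test time, apply an on-the-fly clustering algorithm (HDBSCAN) to map features to forged/pristine, with no trained parameters.
Optionally fuse features from multiple backbones (e.g., HRNet and ViT) at the feature level to improve detection without retraining.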
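
The loss described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' implementation: the function names, the temperature `tau=0.1`, the L2 normalization, and the choice to treat every pixel as a query (with same-label pixels as positive keys and opposite-label pixels as negative keys) are all assumptions made for the example. The two points it tries to show are that the positive term is a mean over all positive keys, and that the loss is computed separately for each image before averaging over the batch.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def l2_normalize(v):
    norm = math.sqrt(dot(v, v)) or 1.0
    return [a / norm for a in v]

def info_nce_mean_pos(query, pos_keys, neg_keys, tau=0.1):
    """-log(P / (P + N)), where P is the MEAN of exp(q.k/tau) over positive
    keys and N is the SUM of exp(q.k/tau) over negative keys (assumed form)."""
    pos = sum(math.exp(dot(query, k) / tau) for k in pos_keys) / len(pos_keys)
    neg = sum(math.exp(dot(query, k) / tau) for k in neg_keys)
    return -math.log(pos / (pos + neg))

def image_loss(features, mask, tau=0.1):
    """Loss for ONE image.

    features: list of per-pixel feature vectors.
    mask:     list of 0/1 ground-truth labels (1 = forged), same length.
    Returns 0.0 when the image has no valid positive/negative pairs,
    e.g. a fully pristine image.
    """
    feats = [l2_normalize(f) for f in features]
    losses = []
    for i, (q, label) in enumerate(zip(feats, mask)):
        pos = [f for j, (f, m) in enumerate(zip(feats, mask)) if m == label and j != i]
        neg = [f for f, m in zip(feats, mask) if m != label]
        if pos and neg:
            losses.append(info_nce_mean_pos(q, pos, neg, tau))
    return sum(losses) / len(losses) if losses else 0.0

def batch_loss(batch, tau=0.1):
    """Image-by-image supervision: keys never cross image boundaries."""
    return sum(image_loss(f, m, tau) for f, m in batch) / len(batch)
```

Because `batch_loss` only averages per-image losses, a forged region in one image is never contrasted against a pristine region of another image, which is the relative definition the method relies on.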

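Experimental Results

Research Questions

RQ1: How does the relative, within-image definition of forged/pristine pixels affect detection performance compared with conventional batch-level supervision?
RQ2: Do pixel-level contrastive learning, an image-by-image loss, and unsupervised clustering improve forgery detection across datasets?
RQ3: Does feature-level fusion of multiple backbones improve forgery localization without retraining?

Key Findings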

Methods	Columbia F1	Columbia IoU	Coverage F1	Coverage IoU	CASIA F1	CASIA IoU	MISD F1	MISD IoU	NIST F1	NIST IoU	FF++ F1	FF++ IoU	Mean F1	Mean IoU
Lyu-NOI	.522	.150	.481	.125	.356	.095	.507	.199	.478	.026	.496	.071	-	-
PCA-NOI	.539	.168	.529	.125	.472	.093	.517	.150	.460	.046	.523	.108	-	-
PSCC-Net	.577	.480	.655	.337	.716	.409	.746	.448	.300	.078	.509	.092	.584	.307
PSCC-Net †	.850	.770	.584	.179	.753	.474	.735	.403	.632	.251	.518	.068	.679	.357
MVSS-Net	.766	.591	.700	.384	.707	.396	.803	.525	.621	.243	.553	.127	.691	.378
MVSS-Net †	.888	.784	.690	.356	.770	.509	.765	.450	.635	.255	.633	.241	.730	.433
IF-OSN	.766	.612	.561	.178	.741	.465	.811	.548	.639	.246	.628	.266	.691	.386
IF-OSN †	.846	.719	.651	.314	.828	.553	.765	.521	.608	.226	.607	.222	.717	.426
CAT-Net	.864	.741	.614	.231	.846	.642	.665	.314	.620	.230	.534	.095	.690	.375
TruFor	.821	.734	.741	.450	.835	.626	.746	.423	.688	.343	.817	.565	.774	.523
FOCAL (HRNet)	.962	.929	.769	.524	.864	.706	.857	.639	.710	.403	.837	.605	-	-
FOCAL (ViT)	.980	.969	.835	.647	.897	.766	.874	.666	.724	.433	.846	.630	-	-
FOCAL (Fusion)	.981	.970	.863	.693	.898	.777	.886	.690	.737	.446	.904	.740	.878	.719
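
The Mean F1 and Mean IoU cells for the FOCAL (HRNet) and FOCAL (ViT) rows are not legible in this copy of the table. Assuming the Mean columns are plain averages over the six datasets (an assumption, though it reproduces the FOCAL (Fusion) row), the missing values can be estimated as sketched below. Published means are probably averaged before rounding, so they may differ from these estimates by about .001.

```python
# (F1, IoU) pairs copied from the table, in the order
# Columbia, Coverage, CASIA, MISD, NIST, FF++.
rows = {
    "FOCAL (HRNet)": [(.962, .929), (.769, .524), (.864, .706),
                      (.857, .639), (.710, .403), (.837, .605)],
    "FOCAL (ViT)": [(.980, .969), (.835, .647), (.897, .766),
                    (.874, .666), (.724, .433), (.846, .630)],
    "FOCAL (Fusion)": [(.981, .970), (.863, .693), (.898, .777),
                       (.886, .690), (.737, .446), (.904, .740)],
}

def mean_scores(pairs):
    """Return (mean F1, mean IoU) rounded to three decimals."""
    mean_f1 = sum(f1 for f1, _ in pairs) / len(pairs)
    mean_iou = sum(iou for _, iou in pairs) / len(pairs)
    return round(mean_f1, 3), round(mean_iou, 3)

for name, pairs in rows.items():
    print(name, mean_scores(pairs))
```

Under this assumption the estimates are about .833 / .634 for FOCAL (HRNet) and .859 / .685 for FOCAL (ViT), and the Fusion row comes out at .878 / .719, matching the table.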

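With image-by-image contrastive learning and on-the-fly clustering, FOCAL clearly outperforms state-of-the-art methods in both IoU and F1 on all six test datasets.
Fusing HRNet and ViT features (FOCAL Fusion) gives the best cross-dataset performance. Measured against the best competitor in the table, the IoU gains are +24.3 points on Coverage, +18.6 on Columbia, and +17.5 on FF++ (the abstract reports slightly different margins of +24.8%, +18.9%, and +17.3%).
Unsupervised clustering (HDBSCAN) copes with multiple forgery types within one image and reduces false alarms, especially on pristine images.
The improved InfoNCE loss (InfoNCE++) aggregates multiple positive keys per query, which improves convergence speed and stability relative to batch-based or vanilla InfoNCE.
Feature-level fusion yields substantial gains without retraining, which indicates robustness to the choice of backbone and mitigates the bias of any single extractor.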
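
The IoU margins can be re-derived from the table. The sketch below copies the IoU columns and, for each dataset, subtracts the best competing IoU from the FOCAL (Fusion) IoU. The variable and function names are illustrative only.

```python
datasets = ["Columbia", "Coverage", "CASIA", "MISD", "NIST", "FF++"]

# IoU values copied from the table, one list per competing method.
competitors = {
    "Lyu-NOI":    [.150, .125, .095, .199, .026, .071],
    "PCA-NOI":    [.168, .125, .093, .150, .046, .108],
    "PSCC-Net":   [.480, .337, .409, .448, .078, .092],
    "PSCC-Net †": [.770, .179, .474, .403, .251, .068],
    "MVSS-Net":   [.591, .384, .396, .525, .243, .127],
    "MVSS-Net †": [.784, .356, .509, .450, .255, .241],
    "IF-OSN":     [.612, .178, .465, .548, .246, .266],
    "IF-OSN †":   [.719, .314, .553, .521, .226, .222],
    "CAT-Net":    [.741, .231, .642, .314, .230, .095],
    "TruFor":     [.734, .450, .626, .423, .343, .565],
}
focal_fusion = [.970, .693, .777, .690, .446, .740]

def iou_margins():
    """Map dataset -> (best competitor, margin in percentage points)."""
    result = {}
    for i, name in enumerate(datasets):
        best = max(competitors, key=lambda m: competitors[m][i])
        margin = round((focal_fusion[i] - competitors[best][i]) * 100, 1)
        result[name] = (best, margin)
    return result

for name, (best, margin) in iou_margins().items():
    print(f"{name}: +{margin} points over {best}")
```

On the table as given, this yields +18.6 on Columbia, +24.3 on Coverage, +13.5 on CASIA, +14.2 on MISD, +10.3 on NIST, and +17.5 on FF++.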

This review was generated by AI and checked by a human editor.