QUICK REVIEW

[Paper Review] Progressive Feedback-Enhanced Transformer for Image Forgery Localization

Haochen Zhu, Gang Cao|arXiv (Cornell University)|Nov 15, 2023

Digital Media Forensic Detection|Cited by 13

ONE-SENTENCE SUMMARY

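ProFact introduces a progressive, feedback-driven Transformer framework for coarse-to-fine image forgery localization. It is trained on realistic forged images produced by the MBH data-generation pipeline and achieves the best average localization F1 across nine public forensic datasets.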

ABSTRACT

Blind detection of the forged regions in digital images is an effective authentication means to counter the malicious use of local image editing techniques. Existing encoder-decoder forensic networks overlook the fact that detecting complex and subtle tampered regions typically requires more feedback information. In this paper, we propose a Progressive FeedbACk-enhanced Transformer (ProFact) network to achieve coarse-to-fine image forgery localization. Specifically, the coarse localization map generated by an initial branch network is adaptively fed back to the early transformer encoder layers, which can enhance the representation of positive features while suppressing interference factors. The cascaded transformer network, combined with a contextual spatial pyramid module, is designed to refine discriminative forensic features for improving the forgery localization accuracy and reliability. Furthermore, we present an effective strategy to automatically generate large-scale forged image samples close to real-world forensic scenarios, especially in realistic and coherent processing. Leveraging on such samples, a progressive and cost-effective two-stage training protocol is applied to the ProFact network. The extensive experimental results on nine public forensic datasets show that our proposed localizer greatly outperforms the state-of-the-art on the generalization ability and robustness of image forgery localization. Code will be publicly available at https://github.com/multimediaFor/ProFact.
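The abstract's central idea is that a coarse map is fed back to re-weight early encoder features, enhancing suspected regions and suppressing interference. The pure-Python sketch below illustrates that idea under loudly stated assumptions. It follows the holistic-attention recipe of the Cascaded Partial Decoder (CPD), which the FEB's "HAM" name suggests: blur the coarse map Mc, min-max normalize it, take the element-wise maximum with Mc, and multiply a feature map by the result. The 3x3 mean filter stands in for a Gaussian kernel. Mc is assumed to be a probability map in [0, 1]. All function names are hypothetical, and this is not ProFact's code.

```python
from typing import List

Map = List[List[float]]  # a single-channel 2-D map stored row by row


def mean_blur3(m: Map) -> Map:
    """3x3 mean filter with edge clamping (stand-in for a Gaussian kernel)."""
    h, w = len(m), len(m[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            window = [m[min(max(i + di, 0), h - 1)][min(max(j + dj, 0), w - 1)]
                      for di in (-1, 0, 1) for dj in (-1, 0, 1)]
            out[i][j] = sum(window) / 9.0
    return out


def min_max(m: Map) -> Map:
    """Rescale a map to [0, 1]; a constant map becomes all zeros."""
    lo = min(min(row) for row in m)
    hi = max(max(row) for row in m)
    if hi == lo:
        return [[0.0] * len(row) for row in m]
    return [[(v - lo) / (hi - lo) for v in row] for row in m]


def holistic_attention(coarse: Map) -> Map:
    """HAM-style map: max(min_max(blur(Mc)), Mc), which widens the suspected region."""
    spread = min_max(mean_blur3(coarse))
    return [[max(s, c) for s, c in zip(rs, rc)] for rs, rc in zip(spread, coarse)]


def feed_back(feature: Map, coarse: Map) -> Map:
    """Hypothetical feedback step: scale one feature channel by the attention map."""
    att = holistic_attention(coarse)
    return [[f * a for f, a in zip(rf, ra)] for rf, ra in zip(feature, att)]


# Usage: a 5x5 coarse map with a single confident pixel in the centre.
mc = [[0.0] * 5 for _ in range(5)]
mc[2][2] = 1.0
features = [[1.0] * 5 for _ in range(5)]
refined = feed_back(features, mc)  # 1.0 on the central 3x3 block, 0.0 elsewhere
```

Every attention value lies in [0, 1]. As a result, features far from the suspected region are scaled toward zero, while features inside the widened region pass through unchanged. This is one way to read "enhance the representation of positive features while suppressing interference factors". CPD's original HAM uses a Gaussian convolution rather than a box filter.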

MOTIVATION AND OBJECTIVES

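Improve the robustness of forged-region localization in tampered images, especially when tampering traces are subtle and hard to detect.
Develop a coarse-to-fine localization framework that uses feedback to refine intermediate feature representations.
Strengthen feature learning with a contextual spatial pyramid module that captures multi-scale cues.
Close the training-data gap by generating large-scale, realistic forged images and training with a progressive two-stage protocol.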

PROPOSED METHOD

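ProFact uses two cascaded branches, a Coarse Localization Branch (CLB) and a Feedback-Enhanced Branch (FEB), connected by a progressive feedback mechanism.
The CLB builds on SegFormer (MiT blocks) to produce a coarse localization map Mc and uses a contextual spatial pyramid module (CSPM) to enhance its features.
The FEB receives Mc and applies a holistic attention module (HAM) to it together with CLB features. This refines the representation, and the FEB then predicts the final map Mp.
The CSPM combines Contextual Transformer (CoT) blocks with a multi-scale pyramid of dilated convolutions to enrich both local and contextual features.
Training data are generated with MBH (Matting, Blending, Harmonization), which produces large-scale, realistic forged images in two datasets, MBH-COCO and MBH-RAISE.
Two-stage training protocol: first train on MBH-COCO, then fine-tune on MBH-RAISE with a larger input size to improve generalization.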
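A toy version of the data-generation chain shows what an MBH training sample contains: a forged image plus the pixel mask that supervises localization. The sketch makes three assumptions. First, the matting step has already produced an alpha matte. Second, blending uses standard alpha compositing, I = αF + (1 - α)B. Third, the harmonization model is replaced by a simple mean/standard-deviation match. The function names and the 0.5 mask threshold are hypothetical and not taken from the paper.

```python
from statistics import mean, pstdev
from typing import List, Tuple

Image = List[List[float]]  # grayscale image in [0, 1], stored row by row
Mask = List[List[int]]


def blend(fg: Image, alpha: Image, bg: Image) -> Image:
    """Alpha compositing: I = alpha * F + (1 - alpha) * B, pixel by pixel."""
    return [[a * f + (1.0 - a) * b for f, a, b in zip(rf, ra, rb)]
            for rf, ra, rb in zip(fg, alpha, bg)]


def harmonize(img: Image, mask: Mask) -> Image:
    """Crude harmonization stand-in: match the pasted region's mean/std to the rest."""
    inside = [v for rv, rm in zip(img, mask) for v, m in zip(rv, rm) if m]
    outside = [v for rv, rm in zip(img, mask) for v, m in zip(rv, rm) if not m]
    if not inside or not outside:
        return [row[:] for row in img]
    i_mu, i_sd = mean(inside), pstdev(inside)
    o_mu, o_sd = mean(outside), pstdev(outside)
    scale = o_sd / i_sd if i_sd > 0 else 1.0
    return [[(v - i_mu) * scale + o_mu if m else v for v, m in zip(rv, rm)]
            for rv, rm in zip(img, mask)]


def make_forgery(fg: Image, alpha: Image, bg: Image,
                 threshold: float = 0.5) -> Tuple[Image, Mask]:
    """Hypothetical M-B-H chain: alpha (from matting) -> blend -> harmonize.

    Returns the forged image and a binary ground-truth mask for training.
    """
    mask = [[1 if a >= threshold else 0 for a in ra] for ra in alpha]
    forged = harmonize(blend(fg, alpha, bg), mask)
    return forged, mask


# Usage: paste the bottom row of a high-contrast foreground into a host image.
bg = [[0.4, 0.6], [0.0, 0.0]]
fg = [[0.0, 0.0], [0.0, 1.0]]
alpha = [[0.0, 0.0], [1.0, 1.0]]  # assumed output of a matting model
forged, mask = make_forgery(fg, alpha, bg)
```

In this toy example, the pasted row ends up with the same mean and spread as the host pixels (about 0.4 and 0.6), so simple brightness statistics no longer reveal it. That is the intuition behind making training forgeries realistic and coherent.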

EXPERIMENTAL RESULTS

RESEARCH QUESTIONS

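RQ1: Does a feedback-enhanced Transformer localize forged regions more accurately than conventional encoder-decoder networks?
RQ2: How does a coarse-to-fine strategy with intermediate feature refinement affect detection robustness across diverse forgery types and image resolutions?
RQ3: Do realistic forged training samples (MBH) improve cross-dataset generalization and the robustness of forgery localization?
RQ4: What role do multi-scale contextual features, such as those produced by the CSPM, play in detecting subtle tampering traces?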

KEY FINDINGS

F1 (%) on nine public datasets. Numbers in parentheses are ranks, and "-" marks a missing result.
Dataset	Noiseprint	ManTra-Net	DFCN	MVSS-Net	PSCC-Net	OSN	CAT-Net	ProFact	Average
Columbia	36.4 (7)	35.6 (8)	38.1 (6)	68.4 (4)	61.5 (5)	71.3 (3)	79.3 (2)	84.5 (1)	55.2 (1)
CASIAv1	12.9 (7)	13.0 (6)	8.3 (8)	45.1 (5)	46.3 (4)	50.9 (3)	71.0 (1)	56.4 (2)	54.7 (3)
NIST16	12.2 (6)	9.2 (7)	-	29.4 (4)	18.7 (5)	33.1 (2)	30.2 (3)	43.1 (1)	28.9 (6)
DSO-1	33.9 (6)	33.2 (7)	68.4 (1)	27.1 (8)	41.1 (5)	44.5 (4)	47.9 (2)	46.4 (3)	40.4 (7)
IMD	17.9 (5)	18.3 (4)	17.3 (6)	26.0 (3)	15.8 (7)	49.1 (2)	-	53.8 (1)	25.8 (5)
Korus	14.7 (4)	17.9 (3)	10.8 (5)	9.5 (7)	10.2 (6)	29.9 (2)	6.1 (8)	31.5 (1)	16.2 (6)
Coverage	14.7 (8)	27.5 (5)	-	44.5 (2)	44.4 (3)	26.0 (6)	28.9 (4)	51.1 (1)	25.0 (8)
In the Wild	16.7 (6)	15.6 (7)	-	-	10.8 (8)	50.5 (2)	34.1 (3)	64.5 (1)	25.6 (7)
AutoSplice	33.0 (7)	18.2 (8)	-	64.6 (3)	60.4 (4)	50.9 (5)	86.2 (1)	65.5 (2)	39.0 (5)
Average	21.4 (7)	20.9 (8)	31.2 (6)	34.8 (4)	34.3 (5)	45.1 (3)	48.0 (2)	55.2 (1)
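ProFact's ranks and the headline gap over CAT-Net can be re-derived from the per-dataset scores in the table. The snippet below transcribes the F1 values (None for "-"). It then ranks ProFact on each dataset among the methods that report a score, and averages ProFact and CAT-Net over the datasets each was evaluated on. Skipping missing entries when averaging is an assumption, but it reproduces the 55.2 and 48.0 shown in the Average row.

```python
from typing import Dict, List, Optional

METHODS = ["Noiseprint", "ManTra-Net", "DFCN", "MVSS-Net",
           "PSCC-Net", "OSN", "CAT-Net", "ProFact"]

# F1 (%) per dataset, in METHODS order; None marks a "-" entry in the table.
F1: Dict[str, List[Optional[float]]] = {
    "Columbia":    [36.4, 35.6, 38.1, 68.4, 61.5, 71.3, 79.3, 84.5],
    "CASIAv1":     [12.9, 13.0, 8.3, 45.1, 46.3, 50.9, 71.0, 56.4],
    "NIST16":      [12.2, 9.2, None, 29.4, 18.7, 33.1, 30.2, 43.1],
    "DSO-1":       [33.9, 33.2, 68.4, 27.1, 41.1, 44.5, 47.9, 46.4],
    "IMD":         [17.9, 18.3, 17.3, 26.0, 15.8, 49.1, None, 53.8],
    "Korus":       [14.7, 17.9, 10.8, 9.5, 10.2, 29.9, 6.1, 31.5],
    "Coverage":    [14.7, 27.5, None, 44.5, 44.4, 26.0, 28.9, 51.1],
    "In the Wild": [16.7, 15.6, None, None, 10.8, 50.5, 34.1, 64.5],
    "AutoSplice":  [33.0, 18.2, None, 64.6, 60.4, 50.9, 86.2, 65.5],
}


def rank_of(method: str, dataset: str) -> int:
    """1 + number of methods with a strictly higher score on this dataset."""
    scores = F1[dataset]
    mine = scores[METHODS.index(method)]
    return 1 + sum(1 for s in scores if s is not None and s > mine)


def average(method: str) -> float:
    """Mean F1 over the datasets where the method has a score."""
    col = METHODS.index(method)
    vals = [row[col] for row in F1.values() if row[col] is not None]
    return sum(vals) / len(vals)


profact_ranks = [rank_of("ProFact", d) for d in F1]
gap = average("ProFact") - average("CAT-Net")
print(profact_ranks)  # [1, 2, 1, 3, 1, 1, 1, 1, 2]
print(round(average("ProFact"), 1), round(average("CAT-Net"), 1), round(gap, 1))  # 55.2 48.0 7.2
```

The computed ranks match the parenthesized values in the ProFact column: six first places, two second places (CASIAv1, AutoSplice) and one third place (DSO-1).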

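ProFact achieves the best average localization performance across the nine datasets. It exceeds the runner-up, CAT-Net, by 7.2 percentage points in average F1 (55.2 vs. 48.0) and by 5.6 in average IoU.
It ranks first on six of the nine datasets and second on CASIAv1 and AutoSplice (behind CAT-Net in both). Its strong results on high-resolution data and on the unseen AutoSplice dataset indicate good generalization.
Two-stage training on MBH-generated data, combined with a larger input size at test time, improves robustness to object scale and to realistic forgery boundaries.
DSO-1 remains challenging: ProFact ranks third there (46.4), behind DFCN (68.4) and CAT-Net (47.9).
Qualitative results show that feedback refinement sharpens the final localization map Mp and reduces false detections.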
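The F1 and IoU figures above are pixel-level overlap scores between a binarized localization map and the ground-truth forgery mask. The sketch below uses the standard definitions, F1 = 2TP / (2TP + FP + FN) and IoU = TP / (TP + FP + FN). The 0.5 threshold, per-image scoring and the empty-mask convention are assumptions, not details taken from the paper.

```python
from typing import List, Tuple


def confusion(pred: List[List[float]], gt: List[List[int]],
              thr: float = 0.5) -> Tuple[int, int, int]:
    """Count TP, FP, FN after binarizing a predicted probability map at `thr`."""
    tp = fp = fn = 0
    for rp, rg in zip(pred, gt):
        for p, g in zip(rp, rg):
            forged = p >= thr
            if forged and g:
                tp += 1
            elif forged:
                fp += 1
            elif g:
                fn += 1
    return tp, fp, fn


def f1_and_iou(pred: List[List[float]], gt: List[List[int]],
               thr: float = 0.5) -> Tuple[float, float]:
    """Pixel-level F1 = 2TP/(2TP+FP+FN) and IoU = TP/(TP+FP+FN) for one image."""
    tp, fp, fn = confusion(pred, gt, thr)
    if tp + fp + fn == 0:
        return 1.0, 1.0  # assumed convention: empty prediction on an empty mask
    return 2 * tp / (2 * tp + fp + fn), tp / (tp + fp + fn)


# Usage: a 2x3 prediction (for example, a final map like Mp) against its ground truth.
gt = [[1, 1, 0],
      [1, 0, 0]]
pred = [[0.9, 0.2, 0.7],
        [0.8, 0.1, 0.0]]
f1, iou = f1_and_iou(pred, gt)  # TP=2, FP=1, FN=1 -> F1 = 4/6, IoU = 2/4
```

For a single image, the two scores are linked by F1 = 2·IoU / (1 + IoU), so they order predictions identically. Averages over a dataset can still rank methods differently because the relation is nonlinear.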

This review was generated by AI and checked by human editors.