QUICK REVIEW

[Paper Review] RealStats: A Rigorous Real-Only Statistical Framework for Fake Image Detection

Haim Zisman, Uri Shaham | arXiv (Cornell University) | Jan 26, 2026

Generative Adversarial Networks and Image Synthesis | Cited by 0

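One-Sentence Summary

RealStats is a training-free, statistically grounded framework for fake image detection. It tests whether an image is consistent with the real-image distribution by aggregating multiple p-values computed only from real images, which yields calibrated, interpretable outputs.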

ABSTRACT

As generative models continue to evolve, detecting AI-generated images remains a critical challenge. While effective detection methods exist, they often lack formal interpretability and may rely on implicit assumptions about fake content, potentially limiting robustness to distributional shifts. In this work, we introduce a rigorous, statistically grounded framework for fake image detection that focuses on producing a probability score interpretable with respect to the real-image population. Our method leverages the strengths of multiple existing detectors by combining training-free statistics. We compute p-values over a range of test statistics and aggregate them using classical statistical ensembling to assess alignment with the unified real-image distribution. This framework is generic, flexible, and training-free, making it well-suited for robust fake image detection across diverse and evolving settings.

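Motivation and Objectives

Establish the need for interpretable and scalable fake image detection as generative models continue to evolve.
Develop a training-free framework based on statistical hypothesis testing against the real-image distribution.
Combine multiple statistics through independence-aware aggregation to produce calibrated p-values.
Ensure scalability, robustness to distribution shift, and a modular structure that accommodates new statistics.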

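Proposed Method

Extract diverse scalar statistics from real images using frozen feature extractors.
Map each statistic to a two-sided p-value via an empirical distribution function estimated from real images.
Select a subset of independent statistics by building an independence graph and extracting a maximum clique under a uniformity constraint.
Aggregate the selected p-values with a combination method such as Stouffer's method or min-p to obtain a single unified p-value under the null hypothesis.
At inference, compute p-values using only the selected statistics and make a decision at the chosen significance level.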
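
The steps above can be sketched in a few lines of standard-library Python. This is an illustrative sketch under stated assumptions, not the authors' implementation: all function names are hypothetical, the add-one smoothing in the empirical p-value is a common convention chosen here, and a plain Pearson-correlation threshold stands in for whatever independence test the paper uses to build its graph. The uniformity constraint on the selected subset is not modeled.

```python
import bisect
import math
from statistics import NormalDist


def two_sided_pvalue(sorted_ref, x):
    """Two-sided empirical p-value of x against a sorted real-image sample."""
    n = len(sorted_ref)
    # Add-one smoothing (an assumption) keeps both tail estimates positive.
    lower = (bisect.bisect_right(sorted_ref, x) + 1) / (n + 1)     # ~ P(T <= x)
    upper = (n - bisect.bisect_left(sorted_ref, x) + 1) / (n + 1)  # ~ P(T >= x)
    return min(1.0, 2.0 * min(lower, upper))


def pearson(a, b):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def largest_uncorrelated_subset(columns, max_abs_corr=0.1):
    """Maximum clique in a graph whose edges join weakly correlated statistics.

    Stand-in for the independence graph: an edge means |corr| < max_abs_corr.
    Exhaustive search, so only suitable for a small number of statistics.
    """
    k = len(columns)
    adj = {
        i: {j for j in range(k)
            if j != i and abs(pearson(columns[i], columns[j])) < max_abs_corr}
        for i in range(k)
    }
    best = []

    def expand(clique, candidates):
        nonlocal best
        if len(clique) > len(best):
            best = clique
        for v in sorted(candidates):
            # Keep only common neighbours with a larger index (no duplicates).
            expand(clique + [v], {u for u in candidates & adj[v] if u > v})

    expand([], set(range(k)))
    return best


def stouffer(pvalues):
    """Stouffer's method: combine independent p-values through z-scores."""
    nd = NormalDist()
    eps = 1e-12  # clip so inv_cdf never receives exactly 0 or 1
    z = sum(nd.inv_cdf(1.0 - min(max(p, eps), 1.0 - eps)) for p in pvalues)
    return 1.0 - nd.cdf(z / math.sqrt(len(pvalues)))


def min_p(pvalues):
    """Min-p: the minimum of k independent uniforms has CDF 1 - (1 - m)^k."""
    return 1.0 - (1.0 - min(pvalues)) ** len(pvalues)


def detect(stat_values, sorted_refs, selected, alpha=0.05, combine=min_p):
    """Return (combined p-value, reject-real flag) for one image."""
    pvals = [two_sided_pvalue(sorted_refs[i], stat_values[i]) for i in selected]
    p = combine(pvals)
    return p, p < alpha
```

In this sketch, `sorted_refs[i]` holds the sorted values of statistic `i` on a real-image reference set, and `selected` is the index list returned by `largest_uncorrelated_subset`. Both combination rules are valid only if the selected p-values are close to independent, which is why the selection step comes before aggregation.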

Figure 1: Illustration of the score interpretability gap between a supervised classifier (Wang et al., 2020) and our statistical method. Top: A supervised model outputs scores that can separate real from fake images, but these scores are not inherently interpretable, as they lack clear statistical

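Experimental Results

Research Questions

RQ1: Can a training-free framework built only on real images produce calibrated p-values that meaningfully quantify the probability of real versus fake images?
RQ2: Does aggregating multiple independent real-only statistics improve robustness to the distribution shift that comes with evolving generators?
RQ3: How does RealStats balance interpretability against detection performance relative to training-free baselines?
RQ4: Can the framework improve performance on challenging generators by adding new statistics, without retraining?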

Key Findings

Model	AUC	AP
Manifold Bias	0.761 ± 0.179	0.753 ± 0.169
RIGID	0.769 ± 0.194	0.765 ± 0.189
AEROBLADE	0.697 ± 0.161	0.697 ± 0.163
Ours (Stouffer)	0.756 ± 0.135	0.743 ± 0.133
Ours (Min-p)	0.775 ± 0.126	0.756 ± 0.119

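The method achieves AUC and AP competitive with state-of-the-art training-free detectors (for example, the Min-p ensemble reaches an AUC of 0.775 and an AP of 0.756), with lower variance across generators.
Per-generator analysis shows more balanced performance and better robustness than some baselines, with further gains once more diverse statistics are introduced (for example, adding Manifold Bias to the Min-p ensemble raises AUC on GauGAN, CycleGAN, and SAN).
The framework returns a calibrated p-value for every inference, giving interpretable outputs and a principled basis for decisions at standard significance levels.
The method is fast, scalable, and memory-efficient; the extra cost of the independence tests is negligible compared with the forward passes.
The method is robust to common corruptions (for example, Gaussian blur; JPEG compression causes a moderate drop) and remains adaptable to discriminative signals under reference-distribution mismatch.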
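
The calibration claim above has a simple operational reading: on held-out real images, the combined p-values should be close to uniform on [0, 1], so the fraction rejected at level α should be close to α. The following is a minimal sketch of such a check, assuming a list of combined p-values computed on real images that were not used to build the reference distributions; the function names are hypothetical and this is not the evaluation protocol from the paper.

```python
def ks_distance_to_uniform(pvalues):
    """Kolmogorov-Smirnov distance between the sample and Uniform(0, 1)."""
    xs = sorted(pvalues)
    n = len(xs)
    # Compare the empirical CDF just after and just before each sample point.
    return max(max((i + 1) / n - x, x - i / n) for i, x in enumerate(xs))


def false_positive_rate(pvalues, alpha=0.05):
    """Fraction of real images that would be flagged as fake at level alpha."""
    return sum(p < alpha for p in pvalues) / len(pvalues)


# Usage with an idealized, evenly spread set of p-values.
held_out = [(i + 0.5) / 100 for i in range(100)]
ks = ks_distance_to_uniform(held_out)
fpr = false_positive_rate(held_out, alpha=0.05)
```

A small KS distance and a false positive rate near α suggest the p-values behave as intended under the null; a large gap would point to dependence among the aggregated statistics or a mismatched reference set.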

This review was generated by AI and reviewed by a human editor.