QUICK REVIEW

[Paper Review] SIGMark: Scalable In-Generation Watermark with Blind Extraction for Video Diffusion

Xinjie Zhu, Zijing Zhao|arXiv (Cornell University)|Mar 3, 2026

Advanced Steganography and Watermarking Techniques | Cited by 0

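One-Sentence Summary

SIGMark is a blind, scalable in-generation watermarking scheme for video diffusion models that uses a Global set of Frame-wise PseudoRandom Coding keys (GF-PRC) and a Segment Group-Ordering (SGO) module to keep extraction cost constant as the number of generated videos grows and to stay robust under temporal disturbance.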

ABSTRACT

Artificial Intelligence Generated Content (AIGC), particularly video generation with diffusion models, has been advanced rapidly. Invisible watermarking is a key technology for protecting AI-generated videos and tracing harmful content, and thus plays a crucial role in AI safety. Beyond post-processing watermarks which inevitably degrade video quality, recent studies have proposed distortion-free in-generation watermarking for video diffusion models. However, existing in-generation approaches are non-blind: they require maintaining all the message-key pairs and performing template-based matching during extraction, which incurs prohibitive computational costs at scale. Moreover, when applied to modern video diffusion models with causal 3D Variational Autoencoders (VAEs), their robustness against temporal disturbance becomes extremely weak. To overcome these challenges, we propose SIGMark, a Scalable In-Generation watermarking framework with blind extraction for video diffusion. To achieve blind-extraction, we propose to generate watermarked initial noise using a Global set of Frame-wise PseudoRandom Coding keys (GF-PRC), reducing the cost of storing large-scale information while preserving noise distribution and diversity for distortion-free watermarking. To enhance robustness, we further design a Segment Group-Ordering module (SGO) tailored to causal 3D VAEs, ensuring robust watermark inversion during extraction under temporal disturbance. Comprehensive experiments on modern diffusion models show that SIGMark achieves very high bit-accuracy during extraction under both temporal and spatial disturbances with minimal overhead, demonstrating its scalability and robustness. Our project is available at https://jeremyzhao1998.github.io/SIGMark-release/.

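Motivation and Objectives

Protect AI-generated videos, identify their copyright, and trace content back to its source.
Address the limited scalability and weak temporal robustness of existing in-generation watermarks.
Develop a blind-extraction watermarking framework that preserves video quality while keeping extraction time constant at large scale.
Evaluate the framework on modern video diffusion models to demonstrate its robustness and scalability in practice.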

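Proposed Method

Embed the watermark into the initial latent noise with the Global Frame-wise PseudoRandom Coding (GF-PRC) scheme, so that the noise stays Gaussian and undistorted and generation quality is preserved.
Assign a global PRC key to each latent frame group, which enables blind extraction without storing per-video metadata.
Introduce the Segment Group-Ordering (SGO) module, which uses optical-flow segmentation and sliding-window detection to recover the correct causal frame grouping under temporal disturbance.
Invert the watermarked video back to the latent space and decode the message with the PRC keys, which gives blind extraction even from tampered videos.
Keep extraction cost constant as the number of generated videos grows, which is what makes the scheme scalable.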
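
The embedding and blind-extraction steps above can be illustrated with a toy sketch. It is not the paper's PRC construction: a keyed pseudorandom pad stands in for the pseudorandom code, there is no error correction, and the diffusion and inversion steps are skipped entirely. The function names, the `key`/`frame_index` seeding, and the sign-based embedding are all assumptions made for illustration. The point it shows is that when the padded bits look uniformly random, choosing the sign of a half-normal magnitude from each bit still yields standard-normal noise, and the extractor needs only the global key, not a per-video record.

```python
import random

def keyed_pad(key: int, frame_index: int, n: int) -> list[int]:
    """Pseudorandom bits derived from a global key and a latent-frame index.

    Hypothetical stand-in for a real pseudorandom code.
    """
    rng = random.Random(f"{key}:{frame_index}")
    return [rng.getrandbits(1) for _ in range(n)]

def embed(message: list[int], key: int, frame_index: int,
          rng: random.Random) -> list[float]:
    """Return noise samples whose signs carry the padded message bits."""
    pad = keyed_pad(key, frame_index, len(message))
    noise = []
    for bit, p in zip(message, pad):
        magnitude = abs(rng.gauss(0.0, 1.0))  # half-normal magnitude
        noise.append(magnitude if bit ^ p else -magnitude)
    return noise

def extract(noise: list[float], key: int, frame_index: int) -> list[int]:
    """Blind extraction: only the global key and frame index are needed."""
    pad = keyed_pad(key, frame_index, len(noise))
    return [int(x > 0) ^ p for x, p in zip(noise, pad)]

# Usage: embed a 64-bit message into one latent frame and read it back.
rng = random.Random(0)
message = [rng.getrandbits(1) for _ in range(64)]
noise = embed(message, key=2026, frame_index=0, rng=rng)
recovered = extract(noise, key=2026, frame_index=0)
```

In a real pipeline the extractor would not see `noise` directly; it would have to recover an estimate of it by inverting the video, which is why an error-tolerant code is needed instead of the plain pad used here.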

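Experimental Results

Research Questions

RQ1: Can in-generation watermarking for video diffusion models achieve blind extraction without maintaining a watermark reference for each video?
RQ2: How can temporal disturbance (such as frame dropping or clipping) be mitigated during inversion so that the watermark stays intact?
RQ3: Can GF-PRC achieve distortion-free embedding at large scale while preserving the generation quality of the diffusion model?
RQ4: How robust is SIGMark under spatial and temporal disturbances compared with existing methods?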

Key Findings

Model	Watermarking	Bit acc (T2V)	V-score (T2V)	Bit acc (I2V)	V-score (I2V)	Bit acc (Overall)	V-score (Overall)
HunyuanVideo	No watermark	–	–	–	–	–	–
HunyuanVideo	DCT (post-processing)	0.889	0.424	0.862	0.423	0.890	0.452
HunyuanVideo	DT-CWT (post-processing)	0.619	0.416	0.650	0.436	0.627	0.458
HunyuanVideo	VideoMark (non-blind)	0.873	0.507	0.758	0.502	0.846	0.483
HunyuanVideo	VideoShield (non-blind)	1.000	0.497	0.991	0.506	1.000	0.482
HunyuanVideo	SIGMark (ours, blind)	0.958	0.506	0.885	0.499	0.981	0.472
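
To help read the "Bit acc" columns: the summary does not define the metric, so the sketch below assumes the usual meaning, the fraction of embedded message bits that the extractor recovers correctly. Under that assumption, 1.000 means every bit was recovered and about 0.5 is what random guessing would give on uniformly random bits. The function name is illustrative.

```python
def bit_accuracy(embedded: list[int], extracted: list[int]) -> float:
    """Fraction of positions where the extracted bit equals the embedded bit."""
    if len(embedded) != len(extracted):
        raise ValueError("bit strings must have the same length")
    if not embedded:
        raise ValueError("bit strings must be non-empty")
    matches = sum(a == b for a, b in zip(embedded, extracted))
    return matches / len(embedded)

# Usage: one wrong bit out of four.
example = bit_accuracy([1, 0, 1, 1], [1, 0, 0, 1])
```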

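SIGMark achieves high bit accuracy in both low- and high-capacity settings, compares favorably with non-blind baselines, and remains competitive with blind baselines.
Under disturbance, SIGMark keeps a strong bit accuracy (for example, 0.958 bit accuracy with a V-score of 0.506 in one setting) and shows better temporal robustness than earlier methods, which suffer from incorrect frame grouping.
GF-PRC enables blind extraction at constant cost, in contrast to approaches whose extraction cost grows with the number of generated videos.
SGO effectively recovers the causal frame grouping under temporal disturbance, which makes extraction more reliable.
Experiments on modern diffusion models (HunyuanVideo and Wan-2.2) show that SIGMark keeps extraction accuracy high with very little overhead.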
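
The frame-grouping problem that SGO targets can be shown with a small toy example. This is not an implementation of SGO, which according to the summary relies on optical-flow segmentation and sliding-window detection. The layout used here is an assumption: the first frame forms its own latent group and every following run of `stride` frames (4 in this sketch) forms one group. The sketch only shows how dropping a single frame makes a naive regrouping mix frames from different original groups.

```python
def causal_group(frame_index: int, stride: int = 4) -> int:
    """Latent group of a frame under the assumed causal layout:
    frame 0 is group 0 alone, then each run of `stride` frames is one group."""
    if frame_index < 0:
        raise ValueError("frame_index must be non-negative")
    return 0 if frame_index == 0 else (frame_index - 1) // stride + 1

def regroup(kept_frames: list[int], stride: int = 4) -> list[list[int]]:
    """Group surviving frames by their new positions and report, for each
    new group, the original group of every member."""
    groups: dict[int, list[int]] = {}
    for position, original_index in enumerate(kept_frames):
        new_group = causal_group(position, stride)
        groups.setdefault(new_group, []).append(
            causal_group(original_index, stride))
    return [groups[g] for g in sorted(groups)]

# Usage: a 9-frame clip, intact and with frame 2 dropped.
intact = regroup(list(range(9)))
# intact == [[0], [1, 1, 1, 1], [2, 2, 2, 2]]
dropped = regroup([0, 1, 3, 4, 5, 6, 7, 8])
# dropped == [[0], [1, 1, 1, 2], [2, 2, 2]]
```

In the second case the middle group contains a frame that originally belonged to the next group, so inverting it would not reproduce the latent frame that carried the watermark. Recovering the original grouping before inversion is the step the summary attributes to SGO.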

This review was generated by AI and reviewed by human editors.