QUICK REVIEW

[Paper Review] SIGMark: Scalable In-Generation Watermark with Blind Extraction for Video Diffusion

Xinjie Zhu, Zijing Zhao | arXiv (Cornell University) | 2026-03-03

Advanced Steganography and Watermarking Techniques | Citations: 0

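One-Line Summary

SIGMark is a framework that makes in-generation watermarking for video diffusion models blind and scalable, using Global Frame-wise PseudoRandom Coding (GF-PRC) and a Segment Group-Ordering (SGO) module to enable constant-time extraction and robust handling of temporal disturbance.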

ABSTRACT

Artificial Intelligence Generated Content (AIGC), particularly video generation with diffusion models, has been advanced rapidly. Invisible watermarking is a key technology for protecting AI-generated videos and tracing harmful content, and thus plays a crucial role in AI safety. Beyond post-processing watermarks which inevitably degrade video quality, recent studies have proposed distortion-free in-generation watermarking for video diffusion models. However, existing in-generation approaches are non-blind: they require maintaining all the message-key pairs and performing template-based matching during extraction, which incurs prohibitive computational costs at scale. Moreover, when applied to modern video diffusion models with causal 3D Variational Autoencoders (VAEs), their robustness against temporal disturbance becomes extremely weak. To overcome these challenges, we propose SIGMark, a Scalable In-Generation watermarking framework with blind extraction for video diffusion. To achieve blind-extraction, we propose to generate watermarked initial noise using a Global set of Frame-wise PseudoRandom Coding keys (GF-PRC), reducing the cost of storing large-scale information while preserving noise distribution and diversity for distortion-free watermarking. To enhance robustness, we further design a Segment Group-Ordering module (SGO) tailored to causal 3D VAEs, ensuring robust watermark inversion during extraction under temporal disturbance. Comprehensive experiments on modern diffusion models show that SIGMark achieves very high bit-accuracy during extraction under both temporal and spatial disturbances with minimal overhead, demonstrating its scalability and robustness. Our project is available at https://jeremyzhao1998.github.io/SIGMark-release/.

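Motivation and Objectives

Establish the need to protect AI-generated videos, identify their copyright, and trace content.
Address the scalability and temporal-robustness limits of existing in-generation watermarks.
Develop a blind-extraction watermarking framework that preserves video quality and keeps extraction constant-time at large scale.
Demonstrate robustness and scalability through practical evaluation on modern video diffusion models.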

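Proposed Method

Embeds the watermark into the initial latent noise through the Global Frame-wise PseudoRandom Coding (GF-PRC) scheme, while keeping the noise Gaussian so that high-quality generation is not distorted.
Assigns a global PRC key to each latent frame group, enabling blind extraction without storing per-video metadata.
Introduces the Segment Group-Ordering (SGO) module, which uses optical-flow segmentation and sliding-window detection to recover the correct causal frame groups under temporal disturbance.
Inverts the watermarked video back to the latent space and decodes the message with the PRC keys, so blind extraction also works on tampered videos.
Keeps extraction cost constant regardless of the number of generated videos, demonstrating scalability.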
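The embed-and-decode flow described above can be illustrated with a deliberately simplified sketch. It is not the paper's PRC construction: a keyed pseudorandom bit mask with repetition coding stands in for a real pseudorandom code, and the names `GLOBAL_KEYS`, `keyed_mask`, `embed_group`, and `extract_group`, the key values, the latent size of 256, and the perturbation level are all assumptions made for illustration. The idea shown is that message bits select only the sign of each noise sample while the magnitude is drawn from a half-normal distribution, so each sample still looks standard Gaussian when the coded bits look uniformly random. Because this toy mask is deterministic per key, it does not reproduce the pseudorandomness or diversity guarantees a true PRC would give across videos.

```python
import random

# Hypothetical global key set: one fixed key per latent frame group,
# shared by every generated video (nothing is stored per video).
GLOBAL_KEYS = [1001, 1002, 1003, 1004]


def keyed_mask(key, n):
    """Pseudorandom bit mask derived deterministically from a key."""
    rng = random.Random(key)
    return [rng.getrandbits(1) for _ in range(n)]


def embed_group(message, key, n, noise_seed):
    """Sample n noise values whose signs carry the masked, repeated message."""
    mask = keyed_mask(key, n)
    rng = random.Random(noise_seed)
    noise = []
    for i in range(n):
        coded_bit = message[i % len(message)] ^ mask[i]
        magnitude = abs(rng.gauss(0.0, 1.0))  # half-normal magnitude
        noise.append(magnitude if coded_bit else -magnitude)
    return noise


def extract_group(noise, key, k):
    """Blind decode: read signs, undo the mask, majority-vote each of k bits."""
    mask = keyed_mask(key, len(noise))
    votes = [0] * k
    for i, value in enumerate(noise):
        bit = (1 if value > 0 else 0) ^ mask[i]
        votes[i % k] += 1 if bit else -1
    return [1 if v > 0 else 0 for v in votes]


message = [1, 0, 1, 1, 0, 0, 1, 0]

# One watermarked noise vector per latent frame group.
latents = [embed_group(message, key, 256, noise_seed=g)
           for g, key in enumerate(GLOBAL_KEYS)]

# Stand-in for imperfect inversion: small additive Gaussian perturbation.
perturb = random.Random(7)
recovered = [[x + perturb.gauss(0.0, 0.3) for x in group] for group in latents]

# Decoding needs only the fixed global keys, not a per-video lookup table.
decoded = [extract_group(group, key, len(message))
           for group, key in zip(recovered, GLOBAL_KEYS)]
```

In this sketch the decoder loops over a fixed number of keys and latent values, so its cost does not grow with the number of videos that were generated, which is the property the review calls constant-time blind extraction.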

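Experimental Results

Research Questions

RQ1: Can in-generation watermarking for video diffusion models support blind extraction without maintaining per-video watermark references?
RQ2: How can temporal disturbances such as frame drops and clipping be mitigated so that watermark integrity is preserved during inversion?
RQ3: Does GF-PRC enable distortion-free embedding and preserve diffusion-model quality even under large-scale use?
RQ4: How does SIGMark's robustness to spatial and temporal disturbances compare with existing methods?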

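Key Results

SIGMark achieves high bit accuracy in both low-capacity and high-capacity settings, outperforming non-blind baselines while remaining competitive with blind baselines.
Under disturbance, SIGMark maintains strong bit accuracy (for example, 0.958 Bit Acc and 0.506 V-score in one setting) and shows better temporal robustness than prior methods, which suffer from frame-group errors.
GF-PRC enables blind extraction whose cost stays constant.
SGO effectively recovers causal frame groups under temporal disturbance, improving extraction reliability.
On modern diffusion models (HunyuanVideo and Wan-2.2), SIGMark maintains high extraction accuracy with minimal overhead.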
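For readers unfamiliar with the metric, bit accuracy is conventionally the fraction of extracted bits that match the embedded message. The sketch below assumes that standard definition; the function name `bit_accuracy` and the sample bit strings are illustrative and not taken from the paper.

```python
def bit_accuracy(embedded, extracted):
    """Fraction of positions where the extracted bit equals the embedded bit."""
    if not embedded or len(embedded) != len(extracted):
        raise ValueError("bit sequences must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(embedded, extracted))
    return matches / len(embedded)


embedded = [1, 0, 1, 1, 0, 0, 1, 0]
extracted = [1, 0, 1, 0, 0, 0, 1, 0]  # one bit flipped by a disturbance
score = bit_accuracy(embedded, extracted)  # 7 of 8 bits match -> 0.875
```

Under this definition, random guessing yields about 0.5 on average, so a value such as 0.958 indicates that most bits survive the disturbance.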

This review was generated by AI and reviewed by a human editor.