QUICK REVIEW

[Paper Review] Watermarking Diffusion Model

Yugeng Liu, Zheng Li | arXiv (Cornell University) | 2023-05-21

Generative Adversarial Networks and Image Synthesis | Citations: 9

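One-Line Summary

This paper introduces two watermarking schemes, NaiveWM and FixedWM, that embed and verify ownership of latent diffusion models (LDMs) by injecting a trigger into the prompt, and it evaluates both model utility and watermark robustness.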

ABSTRACT

The availability and accessibility of diffusion models (DMs) have significantly increased in recent years, making them a popular tool for analyzing and predicting the spread of information, behaviors, or phenomena through a population. Particularly, text-to-image diffusion models (e.g., DALLE 2 and Latent Diffusion Models (LDMs)) have gained significant attention in recent years for their ability to generate high-quality images and perform various image synthesis tasks. Despite their widespread adoption in many fields, DMs are often susceptible to various intellectual property violations. These can include not only copyright infringement but also more subtle forms of misappropriation, such as unauthorized use or modification of the model. Therefore, DM owners must be aware of these potential risks and take appropriate steps to protect their models. In this work, we are the first to protect the intellectual property of DMs. We propose a simple but effective watermarking scheme that injects the watermark into the DMs and can be verified by the pre-defined prompts. In particular, we propose two different watermarking methods, namely NAIVEWM and FIXEDWM. The NAIVEWM method injects the watermark into the LDMs and activates it using a prompt containing the watermark. On the other hand, the FIXEDWM is considered more advanced and stealthy compared to the NAIVEWM, as it can only activate the watermark when using a prompt containing a trigger in a fixed position. We conducted a rigorous evaluation of both approaches, demonstrating their effectiveness in watermark injection and verification with minimal impact on the LDM's functionality.

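Motivation and Objectives

Motivate intellectual property protection for diffusion models and address the problem of watermarking an LDM without degrading its utility.
Propose two watermarking schemes (NaiveWM and FixedWM) that inject and verify a watermark through prompts.
Develop an evaluation framework that assesses watermark robustness, utility impact, and scalability under various attacks.
Demonstrate on the MS COCO dataset that watermarking a pretrained latent diffusion model (LDM) is feasible and stealthy.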

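Proposed Method

Watermark injection: fine-tune a pretrained LDM on trigger prompts so that it generates outputs containing the watermark.
NaiveWM inserts a watermark trigger into the prompt and fine-tunes the LDM on pairs of triggered prompts and watermark images.
FixedWM restricts watermark activation to a fixed trigger position within the prompt, which improves stealthiness.
Evaluation uses the MS COCO dataset and several image quality metrics (FID, SSIM, PSNR, VIFp, FSIM) to assess utility and watermark quality.
Watermark performance is measured by the MSE between the original image and the watermark image, with ablations over trigger length and poisoning ratio.
The approach relies on text-encoder tokenization (e.g., BERT) to inject and detect the trigger during conditioning of the diffusion process.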
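
The data-preparation side of the two schemes can be sketched as follows. This is a minimal illustration, not the authors' code: the trigger token `[V]`, the helper names, the default fixed position, and the choice to place the NaiveWM trigger at a random word position are all assumptions made for the example. The review only states that NaiveWM inserts a trigger into the prompt, that FixedWM ties activation to a fixed position, and that a poisoning ratio controls how much of the fine-tuning data carries the trigger.

```python
import random

def inject_naive(prompt, trigger, rng):
    """Assumed NaiveWM-style injection: trigger at a random word position."""
    words = prompt.split()
    pos = rng.randint(0, len(words))  # inclusive bounds, so the end is allowed
    return " ".join(words[:pos] + [trigger] + words[pos:])

def inject_fixed(prompt, trigger, position=1):
    """Assumed FixedWM-style injection: trigger always at the same word index."""
    words = prompt.split()
    pos = min(position, len(words))  # clamp for very short prompts
    return " ".join(words[:pos] + [trigger] + words[pos:])

def build_finetune_set(pairs, watermark_image, poisoning_ratio, inject, seed=0):
    """Replace a fraction of (prompt, image) pairs with (triggered prompt, watermark).

    pairs           -- list of (prompt, image) tuples from the clean dataset
    watermark_image -- the image the owner wants the trigger to produce
    poisoning_ratio -- fraction of pairs to convert, in [0, 1]
    inject          -- callable mapping a clean prompt to a triggered prompt
    """
    rng = random.Random(seed)
    n_poison = round(len(pairs) * poisoning_ratio)
    poisoned = set(rng.sample(range(len(pairs)), n_poison))
    return [
        (inject(prompt), watermark_image) if i in poisoned else (prompt, image)
        for i, (prompt, image) in enumerate(pairs)
    ]

# Hypothetical usage with placeholder file names instead of real images.
clean = [("a dog on a beach", "img0.png"), ("two cats on a sofa", "img1.png")]
fixed_set = build_finetune_set(
    clean, "watermark.png", 0.5, lambda p: inject_fixed(p, "[V]")
)
print(inject_fixed("a dog on a beach", "[V]"))  # a [V] dog on a beach
```

The resulting list would then feed an ordinary LDM fine-tuning loop, which is outside the scope of this sketch.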

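Experimental Results

Research Questions

RQ1: Do NaiveWM and FixedWM preserve the utility of the LDM after watermarking?
RQ2: Can NaiveWM and FixedWM reliably trigger the watermark image at verification time?
RQ3: How do the poisoning ratio and trigger length affect watermark effectiveness and model utility?
RQ4: What is the trade-off between watermark stealthiness and activation reliability?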

Key Results

Model	FID ↓	SSIM ↑	PSNR ↑	VIFp ↑	FSIM ↑
Baseline	28.265	0.114 ± 0.084	32.604 ± 1.616	0.013 ± 0.009	0.289 ± 0.026
NaiveWM	29.456	0.110 ± 0.079	32.674 ± 1.635	0.014 ± 0.011	0.286 ± 0.024
FixedWM_clean	31.690	0.107 ± 0.078	32.623 ± 1.616	0.013 ± 0.009	0.286 ± 0.023
FixedWM_other	32.468	0.107 ± 0.079	32.656 ± 1.655	0.014 ± 0.010	0.285 ± 0.024
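
To make the size of the utility cost easier to read, the FID column can be turned into a relative increase over the baseline. The short calculation below uses only the FID values from the table; the function and variable names are illustrative.

```python
BASELINE_FID = 28.265
fid = {"NaiveWM": 29.456, "FixedWM_clean": 31.690, "FixedWM_other": 32.468}

def relative_increase(value, baseline):
    """Percentage increase of value over baseline."""
    return 100.0 * (value - baseline) / baseline

fid_increase = {name: relative_increase(v, BASELINE_FID) for name, v in fid.items()}

for name, pct in fid_increase.items():
    print(f"{name}: +{pct:.2f}% FID vs. baseline")
# NaiveWM: +4.21% FID vs. baseline
# FixedWM_clean: +12.12% FID vs. baseline
# FixedWM_other: +14.87% FID vs. baseline
```

By this measure NaiveWM raises FID by roughly 4%, while the two FixedWM settings raise it by roughly 12% and 15%. The SSIM, PSNR, VIFp, and FSIM differences all fall well within the reported standard deviations.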

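Both NaiveWM and FixedWM preserve model utility, with only modest degradation relative to the baseline.
Both schemes can generate watermark images close to the intended output, with a low watermark-image MSE (about 0.118–0.121).
Watermarking causes limited utility loss: FID rises slightly for NaiveWM (28.265 to 29.456) and more for the FixedWM variants (31.690 and 32.468).
Increasing the poisoning ratio and trigger length generally reduces image quality and watermark detectability, revealing a trade-off between watermark strength and utility.
FixedWM's trigger-position constraint improves stealthiness but can reduce watermark effectiveness at longer trigger lengths.
Quantitative results across multiple metrics (FID, SSIM, PSNR, VIFp, FSIM) demonstrate that the watermarking framework is practical.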
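
The watermark-quality numbers above rest on pixel-wise error between a generated image and the target watermark image. A minimal sketch of MSE and PSNR is given below. It assumes images are flattened to lists of floats in [0, 1], which is an assumption: the review does not state the pixel scale behind the reported MSE of about 0.118–0.121, so these functions illustrate the metric definitions and are not meant to reproduce the paper's figures.

```python
import math

def mse(a, b):
    """Mean squared error between two equally sized flat pixel sequences."""
    if len(a) != len(b):
        raise ValueError("images must have the same number of pixels")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def psnr(a, b, max_value=1.0):
    """Peak signal-to-noise ratio in dB; max_value is the assumed pixel maximum."""
    err = mse(a, b)
    if err == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / err)

# Toy example: a uniform offset of 0.1 on every pixel.
target = [0.0, 0.0, 0.0, 0.0]
generated = [0.1, 0.1, 0.1, 0.1]
print(mse(target, generated))
print(psnr(target, generated))
```

A lower MSE (or higher PSNR) against the target watermark image indicates that the triggered prompt reproduces the watermark more faithfully.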

This review was generated by AI and reviewed by a human editor.