[Paper Review] How to Prove Your Model Belongs to You: A Blind-Watermark based Framework to Protect Intellectual Property of DNN
This paper presents a blind-watermark intellectual property protection (IPP) framework for DNNs that embeds indistinguishable key samples into models to prove ownership, defend against evasion, and resist fraudulent claims, with strong empirical results across datasets and architectures.
Deep learning techniques have made tremendous progress in a variety of challenging tasks, such as image recognition and machine translation, during the past decade. Training deep neural networks is computationally expensive and requires both human and intellectual resources. Therefore, it is necessary to protect the intellectual property of the model and externally verify the ownership of the model. However, previous studies either fail to defend against the evasion attack or have not explicitly dealt with fraudulent claims of ownership by adversaries. Furthermore, they cannot establish a clear association between the model and the creator's identity. To fill these gaps, in this paper, we propose a novel intellectual property protection (IPP) framework based on blind watermarking for deep neural networks that meets the requirements of security and feasibility. Our framework accepts ordinary samples and the exclusive logo as inputs, outputs newly generated samples as watermarks, which are almost indistinguishable from the originals, and infuses these watermarks into DNN models by assigning specific labels, leaving the backdoor as the basis for our copyright claim. We evaluated our IPP framework on two benchmark datasets and 15 popular deep learning models. The results show that our framework successfully verifies the ownership of all the models without a noticeable impact on their primary task. Most importantly, we are the first to successfully design and implement a blind-watermark based framework, which can achieve state-of-the-art performance on undetectability against evasion attacks and unforgeability against fraudulent claims of ownership. Further, our framework shows remarkable robustness and establishes a clear association between the model and the author's identity.
Motivation & Objective
- Motivate the need to protect DNN intellectual property and address limitations of prior watermarking methods.
- Propose a blind-watermark based IPP framework that links a model to its creator’s identity.
- Demonstrate feasibility and practicality through a prototype and empirical evaluation on multiple datasets and architectures.
- Evaluate robustness against evasion attacks and fraudulent ownership claims.
- Show that watermarking induces minimal impact on primary model performance while enabling reliable ownership verification.
Proposed method
- Embed watermarks by generating key samples x^key = G(e, x, l) where e is a lightweight encoder and l is an exclusive logo; train an autoencoder-like setup with a discriminator to align the key-sample distribution P_e with the data distribution P_data.
- Use an adversarial/discriminator objective to minimize KL divergence between P_data and P_e, incorporating SSIM-based reconstruction loss to preserve sample indistinguishability.
- Backdoor the host DNN so that x^key is mapped to a predefined label t^key, enabling ownership verification via high accuracy on key samples.
- Provide a verification procedure where the owner queries a remote model with key samples and checks if acc_g(x^key, t^key) exceeds a threshold.
- Detail a joint objective O_e that combines reconstruction fidelity, SSIM, and adversarial losses, guiding encoder, discriminator, and host model during training.
- Outline the overall pipeline involving encoder, discriminator, and host DNN, and present the training protocol and hyper-parameter settings.
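The verification step described in the list above can be sketched as follows. This is a minimal illustration, not the authors' implementation: `query_model`, `key_sample_accuracy`, `verify_ownership`, and the default threshold of 0.9 are hypothetical names and values chosen for the example, and the review does not state which threshold the paper uses.

```python
from typing import Any, Callable, Sequence, Tuple


def key_sample_accuracy(
    query_model: Callable[[Any], int],
    key_samples: Sequence[Any],
    key_labels: Sequence[int],
) -> float:
    """Fraction of key samples that the remote model maps to their key labels."""
    if len(key_samples) != len(key_labels) or len(key_samples) == 0:
        raise ValueError("key_samples and key_labels must be non-empty and equal in length")
    hits = sum(1 for x, t in zip(key_samples, key_labels) if query_model(x) == t)
    return hits / len(key_samples)


def verify_ownership(
    query_model: Callable[[Any], int],
    key_samples: Sequence[Any],
    key_labels: Sequence[int],
    threshold: float = 0.9,  # hypothetical value, not taken from the paper
) -> Tuple[bool, float]:
    """Return (claim_holds, accuracy); the claim holds if accuracy exceeds the threshold."""
    acc = key_sample_accuracy(query_model, key_samples, key_labels)
    return acc > threshold, acc


if __name__ == "__main__":
    # Stand-in for a remote black-box API: a "watermarked" model that maps
    # every key sample to label 7, and a clean model that never does.
    watermarked = lambda x: 7 if str(x).startswith("key") else 0
    clean = lambda x: 0
    keys, labels = ["key0", "key1", "key2"], [7, 7, 7]
    print(verify_ownership(watermarked, keys, labels))
    print(verify_ownership(clean, keys, labels))
```

Only black-box query access is assumed here, which matches the remote-verification setting the review describes; a model that was never watermarked should score near chance on the key labels and fail the check.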
Experimental results
Research questions
- RQ1: Can a blind-watermark IPP framework reliably prove ownership of DNNs while keeping fidelity to the original task?
- RQ2: Does the proposed framework resist evasion attacks and fraudulent ownership claims better than prior watermarking methods?
- RQ3: Is the watermark distribution close to the original data distribution, ensuring imperceptibility and robustness?
- RQ4: Can the framework establish a clear association between the model and the creator’s identity under practical scenarios?
- RQ5: What is the impact of watermark embedding on model accuracy across multiple architectures and datasets?
Key findings
- Watermarked models maintain similar accuracy to unwatermarked ones, with a fidelity drop averaging 0.66% and as low as 0.14%.
- Key samples achieve high verification accuracy, with watermarked models reaching over 90% accuracy on key samples and sometimes 100%.
- The blind-watermark approach achieves undetectability against evasion attacks: detectors trained to separate key samples from ordinary inputs perform close to random guessing (AUC of roughly 0.5 to 0.65 in extended tests).
- The framework demonstrates robustness against fraudulent ownership claims by making it difficult to forge valid key samples under reasonable assumptions.
- Experiments on MNIST and CIFAR-10 across 15 host DNNs show successful ownership verification with limited impact on the primary task.
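To make the AUC figures above concrete: an AUC near 0.5 means a detector ranks a key sample above an ordinary sample only about half the time, so it cannot reliably filter key-sample queries. The sketch below computes AUC from pairwise comparisons of detector scores. The helper `roc_auc` and all scores are made up for illustration and are not data from the paper.

```python
from typing import Sequence


def roc_auc(scores_key: Sequence[float], scores_clean: Sequence[float]) -> float:
    """AUC as the probability that a random key sample gets a higher
    detector score than a random clean sample (ties count as half)."""
    if not scores_key or not scores_clean:
        raise ValueError("both score lists must be non-empty")
    wins = 0.0
    for k in scores_key:
        for c in scores_clean:
            if k > c:
                wins += 1.0
            elif k == c:
                wins += 0.5
    return wins / (len(scores_key) * len(scores_clean))


if __name__ == "__main__":
    # A detector that separates the two groups perfectly.
    print(roc_auc([0.9, 0.8], [0.1, 0.2]))
    # A detector whose scores for key and clean samples are interleaved.
    print(roc_auc([0.3, 0.7], [0.4, 0.6]))
```

The first call illustrates a watermark that is easy to detect and therefore easy to evade; the second illustrates the regime the review reports for blind watermarks, where key samples are statistically hard to tell apart from ordinary inputs.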
This review was created by AI and reviewed by human editors.