[Paper Review] Extracting Training Data from Diffusion Models
The paper demonstrates that state-of-the-art diffusion models memorize and can regurgitate individual training images, and presents attacks to extract memorized data across Stable Diffusion, Imagen, and CIFAR-10–trained models.
Image diffusion models such as DALL-E 2, Imagen, and Stable Diffusion have attracted significant attention due to their ability to generate high-quality synthetic images. In this work, we show that diffusion models memorize individual images from their training data and emit them at generation time. With a generate-and-filter pipeline, we extract over a thousand training examples from state-of-the-art models, ranging from photographs of individual people to trademarked company logos. We also train hundreds of diffusion models in various settings to analyze how different modeling and data decisions affect privacy. Overall, our results show that diffusion models are much less private than prior generative models such as GANs, and that mitigating these vulnerabilities may require new advances in privacy-preserving training.
Motivation & Objective
- Define memorization and extractability in image diffusion models.
- Show that diffusion models memorize training images and can regenerate near-copies.
- Analyze how model size, data, augmentation, and deduplication affect memorization.
- Evaluate privacy-preserving techniques and identify privacy-utility tradeoffs.
Proposed method
- Adapt and define $(\ell,\delta)$-extraction and $(k,\ell,\delta)$-eidetic memorization for diffusion models.
- Perform two-stage generate-and-filter attacks to extract memorized training images from diffusion models.
- Use CLIP-based embeddings to identify near-duplicate training images and construct a clique-based memorization detector.
- Train multiple diffusion models on CIFAR-10 to study the impact of accuracy, hyperparameters, augmentation, and deduplication on privacy.
- Apply black-box and white-box membership inference attacks to assess privacy leakage.
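The filtering stage of the generate-and-filter attack can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes images are NumPy arrays scaled to [0, 1], uses a root-mean-square pixel distance as the metric $\ell$, and replaces an exact clique search with a greedy heuristic. The function names, the default `delta`, and the threshold values are hypothetical choices made for the example.

```python
import numpy as np


def normalized_l2(a, b):
    """Root-mean-square pixel distance between two same-shaped images in [0, 1]."""
    a = np.asarray(a, dtype=np.float64).ravel()
    b = np.asarray(b, dtype=np.float64).ravel()
    return float(np.sqrt(np.mean((a - b) ** 2)))


def is_extracted(generated, training_image, delta=0.15):
    """Check the (l, delta)-extraction condition for one generation.

    delta is an illustrative default (assumption), not a universal constant.
    """
    return normalized_l2(generated, training_image) <= delta


def has_clique(generations, threshold, min_size):
    """Flag a prompt as likely memorized when at least `min_size`
    generations are mutually within `threshold` of each other.

    Greedy heuristic: it can miss cliques that an exact search would find.
    """
    n = len(generations)
    dist = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            dist[i, j] = dist[j, i] = normalized_l2(generations[i], generations[j])
    adjacent = dist <= threshold

    for seed in range(n):
        clique = [seed]
        for candidate in range(n):
            # Add a candidate only if it is close to every current member.
            if candidate != seed and all(adjacent[candidate, m] for m in clique):
                clique.append(candidate)
        if len(clique) >= min_size:
            return True
    return False
```

The intuition is that a prompt whose independent generations keep collapsing to nearly the same image is probably reproducing a training example, so the clique test needs no access to the training set. The distance check against a known training image is then used to confirm a candidate.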
Experimental results
Research questions
- RQ1: Can diffusion models memorize and regurgitate training images?
- RQ2: How does memorization depend on model size, training data, and training practices?
- RQ3: What are effective practical attacks to extract memorized data from diffusion models?
- RQ4: Do existing privacy-enhancing techniques provide acceptable privacy-utility tradeoffs for diffusion models?
- RQ5: How do diffusion models compare to GANs in memorization-related privacy risks?
Key findings
- Diffusion models memorized and regenerated training images from Stable Diffusion and Imagen, with near-identical replicas identified.
- Extraction yielded over a thousand memorized training examples across target models, including photographs of individual people and company logos, with many images lacking permissive licenses.
- Memorization rates correlate with data duplication: the more often an image is duplicated in the training set, the more likely it is to be extracted. For example, 93–109 memorized images were identified, depending on the matching criterion used.
- Imagen shows higher memorization risk than Stable Diffusion, particularly with larger model capacity and more training iterations.
- CIFAR-10 experiments reveal substantial memorization in smaller, controlled diffusion models, with 1,280–2,500 extracted images depending on the method; some memorized examples exist even in low-duplication settings.
- Traditional privacy tools (e.g., existing differential privacy-related techniques) do not yield favorable privacy-utility tradeoffs for diffusion models.
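The link between duplication and extraction noted in the findings above suggests a simple audit: count near-duplicates per training image and inspect the most duplicated ones first. The sketch below assumes image embeddings (for example, from CLIP) have already been computed and stored as rows of a NumPy array with no all-zero rows. The function name and the similarity threshold are hypothetical, and the dense similarity matrix only scales to small datasets.

```python
import numpy as np


def duplicate_counts(embeddings, sim_threshold=0.95):
    """Count, for each row, how many OTHER rows have cosine similarity
    at or above `sim_threshold` (an illustrative value, not from the paper)."""
    e = np.asarray(embeddings, dtype=np.float64)
    e = e / np.linalg.norm(e, axis=1, keepdims=True)  # unit-normalize each row
    sims = e @ e.T                                    # pairwise cosine similarity
    return (sims >= sim_threshold).sum(axis=1) - 1    # subtract the self-match


def most_duplicated(embeddings, top_k=10, sim_threshold=0.95):
    """Return indices of the `top_k` images with the most near-duplicates."""
    counts = duplicate_counts(embeddings, sim_threshold)
    return np.argsort(-counts, kind="stable")[:top_k]
```

Images ranked highest by this count are natural candidates for an extraction attempt, and removing their duplicates is the deduplication step whose effect the paper studies.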
This review was created by AI and reviewed by human editors.