QUICK REVIEW

[Paper Review] Understanding Hallucinations in Diffusion Models through Mode Interpolation

Sumukh K Aithal, Pratyush Maini | arXiv (Cornell University) | Jun 13, 2024
Mental Health Research Topics | 5 citations
TL;DR

The paper identifies a failure mode in diffusion models called mode interpolation, where models generate samples between nearby data modes, producing hallucinations outside the training support, and proposes a variance-based metric to detect and prune such samples during generation and recursive training.

ABSTRACT

Colloquially speaking, image generation models based upon diffusion processes are frequently said to exhibit "hallucinations," samples that could never occur in the training data. But where do such hallucinations come from? In this paper, we study a particular failure mode in diffusion models, which we term mode interpolation. Specifically, we find that diffusion models smoothly "interpolate" between nearby data modes in the training set, to generate samples that are completely outside the support of the original training distribution; this phenomenon leads diffusion models to generate artifacts that never existed in real data (i.e., hallucinations). We systematically study the reasons for, and the manifestation of this phenomenon. Through experiments on 1D and 2D Gaussians, we show how a discontinuous loss landscape in the diffusion model's decoder leads to a region where any smooth approximation will cause such hallucinations. Through experiments on artificial datasets with various shapes, we show how hallucination leads to the generation of combinations of shapes that never existed. Finally, we show that diffusion models in fact know when they go out of support and hallucinate. This is captured by the high variance in the trajectory of the generated sample towards the final few backward sampling process. Using a simple metric to capture this variance, we can remove over 95% of hallucinations at generation time while retaining 96% of in-support samples. We conclude our exploration by showing the implications of such hallucination (and its removal) on the collapse (and stabilization) of recursive training on synthetic data with experiments on MNIST and 2D Gaussians dataset. We release our code at https://github.com/locuslab/diffusion-model-hallucination.

Motivation & Objective

  • Formalize and characterize hallucinations in diffusion models as mode interpolation between nearby data modes.
  • Analyze how the learned score function smooths discontinuities, leading to interpolated, out-of-support samples.
  • Propose a metric based on trajectory variance to detect and filter hallucinations at generation time.
  • Explore implications for recursive training and demonstrate mitigation through pre-emptive filtering on synthetic and MNIST data.

Proposed method

  • Study 1D and 2D Gaussian mixtures to show diffusion models interpolate between nearby modes.
  • Show that the neural network learns a smooth approximation of the true score function, causing interpolation in regions between disjoint modes.
  • Identify high-variance trajectories of x0 predictions during the final diffusion steps as hallmarks of hallucinations.
  • Define a hallucination metric Hal(x) based on the variance of predicted x0 across time steps to classify samples.
  • Evaluate the metric’s filtering capability: remove ~95–96% of hallucinations while retaining ~95–98% of in-support samples.
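
The trajectory-variance idea in the list above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `trajectory_variance` and `filter_samples`, the array layout, the threshold value, and the toy trajectories are all assumptions made for illustration. The sketch sums the squared deviation of the predicted x0 from its mean over a window of late reverse steps, then keeps only samples whose score stays below a threshold.

```python
import numpy as np

def trajectory_variance(x0_preds):
    """Score each sample by how much its predicted x0 moves over late steps.

    x0_preds: array of shape (T, N, D) holding the predicted x0 at T late
    reverse-diffusion steps for N samples of dimension D (assumed layout).
    Returns an array of shape (N,): summed squared deviation from the
    per-sample mean prediction.
    """
    x0_preds = np.asarray(x0_preds, dtype=float)
    mean = x0_preds.mean(axis=0, keepdims=True)        # mean prediction per sample
    return ((x0_preds - mean) ** 2).sum(axis=(0, 2))   # sum over steps and dims

def filter_samples(samples, x0_preds, threshold):
    """Keep samples whose trajectory score is at most `threshold` (hypothetical value)."""
    scores = trajectory_variance(x0_preds)
    keep = scores <= threshold
    return samples[keep], keep

# Toy trajectories (fabricated for illustration, not model output).
T = 10
stable = np.tile(np.array([1.0, 1.0]), (T, 1))                  # prediction never moves
signs = np.where(np.arange(T) % 2 == 0, 1.0, -1.0)[:, None]
unstable = np.array([1.5, 1.0]) + 0.5 * signs                   # prediction oscillates
x0_preds = np.stack([stable, unstable], axis=1)                 # shape (T, 2, 2)
samples = x0_preds[-1]                                          # final predictions, shape (2, 2)

kept, keep_mask = filter_samples(samples, x0_preds, threshold=0.5)
print(trajectory_variance(x0_preds))  # per-sample scores
print(keep_mask)                      # which samples survive the filter
```

In practice the threshold would have to be chosen on held-out data where in-support and out-of-support samples can be told apart; the value 0.5 here only separates the two toy trajectories.
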
Figure 1 : Hallucinations in Diffusion Models : Original Dataset (Left) & Generated Dataset (Right). The original dataset consists of 64x64 images divided into three columns, each containing a triangle, square, or pentagon with a 0.5 probability of the shape being present. Each shape appears at most

Experimental results

Research questions

  • RQ1: What causes diffusion models to generate samples that lie outside the training support (hallucinations)?
  • RQ2: Do diffusion models exhibit mode interpolation between nearby data modes, and how does the score function contribute?
  • RQ3: Can a trajectory-variance-based metric detect and filter hallucinations without severely harming in-support samples?
  • RQ4: What are the implications of hallucinations for recursive training and model stability?

Key findings

  • Diffusion models interpolate between nearby modes in synthetic 1D and 2D Gaussian mixtures, creating samples outside the training support.
  • The learned score function is a smooth approximation of the true score, which changes sharply between disjoint modes; this smoothing drives the interpolation.
  • High variance in the predicted x0 trajectory toward the end of reverse diffusion correlates with hallucinations and enables detection.
  • The Hal(x) metric can remove about 95–96% of hallucinations while preserving around 95–98% of in-support samples across setups.
  • Pre-emptive filtering based on the metric mitigates model collapse during recursive training on 2D Gaussians, Simple Shapes, and MNIST datasets.
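
To make the "sharp change between modes" in the findings above concrete, here is a small Python sketch of the exact score of an equal-weight 1D Gaussian mixture. The mode locations, the standard deviation, and the function name `mixture_score` are illustrative assumptions, not values or code taken from the paper. The score is the responsibility-weighted average of (mean - x) / sigma^2, so for well-separated modes it flips sign abruptly at the midpoint between two modes; a smooth network fit to this target will be inaccurate in that region.

```python
import numpy as np

def mixture_score(x, means, sigma):
    """Exact d/dx log p(x) for an equal-weight 1D Gaussian mixture with shared sigma."""
    x = np.asarray(x, dtype=float)[..., None]          # shape (..., 1)
    means = np.asarray(means, dtype=float)             # shape (K,)
    logits = -0.5 * ((x - means) / sigma) ** 2         # log responsibilities, unnormalized
    logits -= logits.max(axis=-1, keepdims=True)       # stabilize the softmax
    resp = np.exp(logits)
    resp /= resp.sum(axis=-1, keepdims=True)           # responsibilities sum to 1
    return (resp * (means - x)).sum(axis=-1) / sigma**2

# Illustrative setup: three narrow modes; evaluate around the midpoint of the first two.
means = [1.0, 2.0, 3.0]
sigma = 0.05
points = [1.45, 1.5, 1.55]
mid_scores = mixture_score(points, means, sigma)
for p, s in zip(points, mid_scores):
    print(f"x = {p:.2f}  score = {s:+.1f}")
```

Just left of the midpoint the score is strongly negative (pointing back to the mode at 1.0), at the midpoint it is approximately zero, and just right of it the score is strongly positive (pointing to the mode at 2.0).
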
Figure 2 : Mode Interpolation in 1D Gaussian . The red curve indicates the PDF of the true data distribution $q(x)$ , which is a mixture of 3 Gaussians (notice that the y-axis is in log-scale). In blue, we show a density histogram of the samples generated by a DDPM trained on varying number of sampl

This review was created by AI and reviewed by human editors.