QUICK REVIEW

[Paper Review] Cooperative Training of Descriptor and Generator Networks

Jianwen Xie, Yang Lu | arXiv (Cornell University) | Sep 29, 2016

Generative Adversarial Networks and Image Synthesis | 30 citations

TL;DR

This paper proposes a cooperative training framework for two ConvNet-based generative models: a deep energy-based descriptor network defined by a bottom-up ConvNet, and a generator network defined by a top-down ConvNet. Training interweaves the two models' MCMC-based maximum likelihood learning, with a modified contrastive divergence for the descriptor, so that the descriptor teaches the generator through MCMC transitions. The generator learns to synthesize realistic images without mode collapse, and on dynamic texture synthesis the method achieves a PSNR of 19.407 and an SSIM of 0.5988, the best among the compared baselines.

ABSTRACT

This paper studies the cooperative training of two generative models for image modeling and synthesis. Both models are parametrized by convolutional neural networks (ConvNets). The first model is a deep energy-based model, whose energy function is defined by a bottom-up ConvNet, which maps the observed image to the energy. We call it the descriptor network. The second model is a generator network, which is a non-linear version of factor analysis. It is defined by a top-down ConvNet, which maps the latent factors to the observed image. The maximum likelihood learning algorithms of both models involve MCMC sampling such as Langevin dynamics. We observe that the two learning algorithms can be seamlessly interwoven into a cooperative learning algorithm that can train both models simultaneously. Specifically, within each iteration of the cooperative learning algorithm, the generator model generates initial synthesized examples to initialize a finite-step MCMC that samples and trains the energy-based descriptor model. After that, the generator model learns from how the MCMC changes its synthesized examples. That is, the descriptor model teaches the generator model by MCMC, so that the generator model accumulates the MCMC transitions and reproduces them by direct ancestral sampling. We call this scheme MCMC teaching. We show that the cooperative algorithm can learn highly realistic generative models.
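For reference, the two models and their learning updates can be written compactly. The notation below (f, g, q, s, σ, δ, θ, α) is our own reconstruction of the standard formulation for this model family rather than a transcription of the paper, so treat it as a sketch:

```latex
% Descriptor: f(X;\theta) is the bottom-up ConvNet score (negative energy);
% q(X) is a Gaussian white-noise reference distribution N(0, s^2 I).
p(X;\theta) = \frac{1}{Z(\theta)} \exp\big[f(X;\theta)\big]\, q(X)

% Langevin revision of a synthesized example (step size \delta, U_\tau \sim N(0, I)):
X_{\tau+1} = X_\tau - \frac{\delta^2}{2}\Big[\frac{X_\tau}{s^2} - \frac{\partial f(X_\tau;\theta)}{\partial X}\Big] + \delta\, U_\tau

% Descriptor update: observed examples X_i versus MCMC-revised examples \tilde X_i
\Delta\theta \propto \frac{1}{n}\sum_{i=1}^{n} \frac{\partial f(X_i;\theta)}{\partial \theta}
  - \frac{1}{\tilde n}\sum_{i=1}^{\tilde n} \frac{\partial f(\tilde X_i;\theta)}{\partial \theta}

% Generator: top-down ConvNet g, latent factors Z, Gaussian residual
Z \sim \mathcal{N}(0, I_d), \qquad X = g(Z;\alpha) + \epsilon, \qquad \epsilon \sim \mathcal{N}(0, \sigma^2 I_D)

% Generator update (MCMC teaching): fit the revised examples from their latent factors Z_i
\Delta\alpha \propto \frac{1}{\tilde n}\sum_{i=1}^{\tilde n} \frac{1}{\sigma^2}
  \big[\tilde X_i - g(Z_i;\alpha)\big]\frac{\partial g(Z_i;\alpha)}{\partial \alpha}
```

The Langevin step is gradient ascent on log p(X; θ) = f(X; θ) − ‖X‖²/(2s²) + const, plus injected noise; the descriptor update is the usual "data term minus model term" likelihood gradient, with the model term estimated from the revised examples.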

Motivation & Objective

To develop a cooperative learning algorithm that jointly trains an energy-based descriptor network and a latent-variable generator network for image modeling.
To overcome the difficulty of training each deep generative model separately on highly variable image data.
To provide an alternative to GANs that avoids mode collapse, in which the two models help each other: the generator initializes the descriptor's MCMC, and the descriptor's MCMC in turn teaches the generator.
To enable stable, likelihood-based training by interweaving MCMC sampling and gradient updates between the two models.
To generalize the framework to conditional generation tasks such as image synthesis from class labels, text, or sketches.

Proposed method

The descriptor network is a bottom-up ConvNet that computes image energy, forming an energy-based model.
The generator network is a top-down ConvNet that maps latent factors to images via ancestral sampling.
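Each cooperative training iteration (1) draws initial examples from the generator by ancestral sampling, (2) uses them to initialize a finite-step MCMC (Langevin dynamics) that revises them toward the descriptor's distribution, (3) updates the descriptor from the observed and revised examples, and (4) updates the generator to reproduce the revised examples.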
MCMC teaching enables the generator to learn and reproduce MCMC transitions, effectively distilling the descriptor's sampling dynamics.
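A modified contrastive divergence trains the descriptor: its MCMC is initialized from generator outputs rather than from observed data, as standard contrastive divergence would do.
The framework interweaves the maximum likelihood learning of both models so that each helps the other: the generator supplies good starting points for the descriptor's MCMC, and the MCMC-revised examples supply training targets for the generator.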
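To make the four-step loop concrete, here is a deliberately tiny 1-D toy in NumPy. Everything in it is our assumption, not the paper's setup: the "descriptor" is f(x; θ) = θx on a Gaussian reference N(0, s2), so its model is N(θ·s2, s2); the "generator" is a linear factor model x = a0 + a1·z + noise; and because this toy's posterior over z is Gaussian, step (4) samples it exactly instead of running Langevin inference. The names `coop_train` and `langevin_revise` are hypothetical.

```python
import numpy as np


def langevin_revise(x, theta, s2, delta, n_steps, rng):
    """Finite-step Langevin revision under the toy descriptor.

    Toy descriptor (assumption): p(x; theta) ∝ exp(theta * x) * N(x; 0, s2),
    so d/dx log p(x; theta) = theta - x / s2.
    """
    for _ in range(n_steps):
        grad = theta - x / s2
        x = x + 0.5 * delta**2 * grad + delta * rng.standard_normal(x.shape)
    return x


def coop_train(data, n_iter=3000, batch=200, s2=1.0, sigma2=0.3,
               lr_d=0.05, lr_g=0.05, delta=0.3, n_steps=10, seed=0):
    rng = np.random.default_rng(seed)
    theta = 0.0        # descriptor parameter: f(x; theta) = theta * x
    a0, a1 = 0.0, 1.0  # generator: x = a0 + a1 * z + sqrt(sigma2) * eps
    for _ in range(n_iter):
        # (1) Generator proposes initial examples by ancestral sampling.
        z = rng.standard_normal(batch)
        x_hat = a0 + a1 * z + np.sqrt(sigma2) * rng.standard_normal(batch)
        # (2) Descriptor revises them with Langevin dynamics initialized at x_hat.
        x_tilde = langevin_revise(x_hat, theta, s2, delta, n_steps, rng)
        # (3) Descriptor update: mean df/dtheta (= x) on data minus on revised samples.
        obs = rng.choice(data, size=batch)
        theta += lr_d * (obs.mean() - x_tilde.mean())
        # (4) Infer z for each revised example; the toy posterior is Gaussian,
        #     so it is sampled exactly here (assumption, not Langevin inference).
        prec = 1.0 + a1**2 / sigma2
        post_mean = (a1 / sigma2) * (x_tilde - a0) / prec
        z = post_mean + rng.standard_normal(batch) / np.sqrt(prec)
        # (5) Generator update: gradient ascent on log N(x_tilde; a0 + a1*z, sigma2),
        #     with the 1/sigma2 factor folded into lr_g.
        resid = x_tilde - (a0 + a1 * z)
        a0 += lr_g * resid.mean()
        a1 += lr_g * (resid * z).mean()
    return theta, a0, a1


if __name__ == "__main__":
    data = np.random.default_rng(1).normal(2.0, 1.0, size=5000)
    print(coop_train(data))
```

With data drawn from N(2, 1), we expect θ and a0 to settle near the data mean and a1 to stay well away from zero, so the generator keeps its spread instead of collapsing onto a point. Without step (4), regressing x̃ on the original z would shrink a1 toward zero in this linear toy, because finite-step Langevin partly forgets its starting point.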

Experimental results

Research questions

RQ1: Can cooperative training between a descriptor network and a generator network improve image synthesis quality compared to independent training?
RQ2: How can MCMC sampling be used to teach a generator network to reproduce complex image structures?
RQ3: Does the cooperative learning scheme avoid mode collapse, a common failure mode in GANs?
RQ4: Can the descriptor network’s MCMC dynamics be effectively distilled into the generator for improved sample quality?
RQ5: Can the cooperative framework generalize to conditional generation tasks such as text-to-image or sketch-to-image synthesis?

Key findings

The cooperative training algorithm successfully learns highly realistic generative models of images, including dynamic textures.
On dynamic texture synthesis, the model achieved a PSNR of 19.407 and SSIM of 0.5988, outperforming LDS (19.148, 0.5939), HOSVD (18.392, 0.4573), and other baselines.
The method avoids mode collapse, a common issue in GAN-based models, due to the stable likelihood-based training process.
The generator network learns to reproduce MCMC transitions by direct ancestral sampling, effectively distilling the descriptor’s sampling behavior.
The descriptor network learns from finite real data, while the generator learns from virtually infinite synthesized data, enabling robust generalization.
The framework generalizes to conditional generation, enabling tasks such as image generation from class labels, text descriptions, or sketches.
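PSNR, one of the two metrics quoted in the findings, is 10·log10(MAX²/MSE). The review does not say how the paper normalized intensities or averaged over frames, so this sketch assumes values in [0, max_val]; `psnr` and `mse_ratio` are hypothetical helper names. SSIM instead compares local means, variances, and covariances over sliding windows; scikit-image provides it as `skimage.metrics.structural_similarity`.

```python
import numpy as np


def psnr(reference, estimate, max_val=1.0):
    """Peak signal-to-noise ratio in dB, assuming intensities in [0, max_val]."""
    ref = np.asarray(reference, dtype=float)
    est = np.asarray(estimate, dtype=float)
    mse = np.mean((ref - est) ** 2)
    if mse == 0.0:
        return float("inf")  # identical inputs
    return 10.0 * np.log10(max_val**2 / mse)


def mse_ratio(psnr_a, psnr_b):
    """MSE of method A divided by MSE of method B, given their PSNRs (same max_val)."""
    return 10.0 ** (-(psnr_a - psnr_b) / 10.0)


if __name__ == "__main__":
    ref = np.zeros((4, 4))
    print(psnr(ref, ref + 0.1))       # MSE = 0.01, so about 20 dB
    print(mse_ratio(19.407, 19.148))  # cooperative vs. LDS, about 0.942
```

Read this way, the 0.259 dB gap over LDS corresponds to roughly 6% lower mean squared error, assuming both PSNRs were computed with the same peak value.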


This review was created by AI and reviewed by human editors.