QUICK REVIEW

[Paper Review] Professor Forcing: A New Algorithm for Training Recurrent Networks

Alex Lamb, Anirudh Goyal | arXiv (Cornell University) | Oct 27, 2016

Topic Modeling | 33 references | 330 citations

TL;DR

Professor Forcing introduces an adversarial training framework that aligns the generative (sampling) dynamics of an RNN with its teacher-forced dynamics, improving long-term sequence generation and acting as a regularizer.

ABSTRACT

The Teacher Forcing algorithm trains recurrent networks by supplying observed sequence values as inputs during training and using the network's own one-step-ahead predictions to do multi-step sampling. We introduce the Professor Forcing algorithm, which uses adversarial domain adaptation to encourage the dynamics of the recurrent network to be the same when training the network and when sampling from the network over multiple time steps. We apply Professor Forcing to language modeling, vocal synthesis on raw waveforms, handwriting generation, and image generation. Empirically we find that Professor Forcing acts as a regularizer, improving test likelihood on character level Penn Treebank and sequential MNIST. We also find that the model qualitatively improves samples, especially when sampling for a large number of time steps. This is supported by human evaluation of sample quality. Trade-offs between Professor Forcing and Scheduled Sampling are discussed. We produce T-SNEs showing that Professor Forcing successfully makes the dynamics of the network during training and sampling more similar.
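The two modes contrasted in the abstract can be illustrated with a toy example. The sketch below is not the paper's model: it assumes a hypothetical one-unit recurrent cell with fixed weights (`w_h`, `w_x`) and a trivial readout, and only shows how teacher-forced (open-loop) and free-running (closed-loop) rollouts of the same network produce different hidden-state trajectories.

```python
import math

def step(h, x, w_h=0.5, w_x=1.0):
    """One recurrent step of a toy cell (weights are illustrative assumptions)."""
    h_new = math.tanh(w_h * h + w_x * x)
    y_pred = h_new  # toy readout: the prediction is just the hidden state
    return h_new, y_pred

def teacher_forced(observed, h0=0.0):
    """Open loop: every input is an observed value from the data."""
    h, states = h0, []
    for x in observed[:-1]:
        h, _ = step(h, x)
        states.append(h)
    return states

def free_running(first_input, n_steps, h0=0.0):
    """Closed loop: after the first input, the model consumes its own predictions."""
    h, x, states = h0, first_input, []
    for _ in range(n_steps):
        h, y = step(h, x)
        states.append(h)
        x = y  # feed the one-step-ahead prediction back in
    return states

observed = [0.1, 0.9, -0.3, 0.7]
tf_states = teacher_forced(observed)
fr_states = free_running(observed[0], len(observed) - 1)
# Both rollouts share the first step, then drift apart.
gaps = [abs(a - b) for a, b in zip(tf_states, fr_states)]
```

The growing gap between the two trajectories is the mismatch that Professor Forcing tries to shrink.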

Motivation & Objective

Improve long-term sequence generation, especially when sampling runs longer than the training sequences.
Introduce a method to make training-time and sampling-time dynamics indistinguishable for RNNs.
Show that matching dynamics acts as a regularizer and improves generalization across tasks.

Proposed method

Pair a generator RNN with a discriminator in a GAN-like setup; the discriminator learns to tell teacher-forced behavior from free-running behavior.
Define behavior sequences B(x, y, θ_g) produced in open-loop (teacher forcing) and closed-loop (free-running) modes.
Train the discriminator to distinguish these behaviors and train the generator to both fit data (NLL) and fool the discriminator (C_f, C_t).
Use a bidirectional RNN discriminator to evaluate the full behavior sequence.
Update rules include NLL + C_f (and optionally C_t) for the generator and C_d for the discriminator.
Apply to character-level language modeling, sequential MNIST, handwriting, and vocal synthesis on raw waveforms.
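The loss terms named above can be sketched for a single pair of behavior sequences. This is a minimal illustration under stated assumptions, not the authors' implementation: `d_teacher` and `d_free` are hypothetical scalars standing in for the discriminator's estimated probability that a behavior sequence came from teacher forcing, and the function names are invented for this example. In the method itself those probabilities would come from the bidirectional RNN discriminator.

```python
import math

def discriminator_loss(d_teacher, d_free):
    """C_d: label teacher-forced behavior as 1 and free-running behavior as 0.

    d_teacher: D's output on a teacher-forced behavior sequence, in (0, 1).
    d_free:    D's output on a free-running behavior sequence, in (0, 1).
    """
    return -math.log(d_teacher) - math.log(1.0 - d_free)

def generator_loss(nll, d_teacher, d_free, use_c_t=False):
    """NLL + C_f, plus the optional C_t term.

    C_f pushes free-running behavior to look teacher-forced to D.
    C_t (optional) pushes teacher-forced behavior to look free-running.
    """
    c_f = -math.log(d_free)
    c_t = -math.log(1.0 - d_teacher) if use_c_t else 0.0
    return nll + c_f + c_t

# A confident discriminator (0.9 vs 0.1) has a low C_d,
# while the generator pays a large C_f for being detected.
c_d = discriminator_loss(d_teacher=0.9, d_free=0.1)
g_loss = generator_loss(nll=1.0, d_teacher=0.9, d_free=0.1)
```

When the discriminator cannot separate the two modes (both outputs near 0.5), C_f reaches its equilibrium value of log 2 and the generator is trained mainly by the NLL term.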

Experimental results

Research questions

RQ1: Can adversarially aligning training-time and sampling-time dynamics improve long-term sequence generation?
RQ2: Does Professor Forcing regularize recurrent models and improve test likelihood across domains?
RQ3: How does Professor Forcing impact the quality and diversity of samples compared to teacher forcing?
RQ4: In what tasks does long-term dependency modeling benefit most from dynamics matching?
RQ5: What are practical considerations when training with Professor Forcing (discriminator balance, training time)?

Key findings

Professor Forcing reduces the divergence between training-time and sampling-time hidden-state dynamics, as shown by t-SNE visualizations.
On character-level Penn Treebank, Professor Forcing improves validation bits-per-character from 1.50 to 1.48.
Professor Forcing acts as a regularizer, improving test likelihood on character-level Penn Treebank and sequential MNIST.
In handwriting generation, human evaluators favored Professor Forcing samples over Teacher Forcing samples.
On sequential MNIST, Professor Forcing achieves competitive MNLL (79.58) compared to PixelRNN (79.2) in objective evaluations.
Professor Forcing requires additional training time because of the discriminator updates, but it can accelerate convergence and improve sample quality.

This review was created by AI and reviewed by human editors.