QUICK REVIEW

[Paper Review] Learning to Decode for Future Success

Jiwei Li, Will Monroe | arXiv (Cornell University) | Jan 23, 2017
Topic Modeling | 35 references | 50 citations
TL;DR

Introduces a simple decoding strategy that combines an MLE-based policy with a future-outcome predictor to steer generation toward desired properties such as sequence length, mutual information, and BLEU/ROUGE scores, improving performance across translation, summarization, and dialogue tasks.

ABSTRACT

We introduce a simple, general strategy to manipulate the behavior of a neural decoder that enables it to generate outputs that have specific properties of interest (e.g., sequences of a pre-specified length). The model can be thought of as a simple version of the actor-critic model that uses an interpolation of the actor (the MLE-based token generation policy) and the critic (a value function that estimates the future values of the desired property) for decision making. We demonstrate that the approach is able to incorporate a variety of properties that cannot be handled by standard neural sequence decoders, such as sequence length and backward probability (probability of sources given targets), in addition to yielding consistent improvements in abstractive summarization and machine translation when the property to be optimized is BLEU or ROUGE scores.

Motivation & Objective

  • Motivate the need for controllable neural sequence generation beyond standard MLE decoding.
  • Propose a simple actor-critic-inspired decoding strategy that interpolates an MLE policy with a future-value predictor.
  • Demonstrate that the approach can control properties such as sequence length, mutual information, and BLEU/ROUGE scores across tasks.
  • Show empirical improvements over standard beam search and some RL-based baselines in translation, summarization, and conversation.
  • Discuss design variants and practical considerations for training and decoding with the future-predictor.

Proposed method

  • Define a value-function Q that estimates the future outcome of choosing a token during decoding.
  • Score each candidate next token as S(y_t) = log p(y_t | h_{t-1}) + lambda * Q(X, y_{1:t}), where lambda weights the predicted future outcome against the local MLE score.
  • Train Q to predict the final future outcome q(Y) (e.g., BLEU/ROUGE, length, mutual information) from (X, y_{1:t}).
  • Use a linear interpolation between the local MLE score and the predicted future outcome to guide decoding (controlled by lambda).
  • Offer variants for how Q is trained, including predicting remaining length, predicting backward probability p(X|Y) for MI, or predicting BLEU/ROUGE directly.
  • Apply decoding with beam search augmented by Q to encourage long-horizon goals without full policy updates.
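
The steps above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the names `make_q_training_pairs`, `beam_step`, `toy_log_probs`, and `toy_q` are hypothetical, the model and the predictor Q are passed in as plain callables, and the use of the cumulative prefix log-probability inside the beam (rather than only the local token score) is an assumption made so the step fits ordinary beam search.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Prefix = Tuple[str, ...]
Beam = List[Tuple[Prefix, float]]  # (prefix, cumulative log-probability)


def make_q_training_pairs(
    source: str,
    target: Sequence[str],
    outcome: Callable[[str, Sequence[str]], float],
) -> List[Tuple[str, Prefix, float]]:
    """Pair every prefix y_{1:t} of a target with the final outcome q(Y).

    Q would then be trained as a regressor on these (X, prefix) -> q(Y) pairs.
    """
    final = outcome(source, target)
    return [(source, tuple(target[:t]), final) for t in range(1, len(target) + 1)]


def beam_step(
    source: str,
    beam: Beam,
    next_log_probs: Callable[[str, Prefix], Dict[str, float]],
    q: Callable[[str, Prefix], float],
    lam: float,
    beam_size: int,
) -> Beam:
    """Expand each hypothesis by one token and keep the best `beam_size`.

    Candidates are ranked by log-probability + lam * Q(X, y_{1:t});
    lam = 0 recovers plain beam search.
    """
    candidates = []
    for prefix, logp in beam:
        for token, token_logp in next_log_probs(source, prefix).items():
            new_prefix = prefix + (token,)
            new_logp = logp + token_logp
            score = new_logp + lam * q(source, new_prefix)
            candidates.append((score, new_prefix, new_logp))
    candidates.sort(key=lambda c: c[0], reverse=True)
    return [(p, lp) for _, p, lp in candidates[:beam_size]]


# Toy stand-ins (assumptions for illustration only).
def toy_log_probs(source: str, prefix: Prefix) -> Dict[str, float]:
    # A fixed two-token distribution that slightly prefers "a".
    return {"a": math.log(0.6), "b": math.log(0.4)}


def toy_q(source: str, prefix: Prefix) -> float:
    # Pretend the desired property is "contains many b tokens".
    return float(prefix.count("b"))


if __name__ == "__main__":
    start: Beam = [((), 0.0)]
    print(beam_step("x", start, toy_log_probs, toy_q, lam=0.0, beam_size=1))
    print(beam_step("x", start, toy_log_probs, toy_q, lam=1.0, beam_size=1))
```

With `lam=0.0` the toy model's preferred token wins; raising `lam` lets the predicted future outcome override the local score, which is the trade-off that lambda controls.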

Experimental results

Research questions

  • RQ1: How can decoding be guided to produce outputs with specific properties (e.g., fixed length, higher mutual information, higher BLEU/ROUGE) without full RL training?
  • RQ2: Does a simple interpolated actor-critic style decoding improve quality and diversity over standard beam search and over RL-based decoders across translation, summarization, and dialogue tasks?
  • RQ3: What are effective ways to train and integrate the Q predictor for different properties (length, MI, BLEU/ROUGE) in practice?

Key findings

  • The proposed Q-augmented decoding yields improvements over standard beam search in multiple generation tasks.
  • For length control in dialogue, the approach reduces short-sequence bias and produces more coherent outputs than standard beam search; larger lambda increases diversity but may raise irrelevance if too large.
  • For mutual information, the future-prediction approach can outperform post-hoc MMI reranking, especially for longer targets, by maintaining diverse hypotheses earlier in decoding.
  • When optimizing BLEU/ROUGE, the future outcome function helps align training and test-time objectives and yields measurable improvements over baseline SEQ2SEQ with beam search.
  • Across tasks, the method provides consistent gains and offers a simple, general way to tailor decoders to desired properties without extensive RL training.

This review was created by AI and reviewed by human editors.