[Paper Review] Non-Monotonic Sequential Text Generation
The paper introduces a framework for training text generators that learn non-monotonic generation orders through a binary-tree-based policy trained with imitation learning, achieving performance competitive with left-to-right models.
Standard sequential generation methods assume a pre-specified generation order, such as text generation methods which generate words from left to right. In this work, we propose a framework for training models of text generation that operate in non-monotonic orders; the model directly learns good orders, without any additional annotation. Our framework operates by generating a word at an arbitrary position, and then recursively generating words to its left and then words to its right, yielding a binary tree. Learning is framed as imitation learning, including a coaching method which moves from imitating an oracle to reinforcing the policy's own preferences. Experimental results demonstrate that using the proposed method, it is possible to learn policies which generate text without pre-specifying a generation order, while achieving competitive performance with conventional left-to-right generation.
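The generation process described above can be illustrated with a small sketch. This is not the authors' implementation: the names `Node`, `generate_tree`, and `in_order` are hypothetical, and a random choice from the remaining words stands in for a learned policy. It only shows that picking a word at an arbitrary position, then filling in its left and right sides, yields a binary tree whose in-order traversal recovers the sentence.

```python
import random
from collections import deque
from dataclasses import dataclass
from typing import List, Optional, Tuple

END = "<end>"  # assumed name for the special end token


@dataclass
class Node:
    token: str
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def generate_tree(words: List[str], rng: random.Random) -> Tuple[Node, List[str]]:
    """Build a binary tree level by level (breadth-first).

    A random pick from the remaining span stands in for the policy.
    Returns the root and the tokens in the order they were generated.
    """
    root: Optional[Node] = None
    order: List[str] = []
    # Each queue entry: (parent node, which child slot to fill, words left for that subtree)
    queue = deque([(None, "", list(words))])
    while queue:
        parent, side, span = queue.popleft()
        if span:
            i = rng.randrange(len(span))  # word at an arbitrary position
            node = Node(span[i])
            queue.append((node, "left", span[:i]))       # words to its left
            queue.append((node, "right", span[i + 1:]))  # words to its right
        else:
            node = Node(END)  # nothing left to generate in this subtree
        order.append(node.token)
        if parent is None:
            root = node
        else:
            setattr(parent, side, node)
    assert root is not None
    return root, order


def in_order(node: Optional[Node]) -> List[str]:
    """Read the sentence back out: left subtree, node, right subtree."""
    if node is None or node.token == END:
        return []
    return in_order(node.left) + [node.token] + in_order(node.right)


if __name__ == "__main__":
    sentence = "the cat sat on the mat".split()
    tree, generation_order = generate_tree(sentence, random.Random(0))
    print("generated:", generation_order)
    print("in-order :", in_order(tree))
```

Whatever order the words are generated in, the in-order traversal returns the original sentence, which is why the generation order can be left for the model to learn.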
Motivation & Objective
- Motivate exploration of non-monotonic generation orders without external supervision.
- Develop a tree-based generation framework that can output sequences in arbitrary orders.
- Formulate learning as imitation learning with an oracle and coaching to guide policy learning.
- Demonstrate that non-monotonic generation can match or exceed left-to-right baselines on multiple tasks.
Proposed method
- Model a generation process as constructing a binary tree via level-order traversal and producing the final sequence via in-order traversal.
- Represent the policy as a neural network (LSTM or Transformer) that, given a partial tree, outputs a distribution over the next token, including a special end token.
- Frame learning as imitation learning with roll-in/roll-out using an oracle policy and KL-divergence based cost to align the learner with the oracle’s preferred actions.
- Introduce coaching and annealed coaching oracles to progressively bias the learner toward its own preferences while maintaining exploration.
- Allow conditioning on inputs X (e.g., for translation or image captioning) by encoding X and using it to initialize or modulate the policy state.
- Provide variants that separate end-token prediction from token prediction and optionally incorporate explicit tree-encoding for improvements.
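The oracle and cost described in the list above can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's code: the function names are hypothetical, the uniform oracle is assumed to spread its mass evenly over the distinct words still to be placed in the current subtree, the coaching oracle is assumed to reweight that distribution by the learner's own probabilities, and the annealed oracle is assumed to be a mixture of the two controlled by a coefficient `beta`.

```python
import math
from typing import Dict, List

END = "<end>"  # assumed name for the special end token

Dist = Dict[str, float]


def uniform_oracle(span: List[str]) -> Dist:
    """Uniform over distinct valid words; end token only when the span is empty."""
    if not span:
        return {END: 1.0}
    valid = sorted(set(span))
    return {w: 1.0 / len(valid) for w in valid}


def coaching_oracle(span: List[str], learner: Dist) -> Dist:
    """Valid actions reweighted by the learner's own preferences (assumed form)."""
    base = uniform_oracle(span)
    scores = {a: p * learner.get(a, 0.0) for a, p in base.items()}
    total = sum(scores.values())
    if total == 0.0:
        return base  # learner puts no mass on any valid action; fall back
    return {a: s / total for a, s in scores.items()}


def annealed_oracle(span: List[str], learner: Dist, beta: float) -> Dist:
    """Mixture: beta = 1 is the uniform oracle, beta = 0 is pure coaching."""
    uni = uniform_oracle(span)
    coach = coaching_oracle(span, learner)
    return {a: beta * uni[a] + (1.0 - beta) * coach[a] for a in uni}


def kl_cost(oracle: Dist, learner: Dist, eps: float = 1e-12) -> float:
    """KL(oracle || learner), summed over actions the oracle allows."""
    return sum(
        p * math.log(p / max(learner.get(a, 0.0), eps))
        for a, p in oracle.items()
        if p > 0.0
    )


if __name__ == "__main__":
    remaining = ["b", "c", "b"]  # words still to be placed in this subtree
    policy = {"a": 0.2, "b": 0.6, "c": 0.1, END: 0.1}  # toy learner output
    for beta in (1.0, 0.5, 0.0):
        target = annealed_oracle(remaining, policy, beta)
        print(beta, target, round(kl_cost(target, policy), 4))
```

Decreasing `beta` over training moves the target from the uniform oracle toward the learner's own preferred valid actions, which is the progression the annealed coaching bullet describes.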
Experimental results
Research questions
- RQ1 Can a text generator learn useful generation orders without pre-specified monotonic ordering?
- RQ2 How effective are non-monotonic generation policies compared to traditional left-to-right models across language modeling, reordering, and translation tasks?
- RQ3 What learning-to-search strategies (oracle definitions, roll-in/roll-out schemes) best facilitate training for non-monotonic sequence generation?
- RQ4 Does annealing coaching improve exploration and final performance over uniform or pure coaching oracles?
- RQ5 Can the framework be conditioned on auxiliary inputs (e.g., for conditional generation like translation) without hand-crafted supervision?
Key findings
- The framework enables learning generation policies that do not rely on a fixed order and can exhibit easy-first behavior.
- Policies trained with annealed coaching tend to produce more fluent and novel sentences, and their BLEU-style quality scores are closer to those of the validation data than in the other non-monotonic settings.
- On word reordering, annealed and uniform policies can outperform left-to-right baselines on F1 while maintaining competitive BLEU scores on both the validation and test sets.
- In machine translation, the non-monotonic policies achieve competitive metrics with left-to-right models, with annealed variants often approaching or surpassing baseline quality on several measures.
- The approach yields successful conditional generation (e.g., translation) using Transformer-based policies and end-token handling without standard autoregressive decoding constraints.
This review was created by AI and reviewed by human editors.