QUICK REVIEW

[Paper Review] A Neural Transducer

Navdeep Jaitly, David Sussillo|arXiv (Cornell University)|Nov 16, 2015

Neural Networks and Applications | References: 26 | Cited by: 36

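ONE-SENTENCE SUMMARY

This paper proposes the Neural Transducer, a sequence-to-sequence model that makes incremental, online predictions by conditioning on both the partially observed input sequence and the previously generated outputs. Unlike standard sequence-to-sequence models, it uses a transducer RNN that carries its recurrent state across blocks and emits variable-length chunks of output as input arrives; it reaches a 19.8% phoneme error rate on TIMIT, close to the state of the art, and performs well on long sequences even when attention mechanisms are not used.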

ABSTRACT

Sequence-to-sequence models have achieved impressive results on various tasks. However, they are unsuitable for tasks that require incremental predictions to be made as more data arrives or tasks that have long input sequences and output sequences. This is because they generate an output sequence conditioned on an entire input sequence. In this paper, we present a Neural Transducer that can make incremental predictions as more input arrives, without redoing the entire computation. Unlike sequence-to-sequence models, the Neural Transducer computes the next-step distribution conditioned on the partially observed input sequence and the partially generated sequence. At each time step, the transducer can decide to emit zero to many output symbols. The data can be processed using an encoder and presented as input to the transducer. The discrete decision to emit a symbol at every time step makes it difficult to learn with conventional backpropagation. It is however possible to train the transducer by using a dynamic programming algorithm to generate target discrete decisions. Our experiments show that the Neural Transducer works well in settings where it is required to produce output predictions as data come in. We also find that the Neural Transducer performs well for long sequences even when attention mechanisms are not used.

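MOTIVATION AND GOALS

Address the limitation that sequence-to-sequence models need the complete input before they can produce any output, which matters most in real-time applications such as speech recognition and online translation.
Let the model produce output incrementally as input data arrives, without reprocessing the entire sequence.
Develop a training method that copes with discrete emission decisions, which conventional backpropagation cannot learn directly, even though no explicit alignment between input and output is given.
Show that the model performs well on long-sequence tasks even without attention mechanisms, particularly when recurrent state is carried across blocks.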

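PROPOSED METHOD

The model has two parts: an encoder that processes blocks of input, and a transducer RNN that generates output symbols from the encoder features and its own recurrent hidden state.
At each time step the transducer can emit zero or more output symbols, so the amount of output produced per block is variable.
During training, a dynamic programming algorithm computes an approximately optimal alignment; this supplies target emission decisions, so the model can then be trained with ordinary backpropagation.
Recurrent state is carried across blocks, letting the transducer retain long-range dependencies and context across input segments.
The training objective is to maximize the likelihood of the output sequence given the input blocks, using the alignment approximated by dynamic programming.
The model is evaluated on TIMIT phoneme recognition with unidirectional LSTMs and block-wise processing, with ablations over block size, depth, and attention mechanism.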
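
The block-by-block procedure described above can be pictured with a small sketch. It is an illustration under stated assumptions, not the authors' implementation: the `<e>` end-of-block marker, the `step` callback (standing in for the encoder plus transducer RNN), `toy_step`, and the `max_symbols_per_block` cap are all hypothetical names introduced here, and decoding is greedy for simplicity.

```python
from typing import Any, Callable, List, Sequence, Tuple

END_OF_BLOCK = "<e>"  # assumed marker: emitting it moves on to the next block


def split_into_blocks(frames: Sequence, block_size: int) -> List[Sequence]:
    """Cut the input into consecutive blocks of at most `block_size` frames."""
    return [frames[i:i + block_size] for i in range(0, len(frames), block_size)]


def transduce(frames: Sequence,
              block_size: int,
              step: Callable[[Sequence, Any], Tuple[str, Any]],
              init_state: Any,
              max_symbols_per_block: int = 10) -> List[str]:
    """Greedy block-by-block decoding.

    `step(block, state)` is a hypothetical stand-in for the encoder and
    transducer RNN; it returns (next_symbol, new_state).
    """
    state = init_state  # carried across blocks, never reset
    output: List[str] = []
    for block in split_into_blocks(frames, block_size):
        # A block may yield zero, one, or several symbols.
        for _ in range(max_symbols_per_block):
            symbol, state = step(block, state)
            if symbol == END_OF_BLOCK:
                break
            output.append(symbol)
    return output


def toy_step(block: Sequence, state: Any) -> Tuple[str, Any]:
    """Toy model: frames are already labels; state is the last emitted label."""
    # Collapse runs of repeated labels inside the block.
    collapsed = [x for i, x in enumerate(block) if i == 0 or x != block[i - 1]]
    # Resume after the last emitted label if this block contains it.
    pos = collapsed.index(state) + 1 if state in collapsed else 0
    if pos >= len(collapsed):
        return END_OF_BLOCK, state
    return collapsed[pos], collapsed[pos]


print(transduce(list("aaabbbbccd"), 3, toy_step, None))  # ['a', 'b', 'c', 'd']
```

In this toy run the third block begins with a label that the previous block already emitted; because the state is carried across blocks, that label is not emitted a second time.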

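EXPERIMENTAL RESULTS

Research Questions

RQ1: Can a sequence-to-sequence model produce output incrementally as input arrives, without waiting for the complete input sequence?
RQ2: How can discrete emission decisions be trained effectively when standard backpropagation does not apply to them directly?
RQ3: Does carrying recurrent state across input blocks improve performance on long-sequence tasks compared with processing blocks without recurrence?
RQ4: Can the model achieve competitive performance on long-sequence tasks such as phoneme recognition without attention mechanisms?
RQ5: How sensitive is model performance to block size and network depth?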
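
For RQ2, it may help to see what an alignment is: a monotone assignment of output tokens to input blocks. The sketch below finds the best such assignment by dynamic programming under a strong simplifying assumption that the real model does not satisfy, namely that each block's score depends only on which tokens it emits. The method summarized here is described as approximate; this toy version is exact only because of that assumption. `best_alignment`, `score`, `toy_score`, `TOKENS`, and `BLOCKS` are hypothetical names used purely for illustration.

```python
from typing import Callable, List, Tuple

NEG_INF = float("-inf")


def best_alignment(num_tokens: int,
                   num_blocks: int,
                   score: Callable[[int, int, int], float]) -> Tuple[float, List[int]]:
    """Return (best total score, boundaries).

    boundaries[b] is the number of tokens emitted by the end of block b.
    score(b, i, j) is an assumed score for block b emitting tokens i..j-1.
    """
    # best[b][j]: best score with j tokens emitted after the first b blocks.
    best = [[NEG_INF] * (num_tokens + 1) for _ in range(num_blocks + 1)]
    back = [[0] * (num_tokens + 1) for _ in range(num_blocks + 1)]
    best[0][0] = 0.0
    for b in range(1, num_blocks + 1):
        for j in range(num_tokens + 1):
            for i in range(j + 1):  # block b-1 emits tokens i..j-1
                if best[b - 1][i] == NEG_INF:
                    continue
                cand = best[b - 1][i] + score(b - 1, i, j)
                if cand > best[b][j]:
                    best[b][j] = cand
                    back[b][j] = i
    # Follow the back-pointers from the last block to the first.
    boundaries: List[int] = []
    j = num_tokens
    for b in range(num_blocks, 0, -1):
        boundaries.append(j)
        j = back[b][j]
    boundaries.reverse()
    return best[num_blocks][num_tokens], boundaries


# Toy data: a token scores +1 if its label occurs in the block, else -1.
TOKENS = ["a", "b", "c"]
BLOCKS = [["a", "a"], ["x"], ["b", "c"]]


def toy_score(b: int, i: int, j: int) -> float:
    return float(sum(1 if t in BLOCKS[b] else -1 for t in TOKENS[i:j]))


total, bounds = best_alignment(len(TOKENS), len(BLOCKS), toy_score)
print(total, bounds)  # 3.0 [1, 1, 3]
```

Here the first block emits "a", the second emits nothing, and the third emits "b" and "c". Boundaries of this kind could then serve as the target emission decisions for ordinary supervised training.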

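Key Findings

With a three-layer unidirectional LSTM encoder and transducer, the Neural Transducer achieves a 19.8% phoneme error rate (PER) on the TIMIT test set, close to the state of the art.
When trained with high-quality alignments produced by a GMM-HMM system, the model reaches 19.8% PER, showing strong performance under full supervision.
With a block size of 15 frames, keeping the transducer's recurrent state across blocks lowers PER from 34.3% to 20.6%, underscoring the importance of preserving context.
Without attention, the model performs well at the best block size, W=8; once attention is added, performance becomes less sensitive to block size.
The model performs well on long sequences and avoids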
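
The figures quoted in these findings are phoneme error rates. As a reminder of how such a rate is usually computed, here is a minimal sketch: Levenshtein edit distance between the reference and hypothesis phoneme sequences, divided by the reference length. It assumes the standard definition only; any label folding or scoring conventions used in the paper's evaluation are not reproduced, and the function names and example phoneme strings are made up for illustration.

```python
from typing import Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Minimum number of substitutions, insertions, and deletions."""
    prev = list(range(len(hyp) + 1))  # distances from an empty reference
    for i, r in enumerate(ref, start=1):
        curr = [i]  # turning i reference symbols into nothing costs i deletions
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def phoneme_error_rate(ref: Sequence[str], hyp: Sequence[str]) -> float:
    """Edit distance normalized by the reference length."""
    if not ref:
        raise ValueError("reference must not be empty")
    return edit_distance(ref, hyp) / len(ref)


reference = "sil k ae t sil".split()
hypothesis = "sil k ah t".split()
print(phoneme_error_rate(reference, hypothesis))  # 0.4
```

The example has one substitution ("ae" to "ah") and one deletion (the final "sil") against five reference phonemes, giving 2/5.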

This review was generated by AI and checked by human editors.