QUICK REVIEW

[Paper Review] RNN Approaches to Text Normalization: A Challenge

Richard Sproat, Navdeep Jaitly|arXiv (Cornell University)|Oct 31, 2016

Speech Recognition and Synthesis|References: 1|Cited by: 55

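ONE-SENTENCE SUMMARY

This paper challenges the natural language processing community to build RNN-based text normalization models on a newly released data set of written text aligned to its normalized spoken form. Although overall accuracy is high, the RNNs still make critical errors that would matter in a real deployment; combining an RNN with a simple FST filter markedly improves reliability, which suggests that an RNN alone is not enough for robust text normalization.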

ABSTRACT

This paper presents a challenge to the community: given a large corpus of written text aligned to its normalized spoken form, train an RNN to learn the correct normalization function. We present a data set of general text where the normalizations were generated using an existing text normalization component of a text-to-speech system. This data set will be released open-source in the near future. We also present our own experiments with this data set with a variety of different RNN architectures. While some of the architectures do in fact produce very good results when measured in terms of overall accuracy, the errors that are produced are problematic, since they would convey completely the wrong message if such a system were deployed in a speech application. On the other hand, we show that a simple FST-based filter can mitigate those errors, and achieve a level of accuracy not achievable by the RNN alone. Though our conclusions are largely negative on this point, we are actually not arguing that the text normalization problem is intractable using a pure RNN approach, merely that it is not going to be something that can be solved merely by having huge amounts of annotated text data and feeding that to a general RNN model. And when we open-source our data, we will be providing a novel data set for sequence-to-sequence modeling in the hopes that the community can find better solutions. The data used in this work have been released and are available at: https://github.com/rwsproat/text-normalization-data

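RESEARCH MOTIVATION AND GOALS

Address the challenge of training an RNN to learn text normalization from large-scale aligned text data.
Investigate whether RNNs can reliably learn the complex mapping from written form to spoken form.
Assess the limits of a pure RNN approach in producing error-free normalizations for speech applications.
Show that a hybrid system combining an RNN with a finite-state transducer (FST) can correct RNN errors and raise accuracy.
Release a novel open-source data set to advance sequence-to-sequence modeling research on text normalization.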

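PROPOSED METHOD

Use the normalization component of an existing text-to-speech system to generate normalized spoken forms for a large corpus of written text.
Train several RNN architectures (e.g., LSTM, GRU) on the sequence-to-sequence task of mapping written text to its normalized form.
Apply a finite-state transducer (FST) based filter that uses linguistic rules to correct the RNN's predictions and repair systematic errors.
Evaluate model performance with standard metrics (e.g., word error rate, WER) and an error analysis of failure cases.
Compare the end-to-end RNN against the hybrid RNN+FST system to measure how well errors are mitigated.
Release the data set publicly to support future research on sequence modeling and text normalization.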
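
The filtering step can be illustrated with a minimal Python sketch. It is not the authors' implementation: the summary does not describe how their FST filter is built, and this stand-in uses a regular expression and a small lookup table instead of a real transducer. The names `UNIT_READINGS`, `verbalize_number`, and `filter_prediction`, the toy unit lexicon, and the 0-999 number range are all assumptions made for the example. The idea it shows is only this: for inputs covered by a small grammar (here, a number followed by a unit), a prediction that falls outside the set of allowed readings is replaced by a rule-derived one.

```python
import re

# Hypothetical toy lexicon (not from the paper): unit -> (singular, plural).
UNIT_READINGS = {
    "kg": ("kilogram", "kilograms"),
    "km": ("kilometer", "kilometers"),
    "GB": ("gigabyte", "gigabytes"),
}

ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]

# A number of up to three digits followed by an alphabetic unit, e.g. "45kg".
MEASURE = re.compile(r"^(\d{1,3})\s*([A-Za-z]+)$")


def verbalize_number(n: int) -> str:
    """Spell out an integer in the range 0..999."""
    if not 0 <= n <= 999:
        raise ValueError("this sketch only covers 0..999")
    if n < 20:
        return ONES[n]
    if n < 100:
        tens, ones = divmod(n, 10)
        return TENS[tens] if ones == 0 else f"{TENS[tens]} {ONES[ones]}"
    hundreds, rest = divmod(n, 100)
    head = f"{ONES[hundreds]} hundred"
    return head if rest == 0 else f"{head} {verbalize_number(rest)}"


def filter_prediction(written: str, predicted: str) -> str:
    """Keep the model's output unless the toy grammar rules it out."""
    match = MEASURE.match(written)
    if match is None or match.group(2) not in UNIT_READINGS:
        return predicted  # outside the toy grammar: leave untouched
    n = int(match.group(1))
    number = verbalize_number(n)
    singular, plural = UNIT_READINGS[match.group(2)]
    allowed = {f"{number} {singular}", f"{number} {plural}"}
    if predicted in allowed:
        return predicted
    # The prediction is not licensed by the grammar: fall back to the rule.
    return f"{number} {singular if n == 1 else plural}"


# A model that reads "45kg" with the wrong unit gets corrected.
print(filter_prediction("45kg", "forty five kilometers"))
```

The filter only intervenes on inputs its grammar covers, so ordinary words pass through unchanged. A real system would more likely constrain the decoder with the transducer rather than patch its output afterwards; the post-hoc form is used here only to keep the sketch short.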

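EXPERIMENTAL RESULTS

Research Questions

RQ1: Can RNNs achieve high-accuracy text normalization on a large, diverse corpus of written text aligned to normalized spoken text?
RQ2: What kinds of errors do RNNs make in text normalization, and how do those errors affect real speech applications?
RQ3: To what extent can a simple FST filter correct the text normalization errors produced by an RNN?
RQ4: Is RNN performance good enough for deployment in production speech systems?
RQ5: Can the proposed open-source data set help the community develop better normalization models?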

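Key Findings

Although the RNNs reach high overall accuracy on text normalization, they still produce critical errors that would mislead users of a speech application.
The RNN models misread or misrepresent numbers, abbreviations, and acronyms, producing semantically wrong output.
A simple FST filter effectively corrects the most serious RNN errors and markedly improves system reliability.
The hybrid RNN+FST system is more accurate than the pure RNN models, indicating that rule-based filtering plays an essential role.
The study concludes that a pure RNN approach, even with a large annotated data set, is not yet sufficient for robust text normalization.
The authors released a new open-source data set to support future sequence-to-sequence research on text normalization.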
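
The first finding, that high overall accuracy can hide serious errors, is easier to see when accuracy is broken down by token class. The sketch below is an illustration under stated assumptions, not the authors' evaluation code: the function names are hypothetical, the class labels `PLAIN` and `MEASURE` are illustrative, and the ten examples are invented toy data rather than results from the paper.

```python
from collections import defaultdict


def overall_accuracy(examples):
    """Fraction of tokens whose hypothesis equals the reference."""
    return sum(ref == hyp for _, ref, hyp in examples) / len(examples)


def accuracy_by_class(examples):
    """Token accuracy computed separately for each class label."""
    totals = defaultdict(int)
    correct = defaultdict(int)
    for cls, ref, hyp in examples:
        totals[cls] += 1
        correct[cls] += int(ref == hyp)
    return {cls: correct[cls] / totals[cls] for cls in totals}


# Invented toy data: (class, reference, hypothesis). Not from the paper.
toy = [("PLAIN", "the", "the")] * 8 + [
    ("MEASURE", "forty five kilograms", "forty five kilograms"),
    ("MEASURE", "two gigabytes", "two kilometers"),
]

print(overall_accuracy(toy))   # 9 of 10 tokens correct
print(accuracy_by_class(toy))  # the single error falls in the MEASURE class
```

Because plain words dominate running text, one wrong measure expression barely moves the overall figure while halving the accuracy of its own class in this toy set, which is the pattern the finding describes.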

This review was generated by AI and reviewed by human editors.