QUICK REVIEW

[Paper Review] DuTongChuan: Context-aware Translation Model for Simultaneous Interpreting

Hao Xiong, Ruiqing Zhang|arXiv (Cornell University)|Jul 30, 2019

Natural Language Processing Techniques | References: 46 | Cited by: 25

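Quick Summary

DuTongChuan proposes a context-aware neural machine translation model for simultaneous interpreting. The model dynamically detects information units (IUs) in streaming automatic speech recognition (ASR) output, applies partial decoding to initial IUs and context-aware decoding to subsequent IUs, and thereby balances low latency against fluent, coherent translation. In human evaluation it scores 85.71% for Chinese-English and 86.36% for English-Chinese, with latency below 3 seconds in most cases.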

ABSTRACT

In this paper, we present DuTongChuan, a novel context-aware translation model for simultaneous interpreting. This model allows to constantly read streaming text from the Automatic Speech Recognition (ASR) model and simultaneously determine the boundaries of Information Units (IUs) one after another. The detected IU is then translated into a fluent translation with two simple yet effective decoding strategies: partial decoding and context-aware decoding. In practice, by controlling the granularity of IUs and the size of the context, we can get a good trade-off between latency and translation quality easily. Elaborate evaluation from human translators reveals that our system achieves promising translation quality (85.71% for Chinese-English, and 86.36% for English-Chinese), specially in the sense of surprisingly good discourse coherence. According to an End-to-End (speech-to-speech simultaneous interpreting) evaluation, this model presents impressive performance in reducing latency (to less than 3 seconds at most times). Furthermore, we successfully deploy this model in a variety of Baidu's products which have hundreds of millions of users, and we release it as a service in our AI platform.

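Motivation and Goals

To address the challenge of balancing low latency against high translation quality in simultaneous interpreting systems.
To improve discourse coherence in streaming translation by modeling contextual dependencies that extend beyond a single utterance.
To achieve low latency together with high fluency, so that simultaneous translation can be deployed in real applications.
To build a system that imitates human interpreting strategies (such as "chunking" or the "salami technique") in order to improve translation coherence.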

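Proposed Method

The model uses a novel information unit (IU) boundary detector to identify meaningful linguistic segments in streaming ASR output in real time.
Partial decoding is applied to sentence-initial IUs to minimize latency and start translating early.
Context-aware decoding is applied to IUs in the middle or at the end of a sentence, using the preceding context to improve fluency and coherence.
The system controls the granularity of IUs and the size of the context window to trade latency against translation quality.
The architecture combines streaming ASR input with a two-path neural machine translation (NMT) decoder, so that translation proceeds continuously without waiting for a sentence boundary.
The model is trained end-to-end on a large-scale speech-to-text translation corpus and fine-tuned on human-annotated simultaneous interpreting data.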
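
The control flow described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the punctuation-based boundary rule, the `toy_translate` placeholder (which only upper-cases tokens), and the names `StreamingTranslator`, `min_iu_len`, and `context_size` are all assumptions made for illustration. In particular, context-aware decoding is approximated here by translating the context together with the current IU and emitting only the tokens beyond what was already produced, which works only because the toy translator is token-for-token.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, Iterator, List, Tuple

# Hypothetical stand-ins: a real system would use a trained boundary
# detector and an NMT model instead of these toy rules.
PAUSE_TOKENS = {",", ".", "?", "!"}
SENTENCE_END_TOKENS = {".", "?", "!"}


def is_iu_boundary(buffer: List[str], min_len: int) -> bool:
    """Toy rule: an IU ends once the buffer holds at least `min_len`
    tokens and its last token is punctuation."""
    return len(buffer) >= min_len and buffer[-1] in PAUSE_TOKENS


def toy_translate(tokens: List[str]) -> List[str]:
    """Placeholder for an NMT call: upper-cases every token."""
    return [t.upper() for t in tokens]


@dataclass
class StreamingTranslator:
    min_iu_len: int = 3    # IU granularity: larger means fewer, longer IUs
    context_size: int = 2  # number of previous IUs kept as context
    translate: Callable[[List[str]], List[str]] = toy_translate
    _src_ctx: List[List[str]] = field(default_factory=list)
    _tgt_ctx: List[List[str]] = field(default_factory=list)

    def decode_iu(self, iu: List[str]) -> Tuple[str, List[str]]:
        if not self._src_ctx:
            # Partial decoding: a sentence-initial IU is translated alone.
            mode, output = "partial", self.translate(iu)
        else:
            # Context-aware decoding (simplified): translate context + IU,
            # then keep only the part not emitted for the context already.
            mode = "context-aware"
            source = [t for prev in self._src_ctx for t in prev] + iu
            already_emitted = sum(len(t) for t in self._tgt_ctx)
            output = self.translate(source)[already_emitted:]

        if iu[-1] in SENTENCE_END_TOKENS:
            # Sentence finished: the next IU starts without context.
            self._src_ctx, self._tgt_ctx = [], []
        else:
            # Append the IU, then trim both contexts to the window size.
            self._src_ctx.append(list(iu))
            self._tgt_ctx.append(output)
            start = max(len(self._src_ctx) - self.context_size, 0)
            self._src_ctx = self._src_ctx[start:]
            self._tgt_ctx = self._tgt_ctx[start:]
        return mode, output

    def run(self, asr_tokens: Iterable[str]) -> Iterator[Tuple[str, List[str]]]:
        buffer: List[str] = []
        for token in asr_tokens:
            buffer.append(token)
            if is_iu_boundary(buffer, self.min_iu_len):
                yield self.decode_iu(buffer)
                buffer = []
        if buffer:
            # Flush whatever is left when the stream ends.
            yield self.decode_iu(buffer)


if __name__ == "__main__":
    stream = "we present a model , it reads streaming text , and translates each unit .".split()
    for mode, target in StreamingTranslator().run(stream):
        print(mode, " ".join(target))
```

In this sketch, raising `min_iu_len` makes each IU longer, which delays the first output but gives the translator more material per call, while `context_size` bounds how much history is re-read for each later IU. These two knobs mirror the granularity and context-size controls named in the list above, under the toy assumptions stated.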

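Experimental Results

Research Questions

RQ1: How can a simultaneous translation model minimize latency during real-time processing while maintaining high fluency and coherence?
RQ2: What is the best strategy for segmenting input speech into meaningful information units (IUs)?
RQ3: Compared with standard partial decoding, does context-aware decoding significantly improve translation quality in a streaming setting?
RQ4: How does the model compare with existing wait-k and full-sentence baselines in latency and human-rated quality?
RQ5: To what extent does a human-inspired chunking strategy improve discourse-level coherence in machine translation?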

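Key Findings

In human evaluation, the model scores 85.71% for Chinese-English and 86.36% for English-Chinese simultaneous translation, indicating strong fluency and coherence.
Discourse coherence improves markedly over a standard partial decoding model, a result confirmed by human evaluation.
In end-to-end speech-to-speech translation, latency stays within 3 seconds in most cases, which meets real-time requirements.
With context-aware decoding, the system is robust to ASR errors, which makes its translations more reliable.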
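The model has been deployed in a variety of Baidu products, which have hundreds of millions of users, and released as a service on Baidu's AI platform, indicating that it scales to real-world use.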
The authors release a new speech translation corpus, BSTC, to support future research on robust simultaneous translation.


This review was generated by AI and reviewed by a human editor.