QUICK REVIEW

[Paper Review] Exploring Human-Like Translation Strategy with Large Language Models

Zhiwei He, Tian Liang|arXiv (Cornell University)|May 6, 2023

Natural Language Processing Techniques | Cited by 16

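One-Sentence Summary

MAPS prompts large language models to mimic human translators: it mines keywords, topics, and demonstrations from the source text, integrates them to generate multiple translation candidates, and uses quality estimation to select the best output. Automatic and human evaluations show that MAPS improves translation quality and reduces errors such as hallucination and ambiguity.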

ABSTRACT

Large language models (LLMs) have demonstrated impressive capabilities in general scenarios, exhibiting a level of aptitude that approaches, in some aspects even surpasses, human-level intelligence. Among their numerous skills, the translation abilities of LLMs have received considerable attention. Compared to typical machine translation that focuses solely on source-to-target mapping, LLM-based translation can potentially mimic the human translation process which might take preparatory steps to ensure high-quality translation. This work explores this possibility by proposing the MAPS framework, which stands for Multi-Aspect Prompting and Selection. Specifically, we enable LLMs first to analyze the given source sentence and induce three aspects of translation-related knowledge: keywords, topics, and relevant demonstrations to guide the final translation process. Moreover, we employ a selection mechanism based on quality estimation to filter out noisy and unhelpful knowledge. Both automatic (3 LLMs x 11 directions x 2 automatic metrics) and human evaluation (preference study and MQM) demonstrate the effectiveness of MAPS. Further analysis shows that by mimicking the human translation process, MAPS reduces various translation errors such as hallucination, ambiguity, mistranslation, awkward style, untranslated text, and omission. Source code is available at https://github.com/zwhe99/MAPS-mt.

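Research Motivation and Objectives

Motivate the exploration of human-like translation strategies in large language models.
Propose the MAPS framework, which extracts translation-related knowledge from the source text before translating.
Show how knowledge mining, integration, and quality-based selection improve translation quality.
Evaluate MAPS across multiple translation directions using automatic metrics and human judgments.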

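Proposed Method

MAPS (Multi-Aspect Prompting and Selection) is a three-step method: knowledge mining, knowledge integration, and knowledge selection.
Knowledge mining prompts the LLM to produce keywords, topics, and relevant demonstrations for the source sentence.
Knowledge integration uses the mined knowledge to generate multiple translation candidates.
Knowledge selection uses reference-free quality estimation (QE) to filter the candidates and pick the best one; both a trained QE model and LLM-based QE are effective.
MAPS is evaluated on 11 translation directions and 3 LLMs, with COMET and BLEURT as automatic metrics, complemented by human MQM and a preference study.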
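
The three steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the `llm` and `scorer` callables, all function names, and the prompt wording are hypothetical, and including one knowledge-free candidate alongside the three knowledge-guided ones is an assumption of this sketch.

```python
from typing import Callable, Dict, List

# Hypothetical interfaces (assumptions of this sketch, not the paper's code):
#   LLM:    prompt text -> completion text
#   Scorer: (source, candidate) -> reference-free quality score, higher is better
LLM = Callable[[str], str]
Scorer = Callable[[str, str], float]

ASPECTS = ("keywords", "topics", "demonstrations")


def mine_knowledge(llm: LLM, source: str, src_lang: str, tgt_lang: str) -> Dict[str, str]:
    """Step 1: ask the LLM for each aspect of translation-related knowledge."""
    # Prompt wording is illustrative only.
    prompts = {
        "keywords": f"Extract the keywords of this {src_lang} sentence "
                    f"and give their {tgt_lang} translations:\n{source}",
        "topics": f"List the topics of this {src_lang} sentence:\n{source}",
        "demonstrations": f"Write a related {src_lang} sentence "
                          f"and its {tgt_lang} translation:\n{source}",
    }
    return {aspect: llm(prompts[aspect]) for aspect in ASPECTS}


def integrate_knowledge(llm: LLM, source: str, src_lang: str, tgt_lang: str,
                        knowledge: Dict[str, str]) -> List[str]:
    """Step 2: generate one candidate per aspect, plus one without knowledge."""
    instruction = f"Translate from {src_lang} to {tgt_lang}:\n{source}"
    candidates = [llm(instruction)]  # knowledge-free candidate (assumption)
    for aspect in ASPECTS:
        guided_prompt = f"{aspect.capitalize()}: {knowledge[aspect]}\n{instruction}"
        candidates.append(llm(guided_prompt))
    return candidates


def select_best(scorer: Scorer, source: str, candidates: List[str]) -> str:
    """Step 3: keep the candidate with the highest reference-free QE score."""
    return max(candidates, key=lambda cand: scorer(source, cand))


def maps_translate(llm: LLM, scorer: Scorer, source: str,
                   src_lang: str, tgt_lang: str) -> str:
    """Run mining, integration, and selection end to end."""
    knowledge = mine_knowledge(llm, source, src_lang, tgt_lang)
    candidates = integrate_knowledge(llm, source, src_lang, tgt_lang, knowledge)
    return select_best(scorer, source, candidates)
```

Passing the scorer as a plain callable keeps the selection step independent of the QE backend, so a trained QE model or an LLM-based judge could be plugged in without changing the rest of the pipeline.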

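Experimental Results

Research Questions

RQ1: Can LLMs mimic the preparatory steps of human translation by extracting keywords, topics, and demonstrations from the source text?
RQ2: Does incorporating these three types of knowledge improve translation quality over the baseline and reranking methods?
RQ3: How does quality-estimation-based selection affect final translation quality and error types (hallucination, ambiguity, mistranslation, etc.)?
RQ4: How do different knowledge selection methods (LLM-SCQ, Comet-QE, Comet) affect MAPS performance?
RQ5: Does the three-in-one prompt (integrating all knowledge types) yield gains across language pairs?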

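Key Findings

On automatic metrics, MAPS consistently outperforms Baseline and Rerank across 11 translation directions and 3 LLMs.
MAPS with Comet-QE often matches or exceeds the best WMT22 submissions, suggesting that LLMs can mimic human preparatory strategies to improve translation quality.
Using all three knowledge types (keywords, topics, demonstrations) gives the best results, and ablation experiments show that each type contributes substantially.
MQM and the human preference study show that MAPS translations are generally preferred and contain fewer mistranslation, awkward-style, untranslated-text, and omission errors.
Compared with the baseline and reranking methods, MAPS reduces word-level hallucinations and helps on ambiguity resolution tasks.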

This review was generated by AI and reviewed by human editors.