QUICK REVIEW

[Paper Review] Prompting Large Language Model for Machine Translation: A Case Study

Biao Zhang, Barry Haddow|arXiv (Cornell University)|Jan 17, 2023

Natural Language Processing Techniques | Cited by 68

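One-Sentence Summary

A systematic study of prompting strategies for machine translation with GLM-130B, examining prompt templates, demonstration examples, the use of monolingual data, and transfer learning across settings.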

ABSTRACT

Research on prompting has shown excellent performance with little or even no supervised training across many tasks. However, prompting for machine translation is still under-explored in the literature. We fill this gap by offering a systematic study on prompting strategies for translation, examining various factors for prompt template and demonstration example selection. We further explore the use of monolingual data and the feasibility of cross-lingual, cross-domain, and sentence-to-document transfer learning in prompting. Extensive experiments with GLM-130B (Zeng et al., 2022) as the testbed show that 1) the number and the quality of prompt examples matter, where using suboptimal examples degenerates translation; 2) several features of prompt examples, such as semantic similarity, show significant Spearman correlation with their prompting performance; yet, none of the correlations are strong enough; 3) using pseudo parallel prompt examples constructed from monolingual data via zero-shot prompting could improve translation; and 4) improved performance is achievable by transferring knowledge from prompt examples selected in other settings. We finally provide an analysis on the model outputs and discuss several problems that prompting still suffers from.

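Research Motivation and Goals

Assess how prompt templates affect machine translation quality across language pairs.
Study how demonstration examples affect prompting performance and how to select them effectively.
Explore the use of monolingual data in prompting and the potential of pseudo-parallel prompt examples.
Examine transfer learning in prompting: cross-lingual, cross-domain, and sentence-to-document transfer.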

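Proposed Method

Use GLM-130B (INT4-quantized) as the fixed language model for translation prompting.
Evaluate zero-shot and few-shot prompting under multiple templates and demonstration strategies.
Build and analyze ablation sets to study demonstration features and their correlation with prompting performance.
Experiment with monolingual data, using back-translation and forward translation to create pseudo-parallel prompt examples.
Study cross-setting transfer: cross-lingual, cross-domain, and sentence-to-document.
Analyze common prompting-related problems in the model outputs and potential mitigations.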
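
The zero-shot and few-shot setups listed above, and the construction of pseudo-parallel examples by back-translation, can be sketched as follows. This is a minimal illustration rather than the paper's code: the "Language: sentence" template, the function names `build_prompt` and `pseudo_parallel_by_back_translation`, and the `translate` callable (a stand-in for a zero-shot call to the model in the target-to-source direction) are all assumptions introduced here.

```python
from typing import Callable, List, Sequence, Tuple

# A demonstration is a (source sentence, target sentence) pair.
Example = Tuple[str, str]


def build_prompt(src_lang: str, tgt_lang: str, text: str,
                 demos: Sequence[Example] = ()) -> str:
    """Format a translation prompt.

    With no demonstrations this is a zero-shot prompt; with k
    demonstrations it is a k-shot prompt. The template is an assumed
    "Language: sentence" layout, chosen only for illustration.
    """
    blocks = [f"{src_lang}: {s}\n{tgt_lang}: {t}" for s, t in demos]
    # The final block leaves the target side empty for the model to complete.
    blocks.append(f"{src_lang}: {text}\n{tgt_lang}:")
    return "\n\n".join(blocks)


def pseudo_parallel_by_back_translation(
        tgt_sentences: Sequence[str],
        translate: Callable[[str], str]) -> List[Example]:
    """Pair each monolingual target sentence with a machine-made source.

    `translate` is a hypothetical target-to-source translator, e.g. a
    wrapper that sends a zero-shot prompt to the language model.
    """
    return [(translate(t), t) for t in tgt_sentences]


if __name__ == "__main__":
    # Toy stand-in for a real target-to-source model call.
    toy_reverse = {"Hello.": "Hallo."}
    demos = pseudo_parallel_by_back_translation(["Hello."], toy_reverse.get)
    print(build_prompt("German", "English", "Guten Morgen.", demos))
```

Passing the translator in as a callable keeps the prompt formatting independent of any particular model API, so the same helper covers both back-translation and, with the pair order swapped, forward translation.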

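Experimental Results

Research Questions

RQ1: Which MT prompt templates yield the best performance, and how does the template language affect results?
RQ2: How do demonstration examples affect prompting performance, and which features of demonstrations are associated with better MT prompting?
RQ3: Can monolingual data be used effectively in prompting, and how do pseudo-parallel prompt examples compare?
RQ4: To what extent do prompt demonstrations transfer across languages, domains, and output granularity (sentence versus document)?
RQ5: What practical problems arise in MT prompting (such as copying, hallucination, and prompt traps), and how can they be mitigated?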

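Key Findings

Template choice has a significant effect on zero-shot MT; for translating into En/De/Zh with GLM-130B, a simple English template generally performs best.
Several demonstration features (length, LM score, semantic similarity) show statistically significant Spearman correlation with prompting performance, but the correlations are weak and not consistently predictive.
Using monolingual data directly in prompts generally hurts MT performance; pseudo-parallel examples obtained via back-translation or forward translation improve prompting, with back-translation being more robust.
Prompting shows some transfer ability, but cross-setting gains are limited, and demonstrations taken from one setting do not reliably outperform zero-shot prompting in another.
Prompting still suffers from copying, mistranslation of entities, hallucination, and prompt traps; for non-English directions, pivoting through English brings some benefit.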
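
The correlation analysis behind the demonstration-feature finding can be illustrated with a small Spearman computation. The sketch below uses only the standard library and computes Spearman's rho as the Pearson correlation of tie-averaged ranks; the function names and the feature and score values are made up for illustration and are not results from the paper.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: Pearson correlation computed on the ranks of x and y."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        raise ValueError("rho is undefined when one input is constant")
    return cov / (vx * vy) ** 0.5


if __name__ == "__main__":
    # Hypothetical per-demonstration feature values and translation scores.
    similarity = [0.2, 0.5, 0.4, 0.9]
    quality = [30.0, 34.0, 31.0, 33.0]
    print(round(spearman(similarity, quality), 3))
```

A rank-based measure suits this kind of analysis because it only asks whether a higher feature value tends to go with a higher score, without assuming a linear relationship between the two.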

This review was generated by AI and reviewed by a human editor.