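[Paper Explained] Rephrase and Respond: Let Large Language Models Ask Better Questions for Themselves
This article introduces RaR, a prompting method in which an LLM first rephrases a human question and then answers it. It also covers a two-step variant in which one LLM produces the rephrasing and a different LLM produces the answer.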
Misunderstandings arise not only in interpersonal communication but also between humans and Large Language Models (LLMs). Such discrepancies can make LLMs interpret seemingly unambiguous questions in unexpected ways, yielding incorrect responses. While it is widely acknowledged that the quality of a prompt, such as a question, significantly impacts the quality of the response provided by LLMs, a systematic method for crafting questions that LLMs can better comprehend is still underdeveloped. In this paper, we present a method named "Rephrase and Respond" (RaR), which allows LLMs to rephrase and expand questions posed by humans and provide responses in a single prompt. This approach serves as a simple yet effective prompting method for improving performance. We also introduce a two-step variant of RaR, where a rephrasing LLM first rephrases the question and then passes the original and rephrased questions together to a different responding LLM. This facilitates the effective utilization of rephrased questions generated by one LLM with another. Our experiments demonstrate that our methods significantly improve the performance of different models across a wide range of tasks. We further provide a comprehensive comparison between RaR and the popular Chain-of-Thought (CoT) methods, both theoretically and empirically. We show that RaR is complementary to CoT and can be combined with CoT to achieve even better performance. Our work not only contributes to enhancing LLM performance efficiently and effectively but also sheds light on a fair evaluation of LLM capabilities. Data and code are available at https://github.com/uclaml/Rephrase-and-Respond.
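Motivation and Goals
- Highlight that a mismatch between human and LLM frames of thought causes questions posed to LLMs to be misunderstood.
- Propose RaR, which improves an LLM's understanding by having the model rephrase and then answer the question in a one-step or two-step prompt.
- Evaluate RaR on a diverse set of reasoning tasks and compare it with Chain-of-Thought (CoT).
- Show that RaR is compatible with CoT and that the two can be combined for better performance.
- Discuss transferability and the implications for fair evaluation of LLM capabilities.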
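Proposed Method
- Define One-step RaR: a single prompt asks the LLM to rephrase and answer the question, using the instruction "Rephrase and expand the question, and respond."
- Define Two-step RaR: a rephrasing LLM first generates a rephrased question; the original and rephrased questions are then given together as the prompt to a responding LLM, which produces the answer.
- Compare RaR with Chain-of-Thought (CoT) both theoretically and empirically.
- Show that RaR is unsupervised, requires no training, and is complementary to CoT.
- Demonstrate that rephrased questions transfer across LLMs, and that rephrasings from a stronger model can improve a weaker model.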
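The two prompting variants above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the `LLM` type, the function names, and the `(original)` / `(rephrased)` prompt layout are hypothetical, and the exact wording of the two-step instructions is an assumption. Only the one-step instruction "Rephrase and expand the question, and respond." comes from the summary itself.

```python
from typing import Callable

# Hypothetical abstraction: an LLM is any function mapping a prompt to a completion.
LLM = Callable[[str], str]

# One-step instruction, as quoted in the method description.
ONE_STEP_INSTRUCTION = "Rephrase and expand the question, and respond."

# Assumed wording for the two-step variant (illustrative only).
REPHRASE_INSTRUCTION = (
    "Given the above question, rephrase and expand it to help you do better "
    "answering. Maintain all information in the original question."
)
RESPOND_INSTRUCTION = (
    "Use your answer for the rephrased question to answer the original question."
)


def one_step_rar(question: str, llm: LLM) -> str:
    """Ask a single LLM to rephrase and answer within one prompt."""
    return llm(f"{question}\n{ONE_STEP_INSTRUCTION}")


def two_step_rar(question: str, rephraser: LLM, responder: LLM) -> str:
    """Rephrase with one LLM, then answer with a (possibly different) LLM."""
    # Step 1: the rephrasing LLM produces a clearer version of the question.
    rephrased = rephraser(f"{question}\n{REPHRASE_INSTRUCTION}").strip()
    # Step 2: the responding LLM sees both the original and the rephrased question.
    prompt = (
        f"(original) {question}\n"
        f"(rephrased) {rephrased}\n"
        f"{RESPOND_INSTRUCTION}"
    )
    return responder(prompt)
```

Because `rephraser` and `responder` are separate arguments, the same sketch covers the transfer setting described above: pass a stronger model as `rephraser` and a weaker model as `responder`.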
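Experimental Results
Research Questions
- RQ1: Does having an LLM rephrase the question improve answer accuracy across a range of tasks?
- RQ2: How do One-step RaR and Two-step RaR compare in terms of performance gains and efficiency?
- RQ3: How does RaR relate to Chain-of-Thought (CoT) methods, and how can the two be combined?
- RQ4: Do rephrased questions transfer across LLMs, and can a stronger model's rephrasing help a weaker model?
- RQ5: Do repeated rephrasings converge to a clearer formulation of the question?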
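Key Findings
- One-step RaR provides a broad, plug-and-play improvement on general tasks.
- Two-step RaR consistently improves GPT-4's performance across tasks, often with larger gains on difficult questions.
- Different LLMs benefit from RaR: stronger models show larger gains, while weaker models benefit from high-quality rephrasings supplied by stronger models.
- Rephrased questions generated by GPT-4 transfer to weaker models such as Vicuna and improve their performance.
- RaR and CoT are complementary and can be combined for better performance.
- RaR is unsupervised and requires no training, which makes it suitable for evaluating LLM capabilities and enabling fairer comparisons.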
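As a rough illustration of how RaR and CoT might be combined in a single prompt, the sketch below appends the widely used zero-shot CoT trigger "Let's think step by step." after the RaR instruction. The summary does not specify the exact combined prompt, so the ordering, the newline-joined layout, and the name `build_prompt` are assumptions made for illustration.

```python
# RaR instruction, as quoted in the method description.
RAR_INSTRUCTION = "Rephrase and expand the question, and respond."
# Common zero-shot CoT trigger; how it is combined with RaR here is an assumption.
COT_TRIGGER = "Let's think step by step."


def build_prompt(question: str, use_rar: bool = True, use_cot: bool = False) -> str:
    """Compose a prompt from the question plus optional RaR and CoT instructions."""
    parts = [question]
    if use_rar:
        parts.append(RAR_INSTRUCTION)
    if use_cot:
        parts.append(COT_TRIGGER)
    return "\n".join(parts)


# Example: the four prompt conditions one might compare in an experiment.
if __name__ == "__main__":
    q = "Was Abraham Lincoln born in an even month?"
    for rar in (False, True):
        for cot in (False, True):
            print(f"--- RaR={rar}, CoT={cot} ---")
            print(build_prompt(q, use_rar=rar, use_cot=cot))
```

Keeping the two instructions as independent flags makes it straightforward to run the same question under plain, RaR-only, CoT-only, and combined conditions.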
This summary was generated by AI and reviewed by a human editor.