QUICK REVIEW

[Paper Review] Thread of Thought Unraveling Chaotic Contexts

Yucheng Zhou, Xiubo Geng|arXiv (Cornell University)|Nov 15, 2023

Topic Modeling | Cited by 10

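One-Sentence Summary

ThoT prompting analyzes chaotic contexts segment by segment, works as a plug-and-play module with a range of large language models, and achieves better reasoning performance than CoT and basic prompting on the PopQA, EntityQ, and MTCR datasets.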

ABSTRACT

Large Language Models (LLMs) have ushered in a transformative era in the field of natural language processing, excelling in tasks related to text comprehension and generation. Nevertheless, they encounter difficulties when confronted with chaotic contexts (e.g., distractors rather than long irrelevant context), leading to the inadvertent omission of certain details within the chaotic context. In response to these challenges, we introduce the "Thread of Thought" (ThoT) strategy, which draws inspiration from human cognitive processes. ThoT systematically segments and analyzes extended contexts while adeptly selecting pertinent information. This strategy serves as a versatile "plug-and-play" module, seamlessly integrating with various LLMs and prompting techniques. In the experiments, we utilize the PopQA and EntityQ datasets, as well as a Multi-Turn Conversation Response dataset (MTCR) we collected, to illustrate that ThoT significantly improves reasoning performance compared to other prompting techniques.

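Research Motivation and Objectives

Motivate the problem of chaotic contexts that arise in retrieval-augmented generation and multi-turn conversation.
Propose the Thread of Thought (ThoT) prompting strategy as a plug-and-play solution.
Show that ThoT improves reasoning performance over CoT and standard prompting on long-tail question answering and the MTCR task.
Show how prompt design and model scale affect ThoT's effectiveness.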

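Proposed Method

A two-step prompting procedure mimics human-like reasoning over extended contexts.
Step one has the model walk through the context in manageable parts, step by step, summarizing and analyzing as it goes.
Step two extracts the final answer from the structured reasoning output.
A template-based prompt combines the chaotic context X, the query Q, and a trigger sentence to start ThoT reasoning.
Prompting strategies (Vanilla, Retrieval, CoT, ThoT) are compared across several LLMs (GPT-3.5-turbo, GPT-4, LLaMA 2 Chat, Vicuna).
Evaluation datasets are PopQA, EntityQ, and a custom MTCR dataset for multi-turn conversation.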
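
The two-step procedure can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names, the `llm` callable (any text-in, text-out model wrapper), the prompt layout, and the `ANSWER_CUE` string are all assumptions made here, and the trigger wording simply restates the step-one instruction described above.

```python
from typing import Callable

# Assumed wording: restates the step-one instruction summarized in this
# review; the exact trigger sentence used in the paper may differ.
THOT_TRIGGER = (
    "Walk me through this context in manageable parts step by step, "
    "summarizing and analyzing as we go."
)
# Hypothetical cue used in step two to pull out the final answer.
ANSWER_CUE = "Therefore, the answer:"


def build_reasoning_prompt(context: str, query: str,
                           trigger: str = THOT_TRIGGER) -> str:
    """Step 1: combine chaotic context X, query Q, and the trigger sentence."""
    return f"{context}\nQ: {query}\n{trigger}\nA:"


def build_extraction_prompt(reasoning_prompt: str, reasoning: str,
                            cue: str = ANSWER_CUE) -> str:
    """Step 2: append the model's reasoning and ask for the final answer."""
    return f"{reasoning_prompt} {reasoning}\n{cue}"


def thread_of_thought(llm: Callable[[str], str], context: str, query: str) -> str:
    """Run both steps with any text-in, text-out model callable."""
    first_prompt = build_reasoning_prompt(context, query)
    reasoning = llm(first_prompt)          # segment-wise summary and analysis
    second_prompt = build_extraction_prompt(first_prompt, reasoning)
    return llm(second_prompt).strip()      # short final answer
```

Because the model is passed in as a plain callable, the same two functions can wrap a hosted API or a local model, which is one way to read the "plug-and-play" claim.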

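Experimental Results

Research Questions

RQ1: Under chaotic contexts, does ThoT improve reasoning over CoT and vanilla prompting?
RQ2: In retrieval-augmented and multi-turn conversation settings, how does ThoT perform relative to existing prompting methods?
RQ3: Does model scale amplify ThoT's gains across different architectures?
RQ4: Which prompt designs maximize ThoT's effectiveness and consistency across tasks?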

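Key Findings

On the evaluated models, ThoT outperforms Vanilla, Retrieval, and CoT prompting on the exact match (EM) metric for PopQA and EntityQ.
On MTCR, ThoT outperforms the other prompts with GPT-4, GPT-3.5-turbo, and LLaMA 2 70B.
ThoT's gains correlate with larger model scale and are significant on retrieval-augmented contexts.
Prompt designs that explicitly instruct step-by-step analysis and segment-wise summarization achieve higher EM scores; more directive prompts perform better.
A case study shows ThoT synthesizing information across sources (for example, inferring that a band plays garage punk), a situation in which CoT may fail.
Error analysis points to challenges in reasoning about implicit relations, suggesting a direction for future improvement.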
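
For readers unfamiliar with the EM metric mentioned above, the following sketch shows one common way to compute it. The normalization steps (lowercasing, stripping punctuation and English articles) follow the widely used SQuAD-style convention and are an assumption here; this review does not state which normalization the paper applies, and the function names are illustrative.

```python
import re
import string
from typing import List, Sequence


def normalize(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, gold_answers: Sequence[str]) -> bool:
    """True if the normalized prediction equals any normalized gold answer."""
    pred = normalize(prediction)
    return any(pred == normalize(gold) for gold in gold_answers)


def em_score(predictions: List[str], references: List[Sequence[str]]) -> float:
    """Fraction of predictions that exactly match one of their references."""
    if not predictions:
        return 0.0
    hits = sum(exact_match(p, refs) for p, refs in zip(predictions, references))
    return hits / len(predictions)
```

Under this definition a prediction gets credit only when the whole normalized string matches, which is why the answer-extraction step matters: a long reasoning trace that merely contains the right answer would score zero.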

This review was generated by AI and reviewed by human editors.