QUICK REVIEW

[Paper Review] Unleashing the Emergent Cognitive Synergy in Large Language Models: A Task-Solving Agent through Multi-Persona Self-Collaboration

Zhenhailong Wang, Shaoguang Mao|arXiv (Cornell University)|Jul 11, 2023

Persona Design and Applications | Cited by 11

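One-Sentence Summary

This paper introduces Solo Performance Prompting (SPP), a zero-shot method that lets a single large language model dynamically identify multiple personas and collaborate with them to solve knowledge-intensive and reasoning-intensive tasks; the emergent cognitive synergy is observed mainly in GPT-4.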

ABSTRACT

Human intelligence thrives on cognitive synergy, where collaboration among different minds yields superior outcomes compared to isolated individuals. In this work, we propose Solo Performance Prompting (SPP), which transforms a single LLM into a cognitive synergist by engaging in multi-turn self-collaboration with multiple personas. A cognitive synergist is an intelligent agent that collaboratively combines multiple minds' strengths and knowledge to enhance problem-solving in complex tasks. By dynamically identifying and simulating different personas based on task inputs, SPP unleashes the potential of cognitive synergy in LLMs. Our in-depth analysis shows that assigning multiple fine-grained personas in LLMs improves problem-solving abilities compared to using a single or fixed number of personas. We evaluate SPP on three challenging tasks: Trivia Creative Writing, Codenames Collaborative, and Logic Grid Puzzle, encompassing both knowledge-intensive and reasoning-intensive types. Unlike previous works, such as Chain-of-Thought, that solely enhance the reasoning abilities in LLMs, experimental results demonstrate that SPP effectively reduces factual hallucination, and maintains strong reasoning capabilities. Additionally, comparative experiments show that cognitive synergy only emerges in GPT-4 and does not appear in less capable models, such as GPT-3.5-turbo and Llama2-13b-chat, which draws an interesting analogy to human development. Code, data, and prompts can be found at: https://github.com/MikeWangWZHL/Solo-Performance-Prompting.git.

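Research Motivation and Objectives

Motivation: achieve cognitive synergy through dynamic, fine-grained persona simulation, reducing factual hallucination and improving LLM reasoning.
Objective: enable a single LLM to identify, simulate, and collaborate with multiple personas on general tasks, without external tools or fine-tuning.
Evaluate SPP on tasks spanning knowledge-intensive and reasoning-intensive domains to study its effectiveness and emergent properties.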

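Proposed Method

SPP has a single LLM construct multiple personas, including a leader persona (the AI Assistant), to complete a task.
Each participant brainstorms from its own perspective; the AI Assistant then proposes an initial solution and solicits feedback over iterative rounds of self-collaboration.
Dynamic, zero-shot persona identification replaces fixed or manually defined personas.
SPP is compared against Standard Prompting, Chain-of-Thought, and Self-Refine across multiple tasks.
The SPP workflow and its intermediate generations (z_p, z_b, z_s, z_f) are described formally to model the multi-turn collaboration.
The evaluation tests GPT-4 on multiple tasks and includes ablation studies (SPP-Fixed-Persona, SPP-Profile) that analyze whether dynamic personas are necessary.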
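
The workflow described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the `LLM` callable, the bracketed prompt tags, the "OK" stopping convention, and the `max_rounds` cap are all hypothetical, and the stages are split into separate calls only for readability (the released prompts may elicit every stage in a single generation). The mapping of z_p, z_b, z_s, and z_f to personas, brainstorm notes, drafts, and feedback is likewise an interpretation of the summary.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# Hypothetical interface: any function that maps a prompt string to a completion.
LLM = Callable[[str], str]

LEADER = "AI Assistant (leader)"


@dataclass
class Trace:
    personas: List[str]                                      # z_p (assumed mapping)
    brainstorm: List[str]                                    # z_b
    drafts: List[str] = field(default_factory=list)          # z_s, one per revision
    feedback: List[List[str]] = field(default_factory=list)  # z_f, one list per round


def spp_sketch(task: str, llm: LLM, max_rounds: int = 3) -> Tuple[str, Trace]:
    # Stage 1: identify personas from the task input alone (no hand-written list).
    reply = llm(f"[identify personas]\nTask: {task}\nList participants, comma-separated.")
    experts = [p.strip() for p in reply.split(",") if p.strip()]

    # Stage 2: each identified persona brainstorms from its own perspective.
    notes = [llm(f"[brainstorm as {p}]\nTask: {task}") for p in experts]
    trace = Trace(personas=[LEADER] + experts, brainstorm=notes)

    # Stage 3: the leader drafts, personas give feedback, the leader revises.
    draft = llm(f"[draft as leader]\nTask: {task}\nNotes: {notes}")
    trace.drafts.append(draft)
    for _ in range(max_rounds):
        round_feedback = [
            llm(f"[feedback as {p}]\nTask: {task}\nDraft: {draft}") for p in experts
        ]
        trace.feedback.append(round_feedback)
        # Assumed stopping rule: stop once every persona replies "OK".
        if all(f.strip().upper().startswith("OK") for f in round_feedback):
            break
        draft = llm(f"[revise as leader]\nDraft: {draft}\nFeedback: {round_feedback}")
        trace.drafts.append(draft)
    return draft, trace


def stub_llm(prompt: str) -> str:
    """Deterministic stand-in for a real model, used only to exercise the loop."""
    if prompt.startswith("[identify personas]"):
        return "Historian, Poet"
    if prompt.startswith("[brainstorm as"):
        return "idea"
    if prompt.startswith("[draft as leader]"):
        return "draft v1"
    if prompt.startswith("[feedback as"):
        return "OK" if "draft v2" in prompt else "needs work"
    if prompt.startswith("[revise as leader]"):
        return "draft v2"
    return ""


answer, trace = spp_sketch("Write a short story that answers two trivia questions", stub_llm)
print(answer, trace.personas)
```

With the stub, the first draft is rejected once and the revised draft is accepted in the second round. Swapping `stub_llm` for a real model client would only require a function with the same string-in, string-out shape.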

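Experimental Results

Research Questions

RQ1: Can a single LLM exploit cognitive synergy through dynamic multi-persona self-collaboration to improve performance on knowledge and reasoning tasks, without fine-tuning or external tools?
RQ2: Does the emergent cognitive synergy appear only in the strongest models (such as GPT-4) and not in smaller ones (such as GPT-3.5-turbo and Llama2-13b)?
RQ3: Are dynamic, fine-grained personas necessary, or are fixed or generic personas enough to elicit domain knowledge?
RQ4: How do demonstration design and the number of personas affect SPP's effectiveness?
RQ5: Compared with conventional prompting, how does SPP affect factual hallucination in knowledge-intensive tasks?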

Main Findings

Method	Trivia Creative Writing (N=5) Score	Trivia Creative Writing (N=5) Δ	Trivia Creative Writing (N=10) Score	Trivia Creative Writing (N=10) Δ	Codenames Collaborative Score	Codenames Collaborative Δ	Logic Grid Puzzle Score	Logic Grid Puzzle Δ
Standard prompting	74.6	0.0%	77.0	0.0%	75.4	0.0%	57.7	0.0%
CoT	67.1	↓ 10.0%	68.5	↓ 11.1%	72.7	↓ 3.6%	65.8	↑ 14.1%
Self-Refine [iter=0]	73.8		76.3		75.2		58.8
Self-Refine [iter=1]	73.9	↓ 1.0%	76.9	↓ 0.1%	64.6	↓ 14.6%	60.0	↑ 4.0%
SPP (ours)	79.9	↑ 7.1%	84.7	↑ 10.0%	79.0	↑ 4.8%	68.3	↑ 18.5%
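
The Δ columns in the table appear to be relative changes against the Standard prompting score in the same column. The short check below assumes that reading (the formula is an inference, not stated in this summary) and recomputes the two Trivia Creative Writing entries for SPP from the rounded scores shown above.

```python
def relative_change(score: float, baseline: float) -> float:
    """Relative change of `score` versus `baseline`, in percent."""
    return (score - baseline) / baseline * 100.0


# Scores copied from the table above (Standard prompting vs. SPP).
standard = {"trivia_n5": 74.6, "trivia_n10": 77.0}
spp = {"trivia_n5": 79.9, "trivia_n10": 84.7}

for task in standard:
    delta = relative_change(spp[task], standard[task])
    print(f"{task}: {delta:+.1f}%")
# trivia_n5: +7.1%
# trivia_n10: +10.0%
```

Because the table shows rounded scores, recomputing from them may not reproduce every Δ entry to the last decimal.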

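SPP outperforms Standard Prompting, Chain-of-Thought, and Self-Refine on Trivia Creative Writing, Codenames Collaborative, and Logic Grid Puzzle, with relative gains over Standard Prompting ranging from 4.8% to 18.5%.
Cognitive synergy emerges only in models at the GPT-4 capability level; it is not observed in GPT-3.5-turbo or Llama2-13b-chat.
Dynamic, fine-grained, automatically identified personas outperform the fixed-persona variant (SPP-Fixed-Persona).
SPP reduces factual hallucination across multiple tasks while maintaining or improving reasoning performance.
On Trivia Creative Writing, SPP's gain grows with the number of questions (↑ 10.0% at N=10 vs. ↑ 7.1% at N=5).
SPP-Profile, which adds a detailed persona profile, does not outperform plain SPP, suggesting that the persona name alone is enough to elicit domain knowledge.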

This review was generated by AI and reviewed by human editors.