QUICK REVIEW

[Paper Review] Unleashing the Emergent Cognitive Synergy in Large Language Models: A Task-Solving Agent through Multi-Persona Self-Collaboration

Zhenhailong Wang, Shaoguang Mao|arXiv (Cornell University)|Jul 11, 2023

Persona Design and Applications | Citations: 11

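One-Line Summary

This paper proposes Solo Performance Prompting (SPP), a zero-shot method in which a single LLM dynamically identifies multiple personas and has them collaborate to solve knowledge-intensive and reasoning-intensive tasks. The resulting cognitive synergy emerges in GPT-4 but not in the less capable models tested.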

ABSTRACT

Human intelligence thrives on cognitive synergy, where collaboration among different minds yields superior outcomes compared to isolated individuals. In this work, we propose Solo Performance Prompting (SPP), which transforms a single LLM into a cognitive synergist by engaging in multi-turn self-collaboration with multiple personas. A cognitive synergist is an intelligent agent that collaboratively combines multiple minds' strengths and knowledge to enhance problem-solving in complex tasks. By dynamically identifying and simulating different personas based on task inputs, SPP unleashes the potential of cognitive synergy in LLMs. Our in-depth analysis shows that assigning multiple fine-grained personas in LLMs improves problem-solving abilities compared to using a single or fixed number of personas. We evaluate SPP on three challenging tasks: Trivia Creative Writing, Codenames Collaborative, and Logic Grid Puzzle, encompassing both knowledge-intensive and reasoning-intensive types. Unlike previous works, such as Chain-of-Thought, that solely enhance the reasoning abilities in LLMs, experimental results demonstrate that SPP effectively reduces factual hallucination, and maintains strong reasoning capabilities. Additionally, comparative experiments show that cognitive synergy only emerges in GPT-4 and does not appear in less capable models, such as GPT-3.5-turbo and Llama2-13b-chat, which draws an interesting analogy to human development. Code, data, and prompts can be found at: https://github.com/MikeWangWZHL/Solo-Performance-Prompting.git.

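Motivation and Objectives

Motivation: Simulate cognitive synergy with dynamic, fine-grained personas in order to reduce factual hallucination in LLMs and improve their reasoning.
Objective: Enable a single LLM to identify, simulate, and coordinate multiple personas for general task solving, without external tools or fine-tuning.
Evaluate SPP on tasks spanning knowledge-intensive and reasoning-intensive domains, and investigate its effectiveness and emergent properties.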

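Proposed Method

SPP has a single LLM identify multiple personas for the task, including a leader persona, the AI Assistant.
The participants brainstorm from their own perspectives; the AI Assistant then proposes an initial solution and solicits feedback through iterative self-collaboration.
Dynamic, zero-shot persona identification replaces fixed or manually defined personas.
SPP is compared with Standard Prompting, Chain-of-Thought (CoT), and Self-Refine across multiple tasks.
The SPP workflow and its intermediate generations (z_p, z_b, z_s, z_f) are described formally to model the multi-turn collaboration.
The evaluation uses GPT-4 across multiple tasks and includes ablations (SPP-Fixed-Persona, SPP-Profile) that test whether dynamic personas are necessary.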
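
The workflow above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the instruction text, the "Final answer:" marker, and the names `build_spp_prompt`, `extract_final_answer`, `solve_with_spp`, and `fake_llm` are all assumptions made for this example. The paper's actual prompt, which includes demonstrations, is in the linked repository. The intermediate generations are read here as persona identification (z_p), brainstorming (z_b), solution drafts (z_s), and feedback (z_f), all produced by one model call.

```python
from typing import Callable

# Illustrative instruction text (an assumption, not the paper's prompt).
SPP_INSTRUCTIONS = (
    "First list the participants (personas) best suited to this task, "
    "always including 'AI Assistant (you)' as the leader. "
    "Let each participant brainstorm, then have the AI Assistant propose "
    "a solution and revise it using the participants' feedback. "
    "End with a line that starts with 'Final answer:'."
)


def build_spp_prompt(task: str) -> str:
    """Wrap a task so that one completion covers personas, drafts, and feedback."""
    return f"{SPP_INSTRUCTIONS}\n\nTask: {task}\n\nParticipants:"


def extract_final_answer(completion: str, marker: str = "Final answer:") -> str:
    """Return the text after the last marker, or the whole completion if absent."""
    _, found, tail = completion.rpartition(marker)
    return tail.strip() if found else completion.strip()


def solve_with_spp(task: str, llm: Callable[[str], str]) -> str:
    """Run a single LLM call and keep only the final answer."""
    return extract_final_answer(llm(build_spp_prompt(task)))


def fake_llm(prompt: str) -> str:
    """Stand-in for a real model call, used only to exercise the flow."""
    return (
        "AI Assistant (you); Poet; Historian\n"
        "Poet: Keep the imagery concrete.\n"
        "AI Assistant (you): Here is a first draft.\n"
        "Historian: The dates look right.\n"
        "Final answer: A short poem."
    )


if __name__ == "__main__":
    print(solve_with_spp("Write a poem that mentions three facts.", fake_llm))
```

Because `llm` is just a callable from prompt to completion, any client can be plugged in. The single-call design reflects the point that the personas are simulated by one model rather than by separate agents.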

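Experimental Results

Research Questions

RQ1: Can a single LLM exploit cognitive synergy through dynamic multi-persona self-collaboration to improve on knowledge and reasoning tasks, without fine-tuning or external tools?
RQ2: Does the emergent cognitive synergy appear only in the most capable models (e.g., GPT-4) and not in smaller ones (e.g., GPT-3.5-turbo, Llama2-13b)?
RQ3: Are dynamic, fine-grained personas required, or can fixed, generic personas elicit the same domain knowledge?
RQ4: How do demonstration design and the number of personas affect SPP's effectiveness?
RQ5: How does factual hallucination on knowledge-intensive tasks change under SPP compared with conventional prompting?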

Key Findings

Method	Trivia Creative Writing (N=5) Score	Trivia Creative Writing (N=5) Δ	Trivia Creative Writing (N=10) Score	Trivia Creative Writing (N=10) Δ	Codenames Collaborative Score	Codenames Collaborative Δ	Logic Grid Puzzle Score	Logic Grid Puzzle Δ
Standard Prompting	74.6	0.0%	77.0	0.0%	75.4	0.0%	57.7	0.0%
CoT	67.1	↓ 10.0%	68.5	↓ 11.1%	72.7	↓ 3.6%	65.8	↑ 14.1%
Self-Refine [iter=0]	73.8		76.3		75.2		58.8
Self-Refine [iter=1]	73.9	↓ 1.0%	76.9	↓ 0.1%	64.6	↓ 14.6%	60.0	↑ 4.0%
SPP (ours)	79.9	↑ 7.1%	84.7	↑ 10.0%	79.0	↑ 4.8%	68.3	↑ 18.5%
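
The Δ columns appear to report the relative change of each score against the Standard Prompting baseline. The short sketch below recomputes that figure for the SPP row, assuming Δ = (score - baseline) / baseline. The function name `relative_change` and the dictionary keys are introduced here for illustration only.

```python
def relative_change(score: float, baseline: float) -> float:
    """Percent change of `score` relative to `baseline`."""
    return (score - baseline) / baseline * 100.0


# Scores copied from the table above.
standard = {
    "Trivia (N=5)": 74.6,
    "Trivia (N=10)": 77.0,
    "Codenames": 75.4,
    "Logic Grid": 57.7,
}
spp = {
    "Trivia (N=5)": 79.9,
    "Trivia (N=10)": 84.7,
    "Codenames": 79.0,
    "Logic Grid": 68.3,
}

if __name__ == "__main__":
    for task, baseline in standard.items():
        print(f"{task}: {relative_change(spp[task], baseline):+.1f}%")
```

Recomputing from the rounded scores can differ from the tabulated Δ by a few tenths of a point (for example, roughly 18.4% rather than 18.5% on Logic Grid Puzzle), presumably because the published deltas were derived from unrounded scores.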

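SPP outperforms Standard Prompting, CoT, and Self-Refine on Trivia Creative Writing, Codenames Collaborative, and Logic Grid Puzzle.
Cognitive synergy emerges only in GPT-4-level models; it does not appear in GPT-3.5-turbo or Llama2-13b-chat.
Dynamic, fine-grained, automatically identified personas outperform the fixed-persona variant (SPP-Fixed-Persona).
SPP reduces factual hallucination across multiple tasks while maintaining or improving reasoning performance.
On Trivia Creative Writing, SPP's gain grows with the number of trivia questions (↑ 10.0% at N=10 versus ↑ 7.1% at N=5).
SPP-Profile, which adds persona profiles, does not outperform plain SPP, suggesting that persona names alone may be enough to elicit domain knowledge.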

This review was generated by AI and checked by a human editor.