QUICK REVIEW

[Paper Review] Encouraging Divergent Thinking in Large Language Models through Multi-Agent Debate

Liang Tian, Zhiwei He | arXiv (Cornell University) | May 30, 2023

Topic Modeling | Citations: 27

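One-Sentence Summary

This paper defines the Degeneration-of-Thought problem in self-reflection and proposes the Multi-Agent Debate framework, which promotes divergent thinking to improve LLM performance on complex reasoning.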

ABSTRACT

Modern large language models (LLMs) like ChatGPT have shown remarkable performance on general language tasks but still struggle on complex reasoning tasks, which drives the research on cognitive behaviors of LLMs to explore human-like problem-solving strategies. Along this direction, one representative strategy is self-reflection, which asks an LLM to refine the solution with the feedback generated by itself iteratively. However, our study shows that such reflection-style methods suffer from the Degeneration-of-Thought (DoT) problem: once the LLM has established confidence in its solutions, it is unable to generate novel thoughts later through reflection even if its initial stance is incorrect. To address the DoT problem, we propose a Multi-Agent Debate (MAD) framework, in which multiple agents express their arguments in the state of "tit for tat" and a judge manages the debate process to obtain a final solution. Clearly, our MAD framework encourages divergent thinking in LLMs which would be helpful for tasks that require deep levels of contemplation. Experiment results on two challenging datasets, commonsense machine translation and counter-intuitive arithmetic reasoning, demonstrate the effectiveness of our MAD framework. Extensive analyses suggest that the adaptive break of debate and the modest level of "tit for tat" state are required for MAD to obtain good performance. Moreover, we find that LLMs might not be a fair judge if different LLMs are used for agents. Code is available at https://github.com/Skytliang/Multi-Agents-Debate.

Motivation and Objectives

Define the Degeneration-of-Thought (DoT) problem in self-reflection for LLMs.
Propose the Multi-Agent Debate (MAD) framework to promote divergent chain-of-thoughts.
Demonstrate MAD’s effectiveness on two challenging tasks: Common MT and Counter-Intuitive AR.
Analyze how debate dynamics and agent parity affect MAD performance.
Show that MAD with a GPT-3.5-Turbo backbone can surpass the stronger GPT-4 on Common MT.

Proposed Method

Introduce a three-component MAD framework: meta prompts, debaters, and a judge.
Debaters express arguments in a fixed order using the history H; each round adds new arguments.
The judge has a discriminative mode (stop if a solution is obtained) and an extractive mode (output final solution from debate history).
Use an adaptive break strategy so the debate stops once the judge finds a satisfactory solution, rather than always running a fixed number of rounds.
Compare MAD (primarily with GPT-3.5-Turbo as backbone) against baselines like Self-Reflect, Rerank, MAPS, CoT, and Self-Consistency.
Evaluate on two tasks: Commonsense Machine Translation (Common MT) and Counter-Intuitive Arithmetic Reasoning (Counter-Intuitive AR).
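
The debate loop described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: `run_debate`, the `(topic, history)` judge signature, and the stub debaters and judges are all hypothetical names chosen for this example, and a real setup would replace the stubs with calls to an LLM.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical types: a debater maps a prompt to an argument; a judge
# inspects the topic and the debate history H.
Debater = Callable[[str], str]
Discriminator = Callable[[str, List[str]], Optional[str]]
Extractor = Callable[[str, List[str]], str]


def run_debate(
    topic: str,
    meta_prompt: str,
    debaters: List[Tuple[str, Debater]],
    judge_discriminate: Discriminator,
    judge_extract: Extractor,
    max_rounds: int = 3,
) -> Tuple[str, int]:
    """Return (final solution, number of rounds actually run)."""
    history: List[str] = []  # the debate history H
    for round_no in range(1, max_rounds + 1):
        # Debaters speak in a fixed order; each one sees the history so far.
        for name, debater in debaters:
            prompt = "\n".join(
                [meta_prompt, f"Topic: {topic}", *history, f"{name}, your argument:"]
            )
            history.append(f"{name}: {debater(prompt)}")
        # Discriminative mode: adaptive break if a solution has been reached.
        verdict = judge_discriminate(topic, history)
        if verdict is not None:
            return verdict, round_no
    # Extractive mode: round limit hit, pull a final answer from the history.
    return judge_extract(topic, history), max_rounds


# Stub agents standing in for LLM calls (assumptions for illustration only).
def affirmative(prompt: str) -> str:
    # Changes its stance once the opposing answer appears in the history.
    return "answer=12" if "answer=12" in prompt else "answer=10"


def negative(prompt: str) -> str:
    return "answer=12"


def stub_discriminate(topic: str, history: List[str]) -> Optional[str]:
    # Declares a solution only when the two latest arguments agree.
    latest = {entry.split(": ", 1)[1] for entry in history[-2:]}
    return latest.pop() if len(latest) == 1 else None


def stub_extract(topic: str, history: List[str]) -> str:
    return history[-1].split(": ", 1)[1]


result = run_debate(
    topic="a counter-intuitive arithmetic question",
    meta_prompt="You are debaters. You need not agree with each other.",
    debaters=[("Affirmative", affirmative), ("Negative", negative)],
    judge_discriminate=stub_discriminate,
    judge_extract=stub_extract,
)
print(result)
```

With these stubs the debaters disagree in round 1 and converge in round 2, so the discriminative check ends the debate before `max_rounds` is reached. Setting `max_rounds=1` forces the extractive fallback instead.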

Experimental Results

Research Questions

RQ1: Can multi-agent debate avoid the Degeneration-of-Thought problem that affects self-reflection in LLMs?
RQ2: How do multiple agents and a judge influence divergence of thought and final solution quality?
RQ3: Which debate dynamics (adaptive stopping, level of "tit for tat" disagreement) yield the best results on challenging tasks?
RQ4: Is an LLM judge fair when the debating agents use different LLM backbones, and how does any bias affect outcomes?
RQ5: How does MAD perform relative to strong baselines on translation and reasoning tasks that require deep contemplation?

Key Findings

MAD substantially improves performance over baselines on Common MT and Counter-Intuitive AR.
GPT-3.5-Turbo with MAD can surpass GPT-4 on the Common MT dataset in automatic and human evaluations.
Adaptive debate break strategies and a modest level of disagreement (“tit for tat”) are important for MAD effectiveness.
An LLM judge may be biased when the agents use different LLM backbones, which calls the fairness of cross-model judging into question.
MAD enables divergent thinking that helps overcome DoT and yields more accurate translations and reasoning in challenging cases.

This review was generated by AI and checked by a human editor.