QUICK REVIEW

[Paper Review] Sentiment Analysis through LLM Negotiations

Xiaofei Sun, Xiaoya Li|arXiv (Cornell University)|Nov 3, 2023

Sentiment Analysis and Opinion Mining | Citations: 12

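One-Sentence Summary

This paper proposes a multi-LLM negotiation framework in which a reasoning-infused generator and an explanation-deriving discriminator iteratively negotiate the sentiment decision, and it achieves higher accuracy than single-LLM ICL baselines on multiple benchmarks.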

ABSTRACT

A standard paradigm for sentiment analysis is to rely on a single LLM and make the decision in a single round under the framework of in-context learning. This framework suffers from the key disadvantage that the single-turn output generated by a single LLM might not deliver the perfect decision, just as humans sometimes need multiple attempts to get things right. This is especially true for the task of sentiment analysis, where deep reasoning is required to address the complex linguistic phenomena (e.g., clause composition, irony, etc.) in the input. To address this issue, this paper introduces a multi-LLM negotiation framework for sentiment analysis. The framework consists of a reasoning-infused generator to provide a decision along with a rationale, and an explanation-deriving discriminator to evaluate the credibility of the generator. The generator and the discriminator iterate until a consensus is reached. The proposed framework naturally addresses the aforementioned challenge, as we are able to take the complementary abilities of two LLMs and have them use rationales to persuade each other for correction. Experiments on a wide range of sentiment analysis benchmarks (SST-2, Movie Review, Twitter, Yelp, Amazon, IMDB) demonstrate the effectiveness of the proposed approach: it consistently yields better performance than the ICL baseline across all benchmarks, and even superior performance to supervised baselines on the Twitter and Movie Review datasets.

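Motivation and Objectives

Motivates the work by the limits of single-LLM sentiment analysis on inputs with complex linguistic phenomena such as irony and negation.
Proposes a generator-discriminator multi-LLM negotiation framework to improve accuracy and robustness.
Shows that collaboration between two LLMs, optionally with a third, consistently outperforms the ICL baseline and, on the Twitter and Movie Review datasets, supervised baselines as well.
Argues that role-flipped negotiation and reasoning-based explanations improve decision quality and interpretability.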

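Proposed Method

Introduces a two-LLM setup with a reasoning-infused generator (G) and an explanation-deriving discriminator (D).
G produces a sentiment decision together with a structured reasoning chain, conditioned on retrieved demonstrations.
D evaluates G's output, provides its own justification, and may prompt further negotiation rounds until a consensus is reached.
Role-flipped negotiation swaps the roles of G and D to refine the decision further.
When opinions conflict, a third LLM is invoked to aggregate and vote over six negotiation results.
Experiments use GPT-3.5, GPT-4, and InstructGPT-3.5 as backbones; RoBERTa-Large is used for KNN-based demonstration retrieval.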
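
The KNN-based demonstration retrieval mentioned in the last item can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes sentence embeddings have already been computed (for example by an encoder such as RoBERTa-Large), substitutes toy 3-dimensional vectors for them, and assumes cosine similarity as the ranking measure, which this review does not specify. The names `cosine` and `retrieve_demonstrations` are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def retrieve_demonstrations(query_vec, train_vecs, train_examples, k=2):
    """Return the k labelled examples whose embeddings are closest to the query."""
    ranked = sorted(
        range(len(train_vecs)),
        key=lambda i: cosine(query_vec, train_vecs[i]),
        reverse=True,  # most similar first
    )
    return [train_examples[i] for i in ranked[:k]]


# Toy data: in practice the vectors would come from a sentence encoder (assumption).
train_examples = [
    ("A wonderful, moving film.", "positive"),
    ("Dull and far too long.", "negative"),
    ("An absolute delight.", "positive"),
]
train_vecs = [[0.9, 0.1, 0.0], [0.0, 0.2, 0.9], [0.8, 0.3, 0.1]]
query_vec = [1.0, 0.2, 0.0]

demos = retrieve_demonstrations(query_vec, train_vecs, train_examples, k=2)
for text, label in demos:
    print(f"{label}: {text}")
```

The retrieved pairs would then be placed in the prompt as few-shot demonstrations ahead of the test input.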

Figure 1: An illustration of a generator (G) and a discriminator (D) achieving consensus via a negotiation. Each round consists of a user prompt and a response from either G or D. Specifically, a user prompt includes four elements: a task description, few-shot demonstrations (abbreviate it for shor
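
The negotiation loop, its role-flipped variant, and the third-LLM vote can be sketched roughly as below. This is a sketch under stated assumptions rather than the paper's implementation: the LLMs are replaced by hypothetical stub callables (`make_stub`), consensus is taken to mean that a speaker returns the same label as the previous speaker, the turn limit of three is illustrative, and the "six negotiation results" are read here as one negotiation per ordered pair of the three models.

```python
from collections import Counter
from itertools import permutations


def negotiate(text, generator, discriminator, max_turns=3):
    """Alternate G and D; stop when a speaker agrees with the previous one.

    Returns (label, transcript); label is None if no consensus is reached.
    """
    transcript = []  # list of (role, label, rationale)
    previous_label = None
    for turn in range(max_turns):
        role, speaker = ("G", generator) if turn % 2 == 0 else ("D", discriminator)
        label, rationale = speaker(text, transcript)
        transcript.append((role, label, rationale))
        if previous_label is not None and label == previous_label:
            return label, transcript
        previous_label = label
    return None, transcript


def decide(text, llm_a, llm_b, llm_c, max_turns=3):
    """Run negotiation and role-flipped negotiation; fall back to a vote."""
    first, _ = negotiate(text, llm_a, llm_b, max_turns)
    flipped, _ = negotiate(text, llm_b, llm_a, max_turns)
    if first is not None and first == flipped:
        return first
    # Conflict: involve the third model, one negotiation per ordered pair (6 total).
    outcomes = [
        negotiate(text, g, d, max_turns)[0]
        for g, d in permutations([llm_a, llm_b, llm_c], 2)
    ]
    votes = Counter(label for label in outcomes if label is not None)
    # Ties are broken by first-seen order; returns None if nothing converged.
    return votes.most_common(1)[0][0] if votes else None


def make_stub(opinion, stubborn):
    """Hypothetical stand-in for an LLM call: returns (label, rationale)."""
    def speak(text, transcript):
        if transcript and not stubborn:
            label = transcript[-1][1]  # persuaded by the previous speaker
            return label, f"The previous rationale is convincing: {label}."
        return opinion, f"I read the text as {opinion}."
    return speak


optimist = make_stub("positive", stubborn=True)
pessimist = make_stub("negative", stubborn=True)
flexible = make_stub("negative", stubborn=False)
judge = make_stub("positive", stubborn=True)

review = "The plot is thin, but I could not stop smiling."
agreed, transcript = negotiate(review, optimist, flexible)
final = decide(review, optimist, pessimist, judge)
print(agreed, len(transcript), final)
```

In a real system each stub would be replaced by a prompted model call that sees the task description, the demonstrations, the test input, and the responses from earlier turns.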

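Experimental Results

Research Questions

RQ1: Does generator-discriminator negotiation between two LLMs outperform single-LLM in-context learning on sentiment analysis?
RQ2: Do role flipping and optional third-LLM voting further improve accuracy and robustness across diverse sentiment benchmarks?
RQ3: How does including explicit reasoning chains in the negotiation process affect performance?
RQ4: How does the method compare with supervised baselines on the standard SST-2, MR, Twitter, Yelp, Amazon, and IMDB datasets?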

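Key Findings

The two-LLM negotiation framework consistently improves accuracy over the vanilla ICL baseline across multiple datasets.
Negotiation between two different LLMs outperforms the self-negotiation (single-LLM) setting by a significant margin on MR, Twitter, and IMDB.
Role-flipped negotiation and the addition of a third LLM further improve performance and enable consensus-driven decisions.
The method outperforms several supervised baselines on the Twitter and Movie Review datasets and narrows the gap to RoBERTa-Large on multiple benchmarks.
Reasoning-infused prompts matter: removing the reasoning step causes a larger performance drop in the negotiation setting than in the single-LLM baseline.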

Figure 2: Illustration of the negotiation procedure. The left demonstration shows a case where an agreement on the positive sentiment is reached after turns turns, while the right demonstration shows a case where two LLMs fail to reach an agreement in three turns. Specifically, a user prompt include

This review was generated by AI and checked by a human editor.