QUICK REVIEW

[Paper Review] Stance Detection on Social Media with Fine-Tuned Large Language Models

İlker Gül, Rémi Lebret|arXiv (Cornell University)|Apr 18, 2024

Sentiment Analysis and Opinion Mining | Cited by 11

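One-Line Summary

This paper fine-tunes ChatGPT, LLaMa-2, and Mistral-7B on several public datasets and evaluates stance detection in zero-shot, few-shot, and fine-tuned settings. The fine-tuned LLMs, including ChatGPT-ft and the LLaMa-2 and Mistral variants, achieve particularly strong performance.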

ABSTRACT

Stance detection, a key task in natural language processing, determines an author's viewpoint based on textual analysis. This study evaluates the evolution of stance detection methods, transitioning from early machine learning approaches to the groundbreaking BERT model, and eventually to modern Large Language Models (LLMs) such as ChatGPT, LLaMa-2, and Mistral-7B. While ChatGPT's closed-source nature and associated costs present challenges, open-source models like LLaMa-2 and Mistral-7B offer an encouraging alternative. Initially, our research focused on fine-tuning ChatGPT, LLaMa-2, and Mistral-7B using several publicly available datasets. Subsequently, to provide a comprehensive comparison, we assess the performance of these models in zero-shot and few-shot learning scenarios. The results underscore the exceptional ability of LLMs in accurately detecting stance, with all tested models surpassing existing benchmarks. Notably, LLaMa-2 and Mistral-7B demonstrate remarkable efficiency and potential for stance detection, despite their smaller sizes compared to ChatGPT. This study emphasizes the potential of LLMs in stance detection and calls for more extensive research in this field.

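Motivation and Objectives

Assess how stance detection methods for social media have progressed from traditional ML to BERT and then to LLMs.
Evaluate fine-tuned LLMs (ChatGPT, LLaMa-2, Mistral-7B) on stance detection datasets.
Compare zero-shot, few-shot, and fully fine-tuned performance across a range of targets and topics.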

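Proposed Method

Fine-tune ChatGPT, LLaMa-2 (7B/13B), and Mistral-7B with LoRA on the SemEval-2016, P-Stance, and Twitter Stance 2020 datasets.
Train in BF16 on A100 GPU hardware with warmup over 10% of the data, 3 epochs, LR = 3e-4, and batch size 128.
Evaluate zero-shot and few-shot prompting with instruction-tuned variants as points of comparison.
Use dataset-specific templates for the prompting strategies, with the prompts documented in the appendix.
Report F_avg and F1-macro per target as the primary metrics.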
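The two metrics named above can be sketched in a few lines. This is a minimal illustration, assuming the paper follows the standard SemEval-2016 convention in which F_avg is the mean of the F1 scores for the favor and against classes, and assuming F1-macro averages F1 over all stance labels. The label strings `FAVOR`, `AGAINST`, and `NONE` and the function names are hypothetical choices for this example, not taken from the paper.

```python
def f1_for_label(gold, pred, label):
    """F1 score for a single label, computed from parallel gold/pred lists."""
    tp = sum(1 for g, p in zip(gold, pred) if g == label and p == label)
    fp = sum(1 for g, p in zip(gold, pred) if g != label and p == label)
    fn = sum(1 for g, p in zip(gold, pred) if g == label and p != label)
    if tp == 0:
        return 0.0  # no true positives: precision or recall is zero (or undefined)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def f_avg(gold, pred):
    """Assumed SemEval-2016 style F_avg: mean F1 over FAVOR and AGAINST only."""
    return (f1_for_label(gold, pred, "FAVOR") + f1_for_label(gold, pred, "AGAINST")) / 2


def f1_macro(gold, pred, labels=("FAVOR", "AGAINST", "NONE")):
    """Unweighted mean of per-label F1 over every label in `labels`."""
    return sum(f1_for_label(gold, pred, label) for label in labels) / len(labels)


# Toy data for illustration only (not from any of the datasets).
gold = ["FAVOR", "FAVOR", "AGAINST", "AGAINST", "NONE", "NONE"]
pred = ["FAVOR", "AGAINST", "AGAINST", "AGAINST", "NONE", "FAVOR"]
print(round(f_avg(gold, pred), 3), round(f1_macro(gold, pred), 3))
```

Under this definition the neutral class still affects F_avg indirectly, because a neutral tweet mislabeled as favor or against lowers the precision of that class.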

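Experimental Results

Research Questions

RQ1: How do fine-tuned LLMs perform on stance detection for social media datasets compared with traditional baselines?
RQ2: How do training size and prompting strategy (zero-shot, few-shot, fine-tuning) affect stance detection performance?
RQ3: Which targets (politicians and topics) in SemEval-2016, P-Stance, and Twitter Stance 2020 benefit most from fine-tuning?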

Key Findings

Model	FM	HC	LA	A	CC	DT
BiCond	61.4	59.8	54.5	-	-	59.0
MemNet	57.8	60.3	61.0	-	-	-
AoA	60.0	58.2	62.4	-	-	-
TAN	55.8	65.4	63.7	59.3	53.5	-
ASGCN	58.7	64.3	62.9	-	-	58.7
AT-JSS-Lex	61.5	68.3	68.4	69.2	59.2	-
TPDG	67.3	73.4	74.7	-	-	63.0
TR-Tweet+COT	70.6	78.7	63.8	72.9	54.1	-
COLA	69.1	75.9	71.0	62.3	64.0	71.2
ChatGPT-ft	79.7	83.4	72.6	81.3	86.2	70.4
LLaMa-2-7b-ft	73.3	84.2	71.2	78.9	69.8	72.0
LLaMa-2-13b-ft	76.0	86.5	72.5	76.9	80.4	70.9
Mistral-7b-ft	78.7	85.0	76.0	74.7	71.8	68.6
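To make the comparison in the table easier to read, the sketch below computes, for each target column, the gap between the best fine-tuned model and the best baseline. The scores are transcribed from the table above; the split into `BASELINES` and `FINE_TUNED` (rows ending in `-ft`) and the helper names are assumptions made for this illustration.

```python
# Scores transcribed from the table above; None marks a "-" cell.
TARGETS = ["FM", "HC", "LA", "A", "CC", "DT"]

BASELINES = {
    "BiCond":       [61.4, 59.8, 54.5, None, None, 59.0],
    "MemNet":       [57.8, 60.3, 61.0, None, None, None],
    "AoA":          [60.0, 58.2, 62.4, None, None, None],
    "TAN":          [55.8, 65.4, 63.7, 59.3, 53.5, None],
    "ASGCN":        [58.7, 64.3, 62.9, None, None, 58.7],
    "AT-JSS-Lex":   [61.5, 68.3, 68.4, 69.2, 59.2, None],
    "TPDG":         [67.3, 73.4, 74.7, None, None, 63.0],
    "TR-Tweet+COT": [70.6, 78.7, 63.8, 72.9, 54.1, None],
    "COLA":         [69.1, 75.9, 71.0, 62.3, 64.0, 71.2],
}

FINE_TUNED = {
    "ChatGPT-ft":     [79.7, 83.4, 72.6, 81.3, 86.2, 70.4],
    "LLaMa-2-7b-ft":  [73.3, 84.2, 71.2, 78.9, 69.8, 72.0],
    "LLaMa-2-13b-ft": [76.0, 86.5, 72.5, 76.9, 80.4, 70.9],
    "Mistral-7b-ft":  [78.7, 85.0, 76.0, 74.7, 71.8, 68.6],
}


def best_per_target(rows):
    """Return {target: (model, score)} holding the highest reported score."""
    best = {}
    for model, scores in rows.items():
        for target, score in zip(TARGETS, scores):
            if score is None:
                continue  # skip cells the table leaves empty
            if target not in best or score > best[target][1]:
                best[target] = (model, score)
    return best


def gains():
    """Best fine-tuned score minus best baseline score, per target."""
    base = best_per_target(BASELINES)
    tuned = best_per_target(FINE_TUNED)
    return {t: round(tuned[t][1] - base[t][1], 1) for t in TARGETS}


if __name__ == "__main__":
    base, tuned = best_per_target(BASELINES), best_per_target(FINE_TUNED)
    for t, gain in gains().items():
        print(f"{t}: {tuned[t][0]} {tuned[t][1]} vs {base[t][0]} {base[t][1]} (+{gain})")
```

Read this way, the table shows the margin over the best baseline varying considerably by column, from 22.2 points on CC down to 0.8 on DT and 1.3 on LA.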

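Fine-tuned LLMs outperform the baselines on SemEval-2016, substantially so on most targets: ChatGPT-ft reaches 79.7 on FM, and LLaMa-2-13b-ft reaches 86.5 on HC.
On P-Stance, ChatGPT-ft achieves the highest F_avg (Bernie 81.8, Biden 89.7, Trump 91.9).
On Twitter Stance 2020, ChatGPT-ft achieves F1-macro of 85.1 (Biden) and 85.6 (Trump).
Moving from zero-shot and few-shot prompting to fine-tuning yields marked gains. For example, LLaMa-2-7b improves on FM from 51.6 (zero-shot) to 73.3 (fine-tuned).
The training-size experiments show that 70% of the data yields results nearly equal to full training for some targets (for example, LLaMa-2-7b on HC).
Open-source, LoRA-tuned LLMs deliver competitive stance detection with efficiency on par with or better than the baselines, underscoring their value for cost-effective, accurate analysis.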
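The zero-shot and few-shot settings compared in these findings differ only in whether labeled examples are placed in the prompt. The sketch below illustrates that difference. The template wording, the three-way label set, and the helper names `build_prompt` and `parse_stance` are hypothetical; the paper's actual dataset-specific prompts are in its appendix and are not reproduced here.

```python
# Hypothetical label set; some datasets may use only a subset of these labels.
LABELS = ("FAVOR", "AGAINST", "NONE")


def build_prompt(tweet, target, examples=()):
    """Build a zero-shot prompt (no examples) or a few-shot prompt (with examples).

    `examples` is a sequence of (tweet, label) pairs shown before the query.
    """
    lines = [
        f"Classify the stance of the tweet toward the target '{target}'.",
        f"Answer with one of: {', '.join(LABELS)}.",
        "",
    ]
    for ex_tweet, ex_label in examples:
        lines += [f"Tweet: {ex_tweet}", f"Stance: {ex_label}", ""]
    lines += [f"Tweet: {tweet}", "Stance:"]
    return "\n".join(lines)


def parse_stance(completion):
    """Map a raw model completion to a label, falling back to NONE."""
    text = completion.strip().upper()
    for label in LABELS:
        if text.startswith(label):
            return label
    return "NONE"


if __name__ == "__main__":
    shots = [("We need action now.", "FAVOR"), ("This is all a hoax.", "AGAINST")]
    print(build_prompt("Another record heatwave this year.", "Climate Change", shots))
```

Fine-tuning replaces the in-prompt examples with gradient updates on the training split, so the same zero-shot template can be reused at inference time with the tuned model.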

This review was generated by AI and checked by a human editor.