QUICK REVIEW

[Paper Review] Enhancing Financial Sentiment Analysis via Retrieval Augmented Large Language Models

Boyu Zhang, Hongyang Yang|arXiv (Cornell University)|Oct 6, 2023

Stock Market Forecasting Methods | Cited by 8

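One-Sentence Summary

This paper applies a retrieval-augmented, instruction-tuned LLM framework to financial sentiment analysis, using external knowledge retrieval to improve accuracy and F1 score, and outperforms both baselines and general-purpose LLMs.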

ABSTRACT

Financial sentiment analysis is critical for valuation and investment decision-making. Traditional NLP models, however, are limited by their parameter size and the scope of their training datasets, which hampers their generalization capabilities and effectiveness in this field. Recently, Large Language Models (LLMs) pre-trained on extensive corpora have demonstrated superior performance across various NLP tasks due to their commendable zero-shot abilities. Yet, directly applying LLMs to financial sentiment analysis presents challenges: The discrepancy between the pre-training objective of LLMs and predicting the sentiment label can compromise their predictive performance. Furthermore, the succinct nature of financial news, often devoid of sufficient context, can significantly diminish the reliability of LLMs' sentiment analysis. To address these challenges, we introduce a retrieval-augmented LLMs framework for financial sentiment analysis. This framework includes an instruction-tuned LLMs module, which ensures LLMs behave as predictors of sentiment labels, and a retrieval-augmentation module which retrieves additional context from reliable external sources. Benchmarked against traditional models and LLMs like ChatGPT and LLaMA, our approach achieves 15% to 48% performance gain in accuracy and F1 score.

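Motivation and Objectives

Address the challenges that traditional NLP models and general-purpose LLMs face in financial sentiment analysis, which stem from limited context and a mismatch between the training objective and the task.
Propose a retrieval-augmented LLM framework that combines instruction tuning with external knowledge retrieval.
Demonstrate performance gains on established financial sentiment benchmarks.
Show that RAG improves predictions on terse financial text such as news and tweets.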

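Proposed Method

Build an instruction-following dataset for financial sentiment analysis by formatting existing datasets with multiple human-written instructions.
Fine-tune an open-source LLM (e.g., Llama-7B) with a causal language modeling objective to predict sentiment labels.
Map generated outputs to the predefined sentiment classes (negative/neutral/positive).
Implement a Retrieval-Augmented Generation module that uses multi-source querying and similarity filtering to retrieve context from external sources such as Bloomberg, Reuters, Goldman Sachs, Seeking Alpha, Twitter, and Reddit.
Use two-step retrieval: 1) a multi-source knowledge query, and 2) similarity-based retrieval that selects relevant context using the overlap coefficient (Szymkiewicz-Simpson) with a threshold > 0.8.
Evaluate accuracy and F1 score on FPB, Twitter Val, and additional datasets, comparing against FinBERT, BloombergGPT, Llama-7B, ChatGLM2-6B, and ChatGPT-4.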
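
The similarity filter and the label-mapping step listed above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the word-level tokenizer, the function names (`overlap_coefficient`, `select_context`, `map_to_label`), and the fallback to "neutral" when no class name appears in the output are all hypothetical choices made for this example. Only the overlap coefficient formula and the 0.8 threshold come from the review text.

```python
from __future__ import annotations

import re

# Sentiment classes named in the review.
LABELS = ("negative", "neutral", "positive")


def tokenize(text: str) -> set[str]:
    """Lowercased word tokens as a set (a simplifying assumption)."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def overlap_coefficient(a: str, b: str) -> float:
    """Szymkiewicz-Simpson overlap: |A & B| / min(|A|, |B|)."""
    tokens_a, tokens_b = tokenize(a), tokenize(b)
    if not tokens_a or not tokens_b:
        return 0.0
    return len(tokens_a & tokens_b) / min(len(tokens_a), len(tokens_b))


def select_context(query: str, candidates: list[str],
                   threshold: float = 0.8) -> list[str]:
    """Keep candidates whose overlap with the query exceeds the threshold."""
    return [c for c in candidates if overlap_coefficient(query, c) > threshold]


def map_to_label(generated: str, default: str = "neutral") -> str:
    """Return the class name that appears first in the generated text.

    Falling back to `default` is an assumption of this sketch.
    """
    lowered = generated.lower()
    hits = [(lowered.find(label), label) for label in LABELS if label in lowered]
    return min(hits)[1] if hits else default


if __name__ == "__main__":
    headline = "Tesla shares jump after record deliveries"
    retrieved = [
        "Tesla shares jump after record deliveries in the third quarter",
        "Oil prices fall as supply concerns ease",
    ]
    print(select_context(headline, retrieved))
    print(map_to_label("The sentiment is Positive."))
```

Because the overlap coefficient divides by the size of the smaller token set, a short headline that is fully contained in a longer article scores 1.0, which suits the case of matching terse queries against richer context.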

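Experimental Results

Research Questions

RQ1: Does instruction tuning align the model's behavior with predicting financial sentiment labels more effectively than the standard pre-training objective?
RQ2: Does retrieval-augmented generation yield significant gains on terse inputs such as news headlines and tweets by supplying external financial context?
RQ3: How does the proposed framework perform compared with state-of-the-art financial sentiment models and general-purpose LLMs?
RQ4: How does adding RAG affect sentiment predictions on the benchmark datasets (FPB, Twitter Val) and in the case study?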

Key Findings

Model	FPB Acc	FPB F1	Twitter Val Acc	Twitter Val F1
FinBERT	-	-	0.725	0.668
BloombergGPT	-	-	0.510	-
ChatGLM2-6B	0.474	0.402	0.482	0.381
Llama-7B	0.601	0.397	0.544	0.363
ChatGPT 4.0	0.643	0.511	0.788	0.652
Ours	0.758	0.739	0.863	0.811

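The instruction-tuned Llama-7B performs strongly, outperforming the baselines on FPB and Twitter Val.
RAG further improves the model's accuracy and F1, surpassing ChatGPT-4 in some settings.
Without RAG, the proposed method reaches 0.758 Acc / 0.739 F1 on FPB and 0.863 Acc / 0.811 F1 on Twitter Val (Table I).
ChatGPT-4.0 (without RAG) achieves 0.788 Acc / 0.652 F1 on Twitter Val and 0.643 Acc / 0.511 F1 on FPB (Table I); with RAG, ChatGPT-4.0 reaches 0.813 Acc / 0.708 F1 on Twitter Val (Table II).
With RAG, the proposed method reaches 0.881 Acc / 0.842 F1 on Twitter Val (Table II).
The case study shows that the context retrieved by RAG can turn an ambiguous statement into a more accurate positive sentiment prediction (Table III).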
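
As a rough check on the size of the improvement, the relative gains of the proposed method over ChatGPT 4.0 can be computed from the results table above. This is a minimal sketch: the choice of ChatGPT 4.0 as the baseline and the helper name `relative_gain` are assumptions made for illustration, and the paper may compute its reported gains against other baselines.

```python
def relative_gain(ours: float, baseline: float) -> float:
    """Relative improvement of `ours` over `baseline`, in percent."""
    return (ours - baseline) / baseline * 100.0


# (Ours, ChatGPT 4.0) pairs copied from the results table above.
RESULTS = {
    "FPB Acc": (0.758, 0.643),
    "FPB F1": (0.739, 0.511),
    "Twitter Val Acc": (0.863, 0.788),
    "Twitter Val F1": (0.811, 0.652),
}

# Percentage gain per metric, keyed by the table's column names.
GAINS = {name: relative_gain(ours, base) for name, (ours, base) in RESULTS.items()}

if __name__ == "__main__":
    for name, gain in GAINS.items():
        print(f"{name}: {gain:+.1f}%")
```

Relative gain is used here rather than the absolute difference because it matches how percentage improvements are usually quoted; the absolute differences can be read directly off the table.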

This review was generated by AI and checked by a human editor.