QUICK REVIEW

[Paper Review] Instruct-FinGPT: Financial Sentiment Analysis by Instruction Tuning of General-Purpose Large Language Models

Boyu Zhang, Hongyang Yang|arXiv (Cornell University)|Jun 22, 2023

Stock Market Forecasting Methods | Cited by 11

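One-Sentence Summary

This paper instruction-tunes a general-purpose large language model (LLaMA-7B) on a small financial sentiment dataset so that it outperforms FinBERT and ChatGPT on financial sentiment analysis, with an emphasis on numerical sensitivity and contextual understanding.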

ABSTRACT

Sentiment analysis is a vital tool for uncovering insights from financial articles, news, and social media, shaping our understanding of market movements. Despite the impressive capabilities of large language models (LLMs) in financial natural language processing (NLP), they still struggle with accurately interpreting numerical values and grasping financial context, limiting their effectiveness in predicting financial sentiment. In this paper, we introduce a simple yet effective instruction tuning approach to address these issues. By transforming a small portion of supervised financial sentiment analysis data into instruction data and fine-tuning a general-purpose LLM with this method, we achieve remarkable advancements in financial sentiment analysis. In the experiment, our approach outperforms state-of-the-art supervised sentiment analysis models, as well as widely used LLMs like ChatGPT and LLaMAs, particularly in scenarios where numerical understanding and contextual comprehension are vital.

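Research Motivation and Goals

Show that instruction tuning a general-purpose LLM improves financial sentiment analysis.
Address numerical sensitivity in financial text so that sentiment can be read more reliably from figures.
Assess how contextual understanding, strengthened by the LLM's prior knowledge, contributes to sentiment prediction.
Compare the instruction-tuned LLaMA-7B with FinBERT and ChatGPT on financial sentiment tasks.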

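Proposed Method

Convert a sentiment classification dataset into instruction-tuning format using 10 human-written instructions.
Fine-tune LLaMA-7B on the formatted instruction data with a supervised sequence-to-sequence loss.
Map the autoregressive output to one of three sentiment labels (positive, negative, neutral).
Evaluate the model against FinBERT and LLaMA-7B to assess contextual understanding and numerical sensitivity.
Train for 10 epochs on 8 A100 GPUs with DeepSpeed, using the hyperparameters specified in the paper.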
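
The data conversion and label-mapping steps above can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the two instruction templates are hypothetical stand-ins for the paper's 10 human-written instructions, the function names are invented for this example, and falling back to "neutral" when no label word appears in the output is an assumption.

```python
import random

# Hypothetical templates: the paper uses 10 human-written instructions,
# whose exact wording is not reproduced here.
INSTRUCTIONS = [
    "What is the sentiment of this text? Answer negative, neutral, or positive.",
    "Classify the sentiment of the following financial text as negative, neutral, or positive.",
]

LABELS = ("negative", "neutral", "positive")


def to_instruction_sample(text, label, rng):
    """Wrap one labeled example in an instruction / input / output record."""
    if label not in LABELS:
        raise ValueError(f"unknown label: {label!r}")
    return {
        "instruction": rng.choice(INSTRUCTIONS),  # pick one template at random
        "input": text,
        "output": label,
    }


def map_output_to_label(generated, default="neutral"):
    """Map free-form generated text to a label by the earliest label word found.

    Returning `default` when no label word is present is an assumption
    made for this sketch.
    """
    lowered = generated.lower()
    hits = [(lowered.find(name), name) for name in LABELS if name in lowered]
    return min(hits)[1] if hits else default
```

Passing a seeded `random.Random` instance as `rng` keeps the template assignment reproducible across runs.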

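Experimental Results

Research Questions

RQ1: How can an instruction-tuned LLM improve numerical sensitivity in financial sentiment analysis?
RQ2: How does the contextual understanding gained from general-purpose LLM knowledge affect financial sentiment prediction?
RQ3: How does the instruction-tuned FinGPT differ from the traditional FinBERT and from general-purpose LLMs on financial sentiment tasks?
RQ4: Can a small amount of instruction data bring a general-purpose LLM to state-of-the-art performance?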

Key Findings

Name	Size	Metrics	FinBERT	LLaMA-7B	Instruct-FinGPT-7B
Twitter Val	2388	Acc / F1 / Testing Time	0.725 / 0.668 / 18 seconds (1 GPU)	0.54 / 0.36 / 498 seconds (8 GPUs)	0.880 / 0.841 / 498 seconds (8 GPUs)
Numerical	117	Acc / F1	0.633 / 0.630	0.60 / 0.42	0.837 / 0.795
Contextual	20	Acc / F1	0.50 / 0.22	0.55 / 0.34	0.80 / 0.63
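
For readers who want to reproduce the Acc / F1 columns on their own predictions, the two metrics can be computed as in the sketch below. It assumes macro-averaged F1 over the three sentiment labels; this summary does not state which averaging the paper uses, so treat that choice as an assumption. The function names are illustrative.

```python
LABELS = ("negative", "neutral", "positive")


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the gold labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred, labels=LABELS):
    """Unweighted mean of per-class F1 (macro averaging is an assumption)."""
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # A class with no true or predicted instances contributes 0.0 here.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)
```

Macro averaging weights each class equally, which matters on small, imbalanced sets such as the 20-example Contextual subset.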

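Instruct-FinGPT-7B outperforms FinBERT and LLaMA-7B in both accuracy and F1 on all evaluation datasets.
The model shows strong numerical sensitivity and correctly interprets sentiment tied to financial figures.
The contextual understanding of the instruction-tuned LLM improves sentiment interpretation when context is missing or ambiguous.
In zero-shot evaluation on FPB (Financial PhraseBank), Instruct-FinGPT-7B outperforms ChatGPT-3.5 and LLaMA-7B, indicating good generalization.
Training cost is modest (about 58 minutes on 8 A100 GPUs) and requires only a small amount of instruction data.
The approach achieves higher performance with significantly fewer training resources than BloombergGPT.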


This review was generated by AI and reviewed by a human editor.