QUICK REVIEW

[Paper Review] AlphaFin: Benchmarking Financial Analysis with Retrieval-Augmented Stock-Chain Framework

Xiang Li, Zhenyu Li|arXiv (Cornell University)|Mar 19, 2024

Stock Market Forecasting Methods | Cited by 7

One-Sentence Summary

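This paper introduces the AlphaFin datasets and the Stock-Chain framework. Stock-Chain combines a fine-tuned StockGPT with retrieval-augmented generation (RAG) to address stock trend prediction and financial question answering. It achieves the highest annualized rate of return (ARR) among the evaluated models, with accuracy on par with the strongest baselines.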

ABSTRACT

The task of financial analysis primarily encompasses two key areas: stock trend prediction and the corresponding financial question answering. Currently, machine learning and deep learning algorithms (ML&DL) have been widely applied for stock trend predictions, leading to significant progress. However, these methods fail to provide reasons for predictions, lacking interpretability and reasoning processes. Also, they can not integrate textual information such as financial news or reports. Meanwhile, large language models (LLMs) have remarkable textual understanding and generation ability. But due to the scarcity of financial training datasets and limited integration with real-time knowledge, LLMs still suffer from hallucinations and are unable to keep up with the latest information. To tackle these challenges, we first release AlphaFin datasets, combining traditional research datasets, real-time financial data, and handwritten chain-of-thought (CoT) data. It has a positive impact on training LLMs for completing financial analysis. We then use AlphaFin datasets to benchmark a state-of-the-art method, called Stock-Chain, for effectively tackling the financial analysis task, which integrates retrieval-augmented generation (RAG) techniques. Extensive experiments are conducted to demonstrate the effectiveness of our framework on financial analysis.

Research Motivation and Goals

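Formally define financial analysis as two tasks: stock trend prediction and financial question answering.
Build the AlphaFin datasets, which combine traditional research datasets, real-time financial data, and chain-of-thought (CoT) data, for training FinLLMs.
Propose Stock-Chain, which uses RAG to mitigate hallucinations and bring real-time information into the analysis.
Demonstrate the effectiveness of Stock-Chain through extensive experiments and ablation studies.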

Proposed Method

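Fine-tune StockGPT with LoRA on the AlphaFin datasets so that it can predict stock trends and explain its predictions.
Stage 1, stock trend prediction:
- Retrieve documents for each company and build a prompt from them.
- Predict whether each stock will rise or fall.
- Select the stocks predicted to rise and compute ARR for a market-value-weighted portfolio of them.
Stage 2, financial question answering:
- Build a vector database and extract knowledge with coarse summarization and RefGPT.
- Run similarity-based retrieval.
- Fine-tune StockGPT on the Stage 2 data and generate answers from RAG-augmented prompts.
The RAG implementation uses BGE vector embeddings, cosine-similarity retrieval, and a continuously updated knowledge base.
Evaluation uses ARR, ACC, and risk metrics for Stage 1. Stage 2 is evaluated with ROUGE plus human and GPT-4 judgments.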
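
LoRA, used above to fine-tune StockGPT, keeps each pretrained weight matrix frozen and trains only a low-rank update. The standard formulation is shown below for reference. This summary does not give the rank r, the scaling α, or which StockGPT layers are adapted.

```latex
h = W_0 x + \Delta W x = W_0 x + \frac{\alpha}{r} B A x,
\qquad W_0 \in \mathbb{R}^{d \times k},\;
B \in \mathbb{R}^{d \times r},\;
A \in \mathbb{R}^{r \times k},\;
r \ll \min(d, k)
```

Only A and B are trained. Each adapted matrix therefore adds r(d + k) trainable parameters instead of dk.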
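
The Stage 1 scoring step can be sketched in a few lines: buy the stocks predicted to rise, weight them by market value, then compound and annualize the period returns into ARR. This is a minimal illustration under stated assumptions, not the paper's code. The `Prediction` record, the monthly holding period, holding cash when nothing is predicted to rise, and the compounding-based annualization are all hypothetical choices.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Prediction:
    ticker: str
    market_cap: float    # market value at the start of the holding period
    predicted_up: bool   # the model's rise/fall call for the period
    next_return: float   # realized return over the period (0.04 = +4%)


def portfolio_return(preds: List[Prediction]) -> float:
    """Market-value-weighted return of the stocks predicted to rise."""
    picks = [p for p in preds if p.predicted_up]
    total_cap = sum(p.market_cap for p in picks)
    if total_cap == 0:
        return 0.0  # assumption: stay in cash when nothing is predicted to rise
    return sum(p.market_cap / total_cap * p.next_return for p in picks)


def annualized_return(period_returns: List[float], periods_per_year: int = 12) -> float:
    """Compound per-period portfolio returns and annualize (monthly periods assumed)."""
    if not period_returns:
        raise ValueError("need at least one period")
    growth = 1.0
    for r in period_returns:
        growth *= 1.0 + r
    years = len(period_returns) / periods_per_year
    return growth ** (1.0 / years) - 1.0


# Usage with made-up numbers: only A and B are bought, weighted 300:100.
month = [
    Prediction("A", 300.0, True, 0.04),
    Prediction("B", 100.0, True, -0.02),
    Prediction("C", 600.0, False, 0.10),  # predicted to fall, so excluded
]
r = portfolio_return(month)        # 0.75 * 0.04 + 0.25 * (-0.02) ≈ 0.025
arr = annualized_return([r] * 12)  # as if the same return repeated for 12 months
```

Weighting by market value means a wrong call on a large-cap stock moves ARR far more than a wrong call on a small one.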
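
The retrieval side of RAG works in four steps: embed each document, rank by cosine similarity to the query, and prepend the top hits to the prompt. The sketch below shows this without any dependencies.

Several parts are assumptions:
- `embed` is a hypothetical bag-of-words stand-in for the BGE encoder, which actually produces dense vectors.
- The prompt template is invented.
- `KnowledgeBase` is a toy in-memory store.

Only the cosine ranking and the ability to keep adding documents mirror the description above.

```python
import math
from collections import Counter
from typing import List, Tuple


def embed(text: str) -> Counter:
    """Hypothetical stand-in for a BGE encoder: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


class KnowledgeBase:
    """Toy in-memory vector store; add() can be called at any time to keep it current."""

    def __init__(self) -> None:
        self.docs: List[Tuple[str, Counter]] = []

    def add(self, doc: str) -> None:
        self.docs.append((doc, embed(doc)))

    def retrieve(self, query: str, k: int = 2) -> List[str]:
        q = embed(query)
        ranked = sorted(self.docs, key=lambda d: cosine(q, d[1]), reverse=True)
        return [doc for doc, _ in ranked[:k]]


def build_prompt(kb: KnowledgeBase, question: str, k: int = 2) -> str:
    """Prepend the top-k retrieved passages to the question (template is an assumption)."""
    context = "\n".join(f"- {doc}" for doc in kb.retrieve(question, k))
    return f"Reference information:\n{context}\n\nQuestion: {question}\nAnswer:"


kb = KnowledgeBase()
kb.add("company x reported higher quarterly revenue")
kb.add("central bank kept interest rates unchanged")
kb.add("company x announced a share buyback")
prompt = build_prompt(kb, "company x quarterly revenue", k=1)
```

Because retrieval happens at query time, adding a document to the store makes it available to the model immediately, with no retraining.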

Experimental Results

Research Questions

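RQ1: When combined with retrieval-augmented generation, can FinLLMs trained on AlphaFin-scale data achieve state-of-the-art stock trend prediction?
RQ2: Compared with baseline LLMs, does integrating real-time knowledge through RAG improve financial question-answering quality and reduce hallucinations?
RQ3: How much do the AlphaFin components (the datasets and the CoT data) contribute to the performance of StockGPT and Stock-Chain?
RQ4: How does Stock-Chain compare with traditional ML/DL models and general FinLLMs in ARR and user-perceived usefulness?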

Key Findings

Stage 1 results on AlphaFin-Test. Column abbreviations:
- ARR: annualized rate of return
- AERR: annualized excess return over CSI 300
- ANVOL: annualized volatility
- SR: Sharpe ratio
- MD: maximum drawdown
- CR: Calmar ratio
- MDD: maximum drawdown duration
- ACC: prediction accuracy
↑ means higher is better; ↓ means lower is better.

Model	ARR ↑	AERR ↑	ANVOL ↓	SR ↑	MD ↓	CR ↑	MDD ↓	ACC ↑
SSE50	-1.0%	-2.7%	19.3%	-0.054	45.9%	-0.023	29	-
CSI 300	1.7%	0.0%	18.2%	0.092	39.5%	0.043	30	-
SCI	3.9%	2.2%	14.8%	0.266	21.5%	0.183	19	-
CNX	7.6%	5.9%	26.5%	0.287	41.3%	0.185	20	-
Random Forest	9.8%	8.1%	19.5%	0.501	16.0%	0.608	22	55.5%
RNN	8.1%	6.4%	10.9%	0.742	15.7%	0.515	12	54.1%
BERT	10.7%	9.0%	16.1%	0.664	13.5%	0.852	14	51.4%
GRU	11.2%	9.5%	13.7%	0.814	14.6%	0.765	21	54.7%
LSTM	11.8%	10.1%	15.4%	0.767	15.3%	0.768	19	55.2%
Logistic Regression	12.5%	10.8%	27.1%	0.463	32.5%	0.385	18	54.8%
XGBoost	13.1%	11.4%	20.5%	0.633	20.9%	0.619	17	55.9%
Decision Tree	13.4%	11.7%	19.6%	0.683	11.9%	1.126	20	55.1%
ChatGLM2	8.1%	6.4%	24.9%	0.324	62.6%	0.126	26	49.5%
ChatGPT (3.5 Turbo)	14.3%	12.6%	27.7%	0.516	53.6%	0.267	23	51.4%
FinMA	15.7%	14.0%	37.1%	0.422	66.3%	0.236	25	49.1%
FinGPT	17.5%	15.8%	28.9%	0.605	55.5%	0.312	24	50.5%
Stock-Chain	30.8%	29.1%	19.6%	1.573	13.3%	2.314	10	55.7%
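
Several columns of the Stage 1 table can be reproduced from the others:
- AERR (excess return) matches ARR minus the CSI 300 ARR of 1.7%.
- SR (Sharpe ratio) is close to ARR / ANVOL (volatility), which implies a zero risk-free rate.
- CR (Calmar ratio) is close to ARR / MD (maximum drawdown).

These relations are inferred from the reported numbers, not quoted from the paper, so the sketch checks them with a rounding tolerance on a few copied rows.

```python
from typing import Dict

# Rows copied from the Stage 1 table: (ARR, AERR, ANVOL, SR, MD, CR); percentages as plain numbers
ROWS = {
    "CSI 300": (1.7, 0.0, 18.2, 0.092, 39.5, 0.043),
    "Decision Tree": (13.4, 11.7, 19.6, 0.683, 11.9, 1.126),
    "FinGPT": (17.5, 15.8, 28.9, 0.605, 55.5, 0.312),
    "Stock-Chain": (30.8, 29.1, 19.6, 1.573, 13.3, 2.314),
}
BENCHMARK_ARR = 1.7  # CSI 300 ARR, inferred to be the excess-return benchmark


def derived(arr: float, anvol: float, md: float) -> Dict[str, float]:
    """Inferred relations between table columns (an assumption, not the paper's definitions)."""
    return {
        "AERR": arr - BENCHMARK_ARR,  # excess return over CSI 300
        "SR": arr / anvol,            # Sharpe ratio with a zero risk-free rate
        "CR": arr / md,               # Calmar ratio: return per unit of maximum drawdown
    }


for name, (arr, aerr, anvol, sr, md, cr) in ROWS.items():
    d = derived(arr, anvol, md)
    print(f"{name:14s} AERR {d['AERR']:5.1f} (table {aerr})  "
          f"SR {d['SR']:.3f} (table {sr})  CR {d['CR']:.3f} (table {cr})")
```

The BERT row does not fit the CR relation: 10.7 / 13.5 ≈ 0.79, not the listed 0.852. That cell may contain a typo in the source table.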

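Stock-Chain achieves the highest ARR (30.8%) among the models evaluated on AlphaFin-Test Stage 1. Its ACC (55.63%) is close to the best baseline (XGBoost, 55.9%).
Fine-tuning on both financial reports and CoT data gives the best stock trend prediction results. It outperforms fine-tuning on raw data alone or on a single dataset.
In Stage 2, Stock-Chain with RAG achieves higher ROUGE scores (ROUGE-1 0.4352, ROUGE-2 0.3056, ROUGE-L 0.4031). It is also strongly preferred in both human and GPT-4 evaluations.
Stock-Chain consistently outperforms baseline models on the financial analysis tasks, including FinGPT and FinMA. It has a large ARR margin (30.8% vs. 17.5% for FinGPT and 15.7% for FinMA) and favorable preference ratings.
Ablation studies show that combining news and report data gives the best Stage 2 QA performance, in both ROUGE metrics and content quality.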
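
The Stage 2 ROUGE scores measure overlap between a generated answer and a reference answer:
- ROUGE-1 and ROUGE-2 count matching unigrams and bigrams.
- ROUGE-L uses the longest common subsequence.

The sketch below computes F1 versions (β = 1) on whitespace tokens. It is illustrative only and will not reproduce the paper's numbers: established implementations apply their own tokenization, and Chinese text needs a proper tokenizer.

```python
from collections import Counter
from typing import List


def ngrams(tokens: List[str], n: int) -> Counter:
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def f1(overlap: int, cand_total: int, ref_total: int) -> float:
    """Harmonic mean of precision (vs. candidate length) and recall (vs. reference length)."""
    if overlap == 0 or cand_total == 0 or ref_total == 0:
        return 0.0
    p, r = overlap / cand_total, overlap / ref_total
    return 2 * p * r / (p + r)


def rouge_n(candidate: str, reference: str, n: int) -> float:
    c, r = ngrams(candidate.split(), n), ngrams(reference.split(), n)
    overlap = sum((c & r).values())  # matches clipped to the smaller count
    return f1(overlap, sum(c.values()), sum(r.values()))


def lcs_len(a: List[str], b: List[str]) -> int:
    """Length of the longest common subsequence (dynamic programming)."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            dp[i][j] = dp[i - 1][j - 1] + 1 if x == y else max(dp[i - 1][j], dp[i][j - 1])
    return dp[len(a)][len(b)]


def rouge_l(candidate: str, reference: str) -> float:
    c, r = candidate.split(), reference.split()
    return f1(lcs_len(c, r), len(c), len(r))


ref = "revenue rose 10 percent this quarter"
cand = "revenue rose this quarter"
scores = (rouge_n(cand, ref, 1), rouge_n(cand, ref, 2), rouge_l(cand, ref))  # ≈ (0.8, 0.5, 0.8)
```

In this example every candidate word appears in the reference, so precision is 1.0, but dropping "10 percent" costs recall. It also breaks two bigrams, which is why ROUGE-2 falls further than ROUGE-1.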

This review was generated by AI and reviewed by human editors.