QUICK REVIEW

[Paper Review] FinGPT: Open-Source Financial Large Language Models

Hongyang Yang, Xiaoyang Liu | arXiv (Cornell University) | Jun 9, 2023

FinTech, Crowdfunding, Digital Finance | Cited by 23

One-Sentence Summary

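FinGPT provides an open-source, data-centric framework for financial LLMs that supports lightweight fine-tuning on real-time data (via LoRA or RLSP), with the goal of democratizing FinLLMs and advancing their use in finance.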

ABSTRACT

Large language models (LLMs) have shown the potential of revolutionizing natural language processing tasks in diverse domains, sparking great interest in finance. Accessing high-quality financial data is the first challenge for financial LLMs (FinLLMs). While proprietary models like BloombergGPT have taken advantage of their unique data accumulation, such privileged access calls for an open-source alternative to democratize Internet-scale financial data. In this paper, we present an open-source large language model, FinGPT, for the finance sector. Unlike proprietary models, FinGPT takes a data-centric approach, providing researchers and practitioners with accessible and transparent resources to develop their FinLLMs. We highlight the importance of an automatic data curation pipeline and the lightweight low-rank adaptation technique in building FinGPT. Furthermore, we showcase several potential applications as stepping stones for users, such as robo-advising, algorithmic trading, and low-code development. Through collaborative efforts within the open-source AI4Finance community, FinGPT aims to stimulate innovation, democratize FinLLMs, and unlock new opportunities in open finance. Two associated code repos are https://github.com/AI4Finance-Foundation/FinGPT and https://github.com/AI4Finance-Foundation/FinNLP

Motivation and Goals

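Democratize financial data and FinLLMs through an open-source framework.
Emphasize the importance of data acquisition, cleaning, and preprocessing for FinLLMs.
Propose an end-to-end, four-layer architecture for finance (data source, data engineering, LLMs, applications).
Present practical applications and demos that illustrate FinGPT's potential in finance.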
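
To make the four-layer architecture concrete, here is a minimal Python sketch of how the layers could be chained. Everything in it is hypothetical: the `NewsItem` type, the in-memory feed standing in for a real-time data source, the one-day freshness window, and the keyword rule standing in for a fine-tuned FinLLM are illustrative assumptions, not FinGPT or FinNLP code.

```python
import re
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class NewsItem:
    timestamp: datetime
    ticker: str
    text: str


# Layer 1: data source. A hypothetical in-memory feed stands in for a real-time news API.
def fetch_news(now: datetime) -> list[NewsItem]:
    return [
        NewsItem(now, "AAPL", "  Apple beats <b>earnings</b> estimates "),
        NewsItem(now, "AAPL", "Apple beats earnings estimates"),  # duplicate once cleaned
        NewsItem(now - timedelta(days=3), "MSFT", "Older story about Microsoft"),
    ]


# Layer 2: data engineering. Strip HTML tags, normalize whitespace,
# and drop stale or duplicate items (financial news loses value quickly).
def clean(items: list[NewsItem], now: datetime,
          max_age: timedelta = timedelta(days=1)) -> list[NewsItem]:
    seen = set()
    cleaned = []
    for item in items:
        text = re.sub(r"<[^>]+>", "", item.text)
        text = " ".join(text.split())
        key = (item.ticker, text.lower())
        if now - item.timestamp > max_age or key in seen:
            continue
        seen.add(key)
        cleaned.append(NewsItem(item.timestamp, item.ticker, text))
    return cleaned


# Layer 3: LLM. A keyword rule is a placeholder for a fine-tuned FinLLM.
def classify_sentiment(text: str) -> str:
    lowered = text.lower()
    if "beats" in lowered:
        return "positive"
    if "misses" in lowered:
        return "negative"
    return "neutral"


# Layer 4: application. Map the model output to a simple trading signal.
def to_signal(sentiment: str) -> int:
    return {"positive": 1, "negative": -1}.get(sentiment, 0)


def run_pipeline(now: datetime) -> list[tuple[str, int]]:
    items = clean(fetch_news(now), now)
    return [(item.ticker, to_signal(classify_sentiment(item.text))) for item in items]
```

With `now = datetime(2023, 6, 9, 12, 0)`, the duplicate headline and the three-day-old story are dropped in the data engineering layer, so `run_pipeline(now)` returns `[("AAPL", 1)]`. Keeping each layer a separate function means any one of them, such as the placeholder classifier, can be swapped out without touching the others.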

Proposed Method

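Takes a data-centric approach that prioritizes high-quality data curation and preprocessing.
Builds an end-to-end four-layer framework: Data Source, Data Engineering, LLMs, Applications.
Applies LoRA for lightweight fine-tuning of open-source LLMs, reducing the number of trainable parameters from 6.17B to 3.67M.
Offers a reinforcement-learning alternative for fine-tuning, exemplified by Reinforcement Learning on Stock Prices (RLSP), which uses stock price movements as the feedback signal.
Uses RLHF-style methods and prompt engineering to align model outputs with financial tasks.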
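
The 6.17B to 3.67M reduction follows from LoRA's parameter arithmetic: instead of updating a frozen weight matrix W (d_out x d_in), LoRA trains two low-rank factors B (d_out x r) and A (r x d_in), adding only r * (d_in + d_out) parameters per adapted matrix. The sketch below assumes ChatGLM-6B-like dimensions (28 layers, hidden size 4096, a fused query/key/value projection from 4096 to 12288) with rank r = 8 applied only to that projection. These configuration values are our assumption rather than something stated in this review, although they do reproduce the 3.67M figure.

```python
# LoRA parameter arithmetic. The model dimensions below are assumptions
# (ChatGLM-6B-like), not values stated in the FinGPT review.

def lora_params_per_matrix(d_in: int, d_out: int, rank: int) -> int:
    # LoRA keeps W (d_out x d_in) frozen and trains B (d_out x rank) and A (rank x d_in),
    # so the learned update B @ A has rank at most `rank`.
    return rank * d_in + d_out * rank


def full_params_per_matrix(d_in: int, d_out: int) -> int:
    # Number of entries that full fine-tuning would update in the same matrix.
    return d_in * d_out


HIDDEN = 4096          # assumed hidden size
QKV_OUT = 3 * HIDDEN   # assumed fused query/key/value projection width
NUM_LAYERS = 28        # assumed number of transformer layers
RANK = 8               # assumed LoRA rank

lora_total = NUM_LAYERS * lora_params_per_matrix(HIDDEN, QKV_OUT, RANK)
full_qkv_total = NUM_LAYERS * full_params_per_matrix(HIDDEN, QKV_OUT)

print(f"LoRA trainable parameters: {lora_total:,}")      # 3,670,016
print(f"Full QKV parameters:       {full_qkv_total:,}")  # 1,409,286,144
print(f"Share of a 6.17B model:    {lora_total / 6.17e9:.4%}")
```

The 3.67M figure counts only the adapter weights; the base model's weights stay frozen, which is what makes low-cost fine-tuning runs plausible.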

Experimental Results

Research Questions

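RQ1: How can an open-source, data-centric pipeline support efficient FinLLMs in finance?
RQ2: What are the trade-offs between LoRA and RLSP when fine-tuning financial LLMs?
RQ3: Can real-time data engineering overcome the high time sensitivity and low signal-to-noise ratio of financial data?
RQ4: Which practical financial applications can FinGPT enable, and how do they demonstrate the value of open-source FinLLMs?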

Key Findings

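FinGPT demonstrates cost-effective fine-tuning: open-source LLMs are adapted for less than $300 per training run, compared with the much higher cost of BloombergGPT.
The framework emphasizes real-time data ingestion, cleaning, and NLP processing to cope with the time sensitivity and noise of financial data.
LoRA-based fine-tuning reduces the number of trainable parameters from 6.17B to 3.67M while still exploiting domain-relevant financial signals.
RLSP provides a market-driven feedback loop by using stock price movements as an objective signal for aligning the model with market responses.
The open-source ecosystem and tutorials enable applications such as robo-advising, quantitative trading, risk management, and financial education.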
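
The review describes RLSP only at a high level: stock price movements replace human preference labels as the feedback signal. The Python sketch below shows one way such a reward could be computed. The one-day horizon, the ±1% flat band, the three-way labels, and the agreement-based reward are all our assumptions, not FinGPT's implementation, and the reinforcement-learning update that would consume these rewards (for example, PPO) is not shown.

```python
from datetime import date


def market_label(prices: dict[date, float], event_day: date,
                 horizon: int = 1, threshold: float = 0.01) -> int:
    """Label a news event by the subsequent price move: +1 up, -1 down, 0 flat."""
    days = sorted(prices)
    i = days.index(event_day)  # raises ValueError if event_day has no price
    if i + horizon >= len(days):
        raise ValueError("not enough future prices for the chosen horizon")
    start, end = prices[days[i]], prices[days[i + horizon]]
    ret = (end - start) / start
    if ret > threshold:
        return 1
    if ret < -threshold:
        return -1
    return 0


def rlsp_rewards(predictions: list[tuple[date, int]], prices: dict[date, float],
                 horizon: int = 1, threshold: float = 0.01) -> list[float]:
    """Reward +1.0 when the model's predicted direction matches the market, else -1.0."""
    return [
        1.0 if pred == market_label(prices, day, horizon, threshold) else -1.0
        for day, pred in predictions
    ]


prices = {
    date(2023, 6, 5): 100.0,
    date(2023, 6, 6): 103.0,  # +3.0% versus the previous day
    date(2023, 6, 7): 102.5,  # about -0.5%, inside the flat band
}
model_predictions = [(date(2023, 6, 5), 1), (date(2023, 6, 6), -1)]
print(rlsp_rewards(model_predictions, prices))  # [1.0, -1.0]
```

Because the label depends on prices observed after the news event, a real training setup would need to guard against look-ahead leakage when assembling its data.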

This review was generated by AI and checked by human editors.