QUICK REVIEW

[Paper Review] BloombergGPT: A Large Language Model for Finance

Shijie Wu, Ozan İrsoy | arXiv (Cornell University) | Mar 30, 2023

Topic Modeling | Cited by 299

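One-Sentence Summary

BloombergGPT is a 50B-parameter decoder-only large language model trained on a large, curated mix of financial and public data (FinPile plus public corpora) that substantially improves performance on financial tasks while remaining competitive on general NLP benchmarks.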

ABSTRACT

The use of NLP in the realm of financial technology is broad and complex, with applications ranging from sentiment analysis and named entity recognition to question answering. Large Language Models (LLMs) have been shown to be effective on a variety of tasks; however, no LLM specialized for the financial domain has been reported in literature. In this work, we present BloombergGPT, a 50 billion parameter language model that is trained on a wide range of financial data. We construct a 363 billion token dataset based on Bloomberg's extensive data sources, perhaps the largest domain-specific dataset yet, augmented with 345 billion tokens from general purpose datasets. We validate BloombergGPT on standard LLM benchmarks, open financial benchmarks, and a suite of internal benchmarks that most accurately reflect our intended usage. Our mixed dataset training leads to a model that outperforms existing models on financial tasks by significant margins without sacrificing performance on general LLM benchmarks. Additionally, we explain our modeling choices, training process, and evaluation methodology. We release Training Chronicles (Appendix C) detailing our experience in training BloombergGPT.

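Research Motivation and Goals

Develop a large language model specialized for the financial domain.
Build FinPile, a large, carefully curated financial dataset, and augment it with public data to enable mixed-domain training.
Evaluate BloombergGPT on standard and internal financial benchmarks as well as general LLM benchmarks.
Explain data collection, tokenizer design, model architecture, training protocol, and evaluation methodology to aid reproduction.
Share training insights and challenges as a reference for future domain-specific LLM efforts.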

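Proposed Method

Build a 50B-parameter decoder-only model inspired by the BLOOM architecture.
Use a mixed training corpus: 363B tokens from FinPile (financial) and 345B tokens from public datasets, for a total of more than 700B tokens.
Adopt a Unigram tokenizer with a vocabulary of 131,072 tokens, together with ALiBi positional encoding.
Train with a left-to-right causal language modeling objective at a sequence length of 2,048 on 64 × 8 A100 GPUs (64 nodes with 8 GPUs each), using ZeRO stage 3 to shard training state across GPUs.
Apply mixed-precision training (BF16 and FP32) where appropriate, with activation checkpointing and fused kernels enabled for efficiency.
Evaluate on public financial NLP benchmarks, internal Bloomberg tasks, and general NLP benchmarks to measure both domain-specific and general capabilities.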
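
The ALiBi positional encoding named above can be illustrated with a minimal sketch. ALiBi adds a per-head linear penalty to attention scores that grows with the distance between query and key, instead of adding position embeddings to the inputs. The function names below are hypothetical, and the slope schedule follows the publicly available ALiBi reference scheme; whether BloombergGPT uses exactly this schedule for its 40 heads is an assumption, not something this review states.

```python
import math


def alibi_slopes(n_heads):
    """Per-head slopes, following the public ALiBi reference scheme (assumed here)."""

    def pow2_slopes(n):
        # Geometric sequence starting at 2^(-8/n) with the same ratio.
        start = 2.0 ** (-8.0 / n)
        return [start ** (i + 1) for i in range(n)]

    if math.log2(n_heads).is_integer():
        return pow2_slopes(n_heads)
    # Head counts that are not a power of two (e.g. 40): take the slopes for
    # the nearest lower power of two, then add every other slope from the
    # next power of two until there is one slope per head.
    closest = 2 ** math.floor(math.log2(n_heads))
    extra = pow2_slopes(2 * closest)[0::2][: n_heads - closest]
    return pow2_slopes(closest) + extra


def alibi_causal_bias(slope, seq_len):
    """Additive attention bias for one head.

    Entry [i][j] is -slope * (i - j) when key j is at or before query i,
    and -inf for future keys (the left-to-right causal mask).
    """
    neg_inf = float("-inf")
    return [
        [-slope * (i - j) if j <= i else neg_inf for j in range(seq_len)]
        for i in range(seq_len)
    ]


if __name__ == "__main__":
    slopes = alibi_slopes(40)
    print(len(slopes), "slopes; first:", slopes[0])
    for row in alibi_causal_bias(slopes[0], 4):
        print(row)
```

The bias matrix would be added to the raw attention scores before the softmax. Because the penalty depends only on relative distance, the same rule applies at any sequence length, including lengths beyond the 2,048 tokens used in training.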

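Experimental Results

Research Questions

RQ1: How does BloombergGPT perform on financial NLP benchmarks compared with general-purpose LLMs?
RQ2: Does mixed-domain training (financial plus public data) improve financial task performance without degrading general NLP capability?
RQ3: How do dataset construction (FinPile) and tokenizer choice affect model performance and efficiency?
RQ4: Which training configurations and optimization strategies reliably stabilize and scale training for a 50B-parameter finance-focused LLM?
RQ5: How do Bloomberg-specific benchmarks (internal tasks) capture the gap between real-world usage and public benchmarks?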

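Key Findings

BloombergGPT significantly outperforms existing models on in-domain financial tasks.
Despite its financial focus, the model remains competitive on general NLP benchmarks and in some cases outperforms comparable models.
The model is a 50B-parameter decoder with 70 layers and 40 attention heads, trained on about 569B tokens drawn from the 700B+ token corpus.
The tokenizer uses a large Unigram vocabulary of 131,072 tokens, which allows text to be encoded densely.
ALiBi positional encoding and the BLOOM-style decoder architecture support efficient inference on long sequences.
Evaluation covers external financial tasks, internal sentiment and named entity recognition probes, and BIG-bench Hard.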
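
The figures above can be checked with some back-of-the-envelope arithmetic. The sketch below estimates the parameter count of a standard decoder-only transformer from its depth, width, and vocabulary size, then derives tokens per parameter and the FinPile share of the corpus. The hidden dimension of 7,680 is an assumption that this review does not state, and the 12·d² per-layer rule ignores biases and LayerNorm weights, so the result is an approximation rather than the authors' exact count.

```python
def estimate_decoder_params(n_layers, d_model, vocab_size):
    """Rough parameter count for a decoder-only transformer.

    Per layer: 4*d^2 for the attention projections (Q, K, V, output)
    plus 8*d^2 for an MLP with 4x expansion. Biases and LayerNorm
    parameters are ignored. ALiBi adds no learned position parameters.
    """
    per_layer = 12 * d_model * d_model
    embedding = vocab_size * d_model
    return n_layers * per_layer + embedding


# Values stated in the review.
N_LAYERS = 70
N_HEADS = 40
VOCAB_SIZE = 131_072
TRAINED_TOKENS = 569e9
FINPILE_TOKENS = 363e9
PUBLIC_TOKENS = 345e9

# ASSUMPTION: hidden dimension, not stated in this review.
D_MODEL = 7_680

params = estimate_decoder_params(N_LAYERS, D_MODEL, VOCAB_SIZE)
head_dim = D_MODEL // N_HEADS
tokens_per_param = TRAINED_TOKENS / params
finpile_share = FINPILE_TOKENS / (FINPILE_TOKENS + PUBLIC_TOKENS)

if __name__ == "__main__":
    print(f"estimated parameters: {params / 1e9:.2f}B")
    print(f"dimension per head:   {head_dim}")
    print(f"tokens per parameter: {tokens_per_param:.1f}")
    print(f"FinPile share:        {finpile_share:.1%}")
```

Under these assumptions the estimate lands at about 50.6 billion parameters, consistent with the 50B figure, and the financial and public portions of the corpus are close to an even split.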


This review was generated by AI and reviewed by human editors.