QUICK REVIEW

[Paper Review] BloombergGPT: A Large Language Model for Finance

Shijie Wu, Ozan İrsoy | arXiv (Cornell University) | March 30, 2023

Topic Modeling | Citations: 299

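One-Line Summary

BloombergGPT is a 50B-parameter decoder-only LLM trained on a large mixture of financial and public data (FinPile plus public corpora). It substantially improves performance on financial tasks while remaining competitive on general NLP benchmarks.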

ABSTRACT

The use of NLP in the realm of financial technology is broad and complex, with applications ranging from sentiment analysis and named entity recognition to question answering. Large Language Models (LLMs) have been shown to be effective on a variety of tasks; however, no LLM specialized for the financial domain has been reported in literature. In this work, we present BloombergGPT, a 50 billion parameter language model that is trained on a wide range of financial data. We construct a 363 billion token dataset based on Bloomberg's extensive data sources, perhaps the largest domain-specific dataset yet, augmented with 345 billion tokens from general purpose datasets. We validate BloombergGPT on standard LLM benchmarks, open financial benchmarks, and a suite of internal benchmarks that most accurately reflect our intended usage. Our mixed dataset training leads to a model that outperforms existing models on financial tasks by significant margins without sacrificing performance on general LLM benchmarks. Additionally, we explain our modeling choices, training process, and evaluation methodology. We release Training Chronicles (Appendix C) detailing our experience in training BloombergGPT.

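Motivation and Objectives

Develop a large language model specialized for the financial domain.
Build FinPile, a large curated financial dataset, and augment it with public data to enable mixed-domain training.
Evaluate BloombergGPT on standard and internal financial benchmarks as well as on general LLM benchmarks.
Describe data collection, tokenizer design, model architecture, training protocol, and evaluation methodology to support reproducibility.
Share training insights and open challenges for future domain-specific LLM initiatives.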

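Proposed Method

Build a 50B-parameter decoder-only model based on the BLOOM architecture.
Mixed training corpus: 363B tokens from FinPile (financial) plus 345B tokens from public data, for a total of more than 700B tokens.
Use a large Unigram tokenizer with a 131,072-token vocabulary, together with ALiBi positional encoding.
Train with a left-to-right causal language modeling objective on 2,048-token sequences, using 64 × 8 A100 GPUs (512 in total) with ZeRO stage 3 sharding for model parallelism.
Apply mixed-precision training with BF16 and FP32 where appropriate, activation checkpointing, and fused kernels for efficiency.
Evaluate on public financial NLP benchmarks, internal Bloomberg tasks, and general NLP benchmarks to assess both domain-specific and general capabilities.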
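
To make the ALiBi step in the method list more concrete, here is a minimal sketch of how ALiBi attention biases are usually built: each head gets a fixed slope, and the bias added to an attention score grows linearly with the distance between query and key. This is an illustration, not Bloomberg's implementation. The function names `alibi_slopes` and `alibi_bias` are hypothetical, and the sketch only handles power-of-two head counts, so it does not cover the 40-head configuration reported for BloombergGPT.

```python
def alibi_slopes(num_heads: int) -> list[float]:
    """Per-head slopes 2^(-8/n), 2^(-16/n), ... for a power-of-two head count n."""
    if num_heads <= 0 or num_heads & (num_heads - 1):
        # Assumption of this sketch: other head counts need an extended rule.
        raise ValueError("this sketch only handles power-of-two head counts")
    ratio = 2.0 ** (-8.0 / num_heads)
    return [ratio ** (i + 1) for i in range(num_heads)]


def alibi_bias(num_heads: int, seq_len: int) -> list[list[list[float]]]:
    """bias[h][i][j] = -slope_h * (i - j) for j <= i; future keys get -inf."""
    neg_inf = float("-inf")
    return [
        [
            [-slope * (i - j) if j <= i else neg_inf for j in range(seq_len)]
            for i in range(seq_len)
        ]
        for slope in alibi_slopes(num_heads)
    ]


# Illustrative configuration (8 heads, 4 positions), not the model's real one.
print(alibi_slopes(8)[:2])      # [0.5, 0.25]
print(alibi_bias(8, 4)[0][3])   # [-1.5, -1.0, -0.5, -0.0]
```

The bias tensor is added to the query-key scores before the softmax. Because it depends only on relative distance, no learned position embeddings are needed.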

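Experimental Results

Research Questions

RQ1: How does BloombergGPT perform on financial NLP benchmarks compared with general-purpose LLMs?
RQ2: Does mixed-domain training (financial data plus public data) improve performance on financial tasks without degrading general NLP capability?
RQ3: How do dataset composition (FinPile) and tokenizer choice affect model performance and efficiency?
RQ4: Which network configuration and optimization strategies stabilize and scale the training of a 50B-parameter finance-focused LLM?
RQ5: How well do Bloomberg-specific benchmarks (internal tasks) reflect real-world usage compared with public benchmarks?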

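Key Findings

BloombergGPT outperforms existing models by a significant margin on in-domain financial tasks.
Despite its financial focus, the model remains competitive with, or better than, other models on general NLP benchmarks.
The model is a 50B-parameter decoder with 70 layers and 40 attention heads, trained on approximately 569B tokens.
The tokenizer uses a large Unigram vocabulary of 131,072 tokens, which enables denser encoding of text.
ALiBi positional encoding and the BLOOM-style decoder architecture help the model handle longer sequences efficiently at inference time.
The evaluation covers external financial tasks, internal sentiment analysis and NER probes, and BIG-bench Hard.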
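
As a sanity check on the figures above, the following back-of-the-envelope sketch estimates the parameter count from the layer count and vocabulary size, and derives the data-mixture shares. It relies on one value that does not appear in this review: a hidden size of 7,680, taken from the paper's architecture table and treated here as an assumption. The helper `estimate_decoder_params` is hypothetical and uses the common rough rule of about 12·d² weights per Transformer block, ignoring biases and layer norms.

```python
# Figures reported in this review.
NUM_LAYERS = 70
VOCAB_SIZE = 131_072
FINPILE_TOKENS = 363_000_000_000
PUBLIC_TOKENS = 345_000_000_000
TRAINED_TOKENS = 569_000_000_000

# Assumption: not stated in this review; value from the paper's architecture table.
HIDDEN_SIZE = 7_680


def estimate_decoder_params(num_layers: int, hidden_size: int, vocab_size: int) -> int:
    """Rough count: 4*d^2 (attention) + 8*d^2 (MLP with 4x expansion) per block,
    plus the token embedding matrix. Biases and layer norms are ignored."""
    per_block = 12 * hidden_size ** 2
    embedding = vocab_size * hidden_size
    return num_layers * per_block + embedding


params = estimate_decoder_params(NUM_LAYERS, HIDDEN_SIZE, VOCAB_SIZE)
available = FINPILE_TOKENS + PUBLIC_TOKENS

print(f"estimated parameters: {params / 1e9:.1f}B")
print(f"FinPile share of available tokens: {FINPILE_TOKENS / available:.1%}")
print(f"fraction of available tokens seen in training: {TRAINED_TOKENS / available:.1%}")
print(f"training tokens per parameter: {TRAINED_TOKENS / params:.1f}")
```

Under these assumptions the estimate lands close to the stated 50B parameters, and the 569B training tokens cover most, but not all, of the roughly 708B tokens available.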

This review was generated by AI and checked by a human editor.