QUICK REVIEW

[Paper Review] Unveiling the Potential of Sentiment: Can Large Language Models Predict Chinese Stock Price Movements?

Haohan Zhang, Fengrui Hua | arXiv (Cornell University) | 2023-06-25

Stock Market Forecasting Methods | Citations: 8

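One-Line Summary

Summary: The paper benchmarks sentiment extraction from Chinese financial news using three LLM approaches: ChatGPT as a baseline, the Chinese-language Erlangshen-RoBERTa model, and Chinese FinBERT. It evaluates the resulting trading performance through standardized back-tests and finds Erlangshen-110M-Sentiment to be the most effective.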

ABSTRACT

The rapid advancement of Large Language Models (LLMs) has spurred discussions about their potential to enhance quantitative trading strategies. LLMs excel in analyzing sentiments about listed companies from financial news, providing critical insights for trading decisions. However, the performance of LLMs in this task varies substantially due to their inherent characteristics. This paper introduces a standardized experimental procedure for comprehensive evaluations. We detail the methodology using three distinct LLMs, each embodying a unique approach to performance enhancement, applied specifically to the task of sentiment factor extraction from large volumes of Chinese news summaries. Subsequently, we develop quantitative trading strategies using these sentiment factors and conduct back-tests in realistic scenarios. Our results will offer perspectives about the performances of Large Language Models applied to extracting sentiments from Chinese news texts.

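Motivation and Goals

Evaluate how effectively LLMs extract sentiment factors for trading decisions from Chinese financial news.
Provide a standardized benchmark and back-testing procedure for objective comparison across models.
Compare generative LLMs, language-specific pre-trained LLMs, and financial-domain fine-tuned LLMs on this task.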

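Proposed Method

Apply the three models to extract sentiment from 394,429 Chinese news summaries published before market open.
For ChatGPT, a prompt asks the model to classify each item's sentiment as Good (1), Not Sure (0), or Bad (-1), and the scores are averaged per source.
Use Erlangshen-RoBERTa-110M-Sentiment, pre-trained on the WuDao Chinese corpus, to output sentiment probabilities.
Train Chinese FinBERT on manually labeled data as a domain-specific fine-tuned classifier.
Build trading portfolios from the sentiment rankings and run back-tests with standardized trading parameters.
Evaluate performance under a consistent framework using excess return, risk-adjusted return, and win rate.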
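The ChatGPT scoring and ranking steps above can be sketched as follows. This is a minimal sketch, assuming each news item has already been labeled Good, Not Sure, or Bad by ChatGPT. The grouping key (here a stock code standing in for the "source"), the function names `aggregate_sentiment` and `top_k`, and the tie-breaking rule are hypothetical illustrations, not the authors' code.

```python
from collections import defaultdict
from statistics import mean

# Label-to-score mapping described in the review: Good=1, Not Sure=0, Bad=-1.
LABEL_SCORES = {"Good": 1, "Not Sure": 0, "Bad": -1}


def aggregate_sentiment(records):
    """Average the mapped label scores per key.

    records: iterable of (key, label) pairs. The key is a hypothetical
    grouping id, e.g. a stock code; the paper's exact grouping may differ.
    """
    buckets = defaultdict(list)
    for key, label in records:
        buckets[key].append(LABEL_SCORES[label])
    return {key: mean(scores) for key, scores in buckets.items()}


def top_k(scores, k):
    """Return the k keys with the highest average score.

    Ties are broken by key order (an arbitrary, hypothetical choice).
    """
    ranked = sorted(scores, key=lambda key: (-scores[key], key))
    return ranked[:k]


if __name__ == "__main__":
    # Toy labels for hypothetical stock codes.
    records = [
        ("600000", "Good"),
        ("600000", "Bad"),
        ("000001", "Good"),
        ("000002", "Not Sure"),
    ]
    scores = aggregate_sentiment(records)
    print(scores)
    print(top_k(scores, 2))
```

A rank-based selection like `top_k` only needs the relative ordering of scores. The same ranking step would therefore also apply to the probability outputs of Erlangshen or FinBERT once they are reduced to one number per key.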

Figure 1: Demonstration of Prompts Structured for Sentiment Analysis and the Response by ChatGPT

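Experimental Results

Research Questions

RQ1: Can different kinds of LLMs (generative, language-specific pre-trained, and domain-specific fine-tuned) effectively extract sentiment factors from Chinese financial news?
RQ2: How do sentiment-based factors translate into trading performance within a standardized back-testing framework?
RQ3: For sentiment extraction in the Chinese financial domain, is language-specific or domain-specific pre-training more beneficial?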

Key Results

Factor Name	Annual Excess Return (%)	Annual Net Asset Return (%)	Win Rate (%)	Sharpe Ratio
Chinese-GPT	23.1	11.04	57.49	0.6406
Chinese-FinBERT	19.79	7.73	57.19	0.4797
Erlangshen-110M	24.01	11.95	58.38	0.678
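The following sketch shows one way the table's metrics could be computed from daily return series. It rests on several common conventions that are assumptions here, and the paper's exact definitions may differ:
- a 252-trading-day year;
- compounding-based annualization;
- win rate defined as the share of days on which the strategy beats the benchmark;
- a zero daily risk-free rate by default.

```python
import math
from statistics import mean, stdev

TRADING_DAYS = 252  # assumption: typical number of trading days per year


def annualized_return(daily_returns):
    """Compound daily returns and scale the growth to a one-year horizon."""
    growth = 1.0
    for r in daily_returns:
        growth *= 1.0 + r
    return growth ** (TRADING_DAYS / len(daily_returns)) - 1.0


def annual_excess_return(strategy, benchmark):
    """Annualized strategy return minus annualized benchmark return."""
    return annualized_return(strategy) - annualized_return(benchmark)


def win_rate(strategy, benchmark):
    """Share of days on which the strategy return exceeds the benchmark return."""
    wins = sum(1 for s, b in zip(strategy, benchmark) if s > b)
    return wins / len(strategy)


def sharpe_ratio(daily_returns, daily_rf=0.0):
    """Annualized Sharpe ratio: mean over std of daily excess-over-risk-free returns."""
    excess = [r - daily_rf for r in daily_returns]
    return mean(excess) / stdev(excess) * math.sqrt(TRADING_DAYS)
```

Reported figures such as the ones in the table also depend on choices the review does not state, such as the benchmark index, transaction costs, and the rebalancing frequency. The sketch should therefore not be expected to reproduce those numbers.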

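Erlangshen-110M-Sentiment outperforms the other factors on annual excess return, annual net asset return, win rate, and Sharpe ratio.
In the group analysis, higher Erlangshen factor values are consistently associated with higher excess returns.
Within the benchmark, the smaller Erlangshen model outperforms larger models such as ChatGPT.
Language-specific pre-training and domain-specific fine-tuning can produce strong sentiment signals for Chinese finance without relying on very large models.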
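The group analysis mentioned above can be sketched as a quantile sort. Stocks are ranked by sentiment score, split into equal-sized groups, and each group's average excess return is compared. The function names, the default of five groups, and the toy numbers in the test are hypothetical; the paper's grouping details may differ.

```python
from statistics import mean


def group_returns(scores, returns, n_groups=5):
    """Sort by score, split into n_groups contiguous groups, average each group's return.

    Groups are ordered from lowest to highest score. When the count is not
    divisible by n_groups, integer division spreads the remainder across groups.
    """
    if len(scores) != len(returns):
        raise ValueError("scores and returns must have the same length")
    if len(scores) < n_groups:
        raise ValueError("need at least one item per group")
    ranked = sorted(zip(scores, returns), key=lambda pair: pair[0])
    size = len(ranked)
    averages = []
    for g in range(n_groups):
        lo = g * size // n_groups
        hi = (g + 1) * size // n_groups
        averages.append(mean(r for _, r in ranked[lo:hi]))
    return averages


def is_monotonic_increasing(values):
    """True if each group's average return is at least the previous group's."""
    return all(a <= b for a, b in zip(values, values[1:]))
```

A pattern where `is_monotonic_increasing(group_returns(...))` holds across periods is what the review describes for the Erlangshen factor. Higher-sentiment groups earn higher excess returns.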

Figure 2: Excess Returns of All Three Sentiment Factors


This review was generated by AI and reviewed by a human editor.