QUICK REVIEW

[Paper Review] HiSA-SMFM: Historical and Sentiment Analysis based Stock Market Forecasting Model

Ishu Gupta, Tarun Kumar Madan|arXiv (Cornell University)|Mar 10, 2022
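Stock Market Forecasting Methods | Cited by 27
One-Sentence Summary

HiSA-SMFM combines historical stock data with Twitter sentiment, using Tweepy to collect tweets and TextBlob to score their sentiment, and trains an LSTM on the merged features to improve stock price prediction for Tata Motors (NSE).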

ABSTRACT

One of the pillars to build a country's economy is the stock market. Over the years, people are investing in stock markets to earn as much profit as possible from the amount of money that they possess. Hence, it is vital to have a prediction model which can accurately predict future stock prices. With the help of machine learning, it is not an impossible task as the various machine learning techniques if modeled properly may be able to provide the best prediction values. This would enable the investors to decide whether to buy, sell or hold the share. The aim of this paper is to predict the future of the financial stocks of a company with improved accuracy. In this paper, we have proposed the use of historical as well as sentiment data to efficiently predict stock prices by applying LSTM. It has been found by analyzing the existing research in the area of sentiment analysis that there is a strong correlation between the movement of stock prices and the publication of news articles. Therefore, in this paper, we have integrated these factors to predict the stock prices more accurately.

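Research Motivation and Objectives

  • Improve stock price prediction by combining sentiment data with historical prices.
  • Develop a model that dynamically integrates multi-feature sentiment signals with historical data.
  • Use an LSTM to learn from the merged historical and sentiment features and predict future prices.
  • Validate the model on a real Indian stock (Tata Motors) and compare it with a state-of-the-art baseline.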

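Proposed Method

  • Collect stock-related Twitter data as the sentiment signal using Tweepy.
  • Apply TextBlob to classify each tweet's sentiment as positive, negative, or neutral, and convert the class counts into percentage features.
  • Extract historical stock features for Tata Motors (opening price, etc.) from NSE India.
  • Merge the sentiment percentages with the historical features to form a multi-feature training set.
  • Train an LSTM network on the integrated dataset to predict future stock prices.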
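
The sentiment-to-percentage and merging steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes each tweet already has a polarity score in [-1, 1] (such as the value TextBlob exposes as `sentiment.polarity`), that a score above zero counts as positive and below zero as negative, and that features are joined per trading day. The function names, the date keys, and all sample values are hypothetical.

```python
from collections import Counter

LABELS = ("positive", "negative", "neutral")


def classify(polarity):
    """Map a polarity score in [-1, 1] to a sentiment label (assumed zero threshold)."""
    if polarity > 0:
        return "positive"
    if polarity < 0:
        return "negative"
    return "neutral"


def sentiment_percentages(polarities):
    """Return the share of positive/negative/neutral tweets as percentages."""
    total = len(polarities)
    if total == 0:
        # No tweets for this day: report all-zero percentages.
        return {label: 0.0 for label in LABELS}
    counts = Counter(classify(p) for p in polarities)
    return {label: 100.0 * counts[label] / total for label in LABELS}


def merge_features(history, daily_polarities):
    """Join per-day price features with per-day sentiment percentages."""
    rows = []
    for date in sorted(history):
        pct = sentiment_percentages(daily_polarities.get(date, []))
        row = {"date": date, **history[date]}
        row.update({f"pct_{label}": value for label, value in pct.items()})
        rows.append(row)
    return rows


# Made-up sample data for illustration only (not real Tata Motors prices or tweets).
history = {
    "2021-01-04": {"open": 100.0, "close": 102.0},
    "2021-01-05": {"open": 102.5, "close": 101.0},
}
daily_polarities = {
    "2021-01-04": [0.5, -0.2, 0.0, 0.8],
}

training_rows = merge_features(history, daily_polarities)
for row in training_rows:
    print(row)
```

Each resulting row holds the price features plus three sentiment percentages, which is the kind of multi-feature input an LSTM could then be trained on after scaling and windowing.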

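Experimental Results

Research Questions

  • RQ1: Does combining historical price data with Twitter-derived sentiment predict stock prices better than using historical data alone?
  • RQ2: How does the HiSA-SMFM model perform at different numbers of training epochs when predicting the Tata Motors stock price?
  • RQ3: With multi-feature input, how much does HiSA-SMFM improve over the DLPM baseline?
  • RQ4: Can a multi-feature sentiment representation (positive/negative percentages) be integrated effectively into LSTM-based prediction?

Key Findings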

Epoch size   Model       Accuracy
5            DLPM [1]    91.59%
5            HiSA-SMFM   95.41%
10           DLPM [1]    94.56%
10           HiSA-SMFM   97.18%
15           DLPM [1]    83.46%
15           HiSA-SMFM   92.38%

  • HiSA-SMFM achieves higher accuracy than the DLPM baseline at epoch sizes of 5, 10, and 15.
  • At 5 epochs: DLPM 91.59% vs. HiSA-SMFM 95.41%.
  • At 10 epochs: DLPM 94.56% vs. HiSA-SMFM 97.18%.
  • At 15 epochs: DLPM 83.46% vs. HiSA-SMFM 92.38%.
  • Average accuracy: DLPM 89.87% vs. HiSA-SMFM 94.99%, an improvement of more than 5 percentage points for HiSA-SMFM.
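
The averages quoted above follow directly from the three per-epoch accuracies. The short check below recomputes them, assuming a plain unweighted mean over the three epoch settings; the variable names are illustrative.

```python
# Accuracies (%) as reported for epoch sizes 5, 10, and 15.
dlpm = {5: 91.59, 10: 94.56, 15: 83.46}
hisa = {5: 95.41, 10: 97.18, 15: 92.38}

# Unweighted mean accuracy over the three epoch settings.
mean_dlpm = sum(dlpm.values()) / len(dlpm)
mean_hisa = sum(hisa.values()) / len(hisa)

# Gain of HiSA-SMFM over DLPM in percentage points, per epoch size and on average.
gain_per_epoch = {e: round(hisa[e] - dlpm[e], 2) for e in dlpm}
mean_gain = round(mean_hisa - mean_dlpm, 2)

print(f"DLPM mean:      {mean_dlpm:.2f}%")
print(f"HiSA-SMFM mean: {mean_hisa:.2f}%")
print(f"Gain per epoch: {gain_per_epoch}")
print(f"Mean gain:      {mean_gain} percentage points")
```

The gap is largest at 15 epochs, where the DLPM accuracy drops to 83.46% while HiSA-SMFM stays above 92%.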


This review was generated by AI and reviewed by a human editor.