QUICK REVIEW

[Paper Digest] Predicting Financial Markets: Comparing Survey, News, Twitter and Search Engine Data

Huina Mao, Scott Counts|arXiv (Cornell University)|Dec 5, 2011

FinTech, Crowdfunding, Digital Finance|Cited by 119

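One-Sentence Summary

This study compares sentiment indicators derived from surveys, news, Twitter, and Google search data for predicting financial market movements. It finds that weekly Google search volumes for financial terms have predictive value, and that Twitter Investor Sentiment and the volume of tweets containing financial terms over the previous 1-2 days are statistically significant predictors of daily market log returns, whereas traditional investor surveys lose statistical significance once the other sentiment indicators and the VIX are controlled for.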

ABSTRACT

Financial market prediction on the basis of online sentiment tracking has drawn a lot of attention recently. However, most results in this emerging domain rely on a unique, particular combination of data sets and sentiment tracking tools. This makes it difficult to disambiguate measurement and instrument effects from factors that are actually involved in the apparent relation between online sentiment and market values. In this paper, we survey a range of online data sets (Twitter feeds, news headlines, and volumes of Google search queries) and sentiment tracking methods (Twitter Investor Sentiment, Negative News Sentiment and Tweet & Google Search volumes of financial terms), and compare their value for financial prediction of market indices such as the Dow Jones Industrial Average, trading volumes, and market volatility (VIX), as well as gold prices. We also compare the predictive power of traditional investor sentiment survey data, i.e. Investor Intelligence and Daily Sentiment Index, against those of the mentioned set of online sentiment indicators. Our results show that traditional surveys of Investor Intelligence are lagging indicators of the financial markets. However, weekly Google Insight Search volumes on financial search queries do have predictive value. An indicator of Twitter Investor Sentiment and the frequency of occurrence of financial terms on Twitter in the previous 1-2 days are also found to be very statistically significant predictors of daily market log return. Survey sentiment indicators are however found not to be statistically significant predictors of financial market values, once we control for all other mood indicators as well as the VIX.

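Motivation and Objectives

Evaluate and compare the predictive power of several sentiment indicators (based on survey, news, Twitter, and search engine data) for financial indices.
Determine whether traditional investor sentiment surveys (e.g., Investor Intelligence, Daily Sentiment Index) remain useful predictors in an era of real-time online data.
Assess whether real-time social media and web search data predict better than lagging survey data.
Examine the temporal dynamics of sentiment indicators, in particular their lead-lag relationships with market movements.
Provide a comparative framework for understanding how effective different sentiment data sources and indicators are for financial prediction at daily and weekly time scales.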

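Proposed Method

Collect sentiment indicators from four data sources: investor sentiment surveys (Investor Intelligence, Daily Sentiment Index), news headlines (negative sentiment scores), Twitter (investor sentiment and the volume of tweets containing financial terms), and Google search volumes for financial terms.
Construct weekly and daily sentiment indicators and add time-lagged features to assess predictive power (e.g., 1-2 day lags for Twitter data).
Apply linear models to predict the daily log return of the Dow Jones Industrial Average (DJIA), trading volume, the VIX (volatility), and gold prices.
Use Granger causality tests to assess the direction of prediction between sentiment indicators and financial variables.
Control for multiple sentiment indicators and the VIX in multivariate models to isolate the distinct predictive contribution of each sentiment source.
Evaluate predictive accuracy across market periods using R-squared improvement and statistical significance tests (p-values), with particular attention to the high-volatility period of August to September 2011.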
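
To make the lag-and-test idea concrete, here is a minimal sketch in plain Python of a Granger-style F statistic: it compares a regression of returns on their own lags with one that also includes lagged sentiment. Everything in it is an assumption for illustration: the data are synthetic, the helper names (`log_returns`, `ols_rss`, `granger_f`) are hypothetical, and it is not the authors' code or their exact model specification.

```python
import math
import random

def log_returns(prices):
    """Daily log returns r_t = ln(P_t / P_{t-1})."""
    return [math.log(b / a) for a, b in zip(prices, prices[1:])]

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def ols_rss(rows, y):
    """Fit y on rows (intercept added) and return the residual sum of squares."""
    X = [[1.0] + list(r) for r in rows]
    k = len(X[0])
    xtx = [[sum(x[i] * x[j] for x in X) for j in range(k)] for i in range(k)]
    xty = [sum(x[i] * t for x, t in zip(X, y)) for i in range(k)]
    beta = solve(xtx, xty)
    return sum((t - sum(b * v for b, v in zip(beta, x))) ** 2
               for x, t in zip(X, y))

def granger_f(y, x, lags=2):
    """F statistic for H0: lags of x add nothing beyond y's own lags."""
    idx = range(lags, len(y))
    target = [y[t] for t in idx]
    own = [[y[t - k] for k in range(1, lags + 1)] for t in idx]
    both = [o + [x[t - k] for k in range(1, lags + 1)]
            for o, t in zip(own, idx)]
    rss_restricted = ols_rss(own, target)
    rss_unrestricted = ols_rss(both, target)
    dof = len(target) - (2 * lags + 1)
    return ((rss_restricted - rss_unrestricted) / lags) / (rss_unrestricted / dof)

# Synthetic example (assumption): yesterday's sentiment drives today's return.
rng = random.Random(0)
n = 200
sentiment = [rng.gauss(0, 1) for _ in range(n)]
rets = [0.002 * rng.gauss(0, 1)]
for t in range(1, n):
    rets.append(0.005 * sentiment[t - 1] + 0.002 * rng.gauss(0, 1))

# Turn the returns into a price path, then recover log returns from prices.
prices = [100.0]
for r in rets:
    prices.append(prices[-1] * math.exp(r))
observed = log_returns(prices)

f_sent_to_ret = granger_f(observed, sentiment, lags=2)
f_ret_to_sent = granger_f(sentiment, observed, lags=2)
print(f"sentiment -> returns F = {f_sent_to_ret:.1f}")
print(f"returns -> sentiment F = {f_ret_to_sent:.1f}")
```

Because the synthetic returns depend on the previous day's sentiment by construction, the first F statistic should come out far larger than the second. On real data one would normally use an established routine, for example `grangercausalitytests` in statsmodels, which also reports p-values; the hand-rolled version here only keeps the sketch free of dependencies.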

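Experimental Results

Research Questions

RQ1: Which online sentiment indicator (survey, news, Twitter, or search engine data) performs best at predicting daily and weekly movements in the DJIA, trading volume, the VIX, and gold prices?
RQ2: After controlling for other sentiment measures, do traditional investor sentiment surveys (e.g., Investor Intelligence, Daily Sentiment Index (DSI)) have statistically significant predictive power for financial market indicators?
RQ3: How do Google search volumes for financial terms compare with Twitter-based sentiment and volume indicators in predicting market returns and volatility?
RQ4: Do Twitter Investor Sentiment (TIS) and the volume of tweets containing financial terms (TV-FST) exhibit a lead-lag relationship with market movements, and do they outperform survey-based sentiment indicators?
RQ5: How does the predictive power of the different sentiment indicators change during periods of high market volatility, such as August to September 2011?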

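Key Findings

Weekly Google Insight Search (GIS) volumes for financial terms are statistically significant predictors of the DJIA closing price, trading volume, and the VIX; Granger causality tests support this direction of prediction.
Twitter Investor Sentiment (TIS) and the 1-2 day lagged volume of tweets containing financial terms (TV-FST) are highly statistically significant predictors of daily market log returns, even after controlling for other sentiment indicators and the VIX.
Once other sentiment indicators are included in the model, traditional investor sentiment surveys (Investor Intelligence and Daily Sentiment Index) no longer have statistically significant predictive power for financial market prices.
Negative News Sentiment (NNS) is a statistically significant predictor of market returns, but its predictive power is less robust than that of TIS and TV-FST.
Before the sharp DJIA decline from late July through August 2011, the volume of tweets containing financial terms rose several weeks ahead of Google search volumes, suggesting that Twitter may be useful for detecting early market signals.
Adding GIS to the predictive model significantly improved predictive accuracy, especially in August 2011, when market volatility was high, the VIX was rising, and the DJIA was falling.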
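
The finding that survey indicators lose significance once other indicators are controlled for can be illustrated with a small partial-correlation sketch. This is a simplified stand-in for the multivariate regressions described in the digest, with a single control variable. The data are synthetic and the names (`twitter`, `survey`, `returns`, `partial_corr`) are hypothetical. The `twitter` series is assumed to be already lagged and aligned to the return day.

```python
import random

def mean(v):
    return sum(v) / len(v)

def corr(a, b):
    """Pearson correlation of two equal-length sequences."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5

def residuals(y, z):
    """Residuals from a simple regression of y on z (with intercept)."""
    my, mz = mean(y), mean(z)
    slope = (sum((a - mz) * (b - my) for a, b in zip(z, y))
             / sum((a - mz) ** 2 for a in z))
    return [b - (my + slope * (a - mz)) for a, b in zip(z, y)]

def partial_corr(x, y, control):
    """Correlation of x and y after removing the linear effect of control."""
    return corr(residuals(x, control), residuals(y, control))

# Synthetic setup (assumption): the survey merely echoes the Twitter signal,
# and returns respond to the Twitter signal, not to the survey itself.
rng = random.Random(1)
n = 500
twitter = [rng.gauss(0, 1) for _ in range(n)]
survey = [t + 0.5 * rng.gauss(0, 1) for t in twitter]
returns = [t + 0.5 * rng.gauss(0, 1) for t in twitter]

raw = corr(survey, returns)
controlled = partial_corr(survey, returns, twitter)
print(f"raw correlation, survey vs returns: {raw:.3f}")
print(f"partial correlation given twitter:  {controlled:.3f}")
```

In this toy setup the survey looks strongly related to returns on its own, but the relationship largely disappears once the Twitter signal is partialled out. That pattern is consistent with a survey indicator that carries no information beyond the other indicators, though the sketch says nothing about the magnitudes in the paper.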


This digest was generated by AI and reviewed by a human editor.