QUICK REVIEW

[Paper review] Modelling spatiotemporal variation of positive and negative sentiment on Twitter to improve the identification of localised deviations.

Zubair Shah, Paige Martin|arXiv (Cornell University)|Feb 22, 2018

Misinformation and Its Impacts|References 50|Cited by 1

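One-sentence summary

This study applies dictionary-based sentiment analysis to 16.54 million English-language tweets from 100 cities (13 July to 30 November 2017) to model spatiotemporal variation in sentiment on Twitter. City and time of day explained more of the variance in sentiment than the other factors, with the full model reaching Pearson's R = 0.236 for positive and R = 0.306 for negative sentiment in test data, and accounting for baseline sentiment made it easier to detect localised sentiment deviations associated with news events.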

ABSTRACT

Studies examining how sentiment on social media varies over time and space appear to produce inconsistent results. Analysing 16.54 million English-language tweets from 100 cities posted between 13 July and 30 November 2017, our aim was to clarify how spatiotemporal and social factors contributed to variation in sentiment on Twitter. We estimated positive and negative sentiment for each of the cities using dictionary-based sentiment analysis and constructed models to explain differences in sentiment using time of day, day of week, weather, interaction type (social or non-social), and city as factors. Tests in a distinct but contiguous period of time showed that all factors were independently associated with sentiment. In the full multivariable model of positive (Pearson's R in test data 0.236; 95% CI 0.231-0.241), and negative (Pearson's R in test data 0.306 95% CI 0.301-0.310) sentiment, city and time of day explained more of the variance than other factors. Extreme differences between observed and expected sentiment using the full model appeared to be better aligned with international news events than degenerate models. In applications that aim to detect localised events using the sentiment of Twitter populations, it is useful to account for baseline differences before looking for unexpected changes.

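Research motivation and objectives

Clarify inconsistent findings in social media sentiment research by analysing the spatiotemporal and social factors that influence sentiment on Twitter.
Identify which factors (time of day, day of week, weather, interaction type, or city) best explain variation in sentiment in a large-scale Twitter dataset.
Build a multivariable model that captures baseline sentiment patterns, improving detection of localised deviations from expected sentiment.
Validate the model's predictive ability on a distinct test period and assess how well its deviations align with international news events.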

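Proposed method

Apply dictionary-based sentiment analysis to the tweets in the dataset to estimate positive and negative sentiment for each city.
Build a multivariable model with time of day, day of week, weather, interaction type (social vs. non-social), and city as predictors.
Fit the full model on training data and evaluate it on a distinct but contiguous test period; the abstract gives the overall data window (13 July to 30 November 2017) but not the exact split dates.
Compute Pearson's correlation coefficient (R) between observed and predicted sentiment to assess model fit on the test data.
Compare the full model against degenerate models to assess whether extreme sentiment deviations align better with news events once baseline variation is accounted for.
Treat city and time of day as the key explanatory variables, given that they account for the largest share of explained variance.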
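
The steps above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the word lists `POSITIVE` and `NEGATIVE` are hypothetical toy lexicons, and the baseline is a simple marginal-means approximation (grand mean plus a city effect plus an hour-of-day effect) standing in for the paper's multivariable model, whose exact form is not stated here. The function names `score_tweet`, `fit_baseline`, and `deviation` are made up for illustration.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical toy lexicons; the dictionary used in the paper is not given here.
POSITIVE = {"good", "great", "happy", "love"}
NEGATIVE = {"bad", "sad", "angry", "hate"}


def score_tweet(text):
    """Return (positive, negative): the fraction of tokens found in each lexicon."""
    tokens = [t.strip(".,!?").lower() for t in text.split()]
    tokens = [t for t in tokens if t]
    if not tokens:
        return 0.0, 0.0
    pos = sum(t in POSITIVE for t in tokens) / len(tokens)
    neg = sum(t in NEGATIVE for t in tokens) / len(tokens)
    return pos, neg


def fit_baseline(rows):
    """Fit expected sentiment as grand mean + city effect + hour effect.

    rows: iterable of (city, hour, sentiment) training observations.
    This is a marginal-means approximation (an assumption for illustration),
    not a fitted regression.
    """
    rows = list(rows)
    grand = mean(s for _, _, s in rows)
    by_city, by_hour = defaultdict(list), defaultdict(list)
    for city, hour, s in rows:
        by_city[city].append(s)
        by_hour[hour].append(s)
    city_eff = {c: mean(v) - grand for c, v in by_city.items()}
    hour_eff = {h: mean(v) - grand for h, v in by_hour.items()}

    def expected(city, hour):
        # Unseen cities or hours fall back to the grand mean.
        return grand + city_eff.get(city, 0.0) + hour_eff.get(hour, 0.0)

    return expected


def deviation(expected, city, hour, observed):
    """Observed minus expected sentiment; large magnitudes are candidate anomalies."""
    return observed - expected(city, hour)
```

Dropping the city or hour term from `expected` gives a reduced model in the spirit of the degenerate models mentioned above, so the same `deviation` function can be used to compare which model's largest residuals line up with known events.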

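Experimental results

Research questions

RQ1: Which spatiotemporal and social factors best explain variation in positive and negative sentiment on Twitter?
RQ2: How well does a multivariable model using time of day, day of week, weather, interaction type, and city predict sentiment across cities and time periods?
RQ3: How much does accounting for baseline sentiment improve detection of localised sentiment deviations linked to real-world events?
RQ4: Compared with simpler (degenerate) models, do extreme sentiment deviations under the full model align more closely with international news events?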

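Key findings

In test data, the full multivariable model reached Pearson's R = 0.236 (95% CI 0.231–0.241) for positive sentiment and R = 0.306 (95% CI 0.301–0.310) for negative sentiment; city and time of day explained more of the variance than the other factors.
All factors (time of day, day of week, weather, interaction type, and city) were independently associated with sentiment.
Extreme differences between observed and expected sentiment under the full model appeared to align better with international news events than those from degenerate models.
Because the full model captures baseline sentiment patterns, it is better suited to detecting localised deviations from expected sentiment.
The correlations between observed and predicted sentiment in test data were weak to moderate for both positive and negative sentiment, with narrow confidence intervals.
Weather and interaction type explained less of the variance than city and time of day but were still independently associated with sentiment, suggesting contextual effects beyond time and geography.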
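
To make the reported correlations concrete, the sketch below computes Pearson's R between observed and predicted values and an approximate 95% confidence interval using the Fisher z-transformation. The abstract does not say how its intervals were obtained, so the Fisher method is an assumption here, and the helper names `pearson_r` and `fisher_ci` are hypothetical. As a rough guide, squaring the reported values gives 0.236² ≈ 0.056 and 0.306² ≈ 0.094.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 4:
        raise ValueError("need two equal-length samples with at least 4 points")
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


def fisher_ci(r, n, z_crit=1.959964):
    """Approximate CI for r via Fisher's z; z_crit=1.959964 gives about 95%.

    Requires -1 < r < 1 and n > 3.
    """
    z = math.atanh(r)
    se = 1.0 / math.sqrt(n - 3)
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)


# Toy usage with made-up numbers, not data from the paper.
observed = [1, 2, 3, 4]
predicted = [1, 3, 2, 4]
r = pearson_r(observed, predicted)
low, high = fisher_ci(r, len(observed))
```

The interval narrows as the number of observations grows, which is consistent with the tight intervals reported for a dataset of this size.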

This review was generated by AI and checked by a human editor.