QUICK REVIEW

[Paper Review] Twitter discussions and emotions about COVID-19 pandemic: a machine learning approach

Jia Xue, Junxiang Chen|arXiv (Cornell University)|May 26, 2020
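Misinformation and Its Impacts|References: 27|Cited by: 29
One-Sentence Summary

This study analyzes 4 million Twitter messages collected from March 1 to April 21, 2020, using Latent Dirichlet Allocation (LDA) and sentiment analysis to identify the dominant topics, themes, and emotions in public discussion of COVID-19. Fear is the dominant emotion when users discuss new cases and deaths, while broader pandemic discussion shows anticipation, trust, anger, and fear, which highlights Twitter's potential for real-time public health surveillance.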

ABSTRACT

The objective of the study is to examine coronavirus disease (COVID-19) related discussions, concerns, and sentiments that emerged from tweets posted by Twitter users. We analyze 4 million Twitter messages related to the COVID-19 pandemic using a list of 25 hashtags such as "coronavirus," "COVID-19," "quarantine" from March 1 to April 21 in 2020. We use a machine learning approach, Latent Dirichlet Allocation (LDA), to identify popular unigram, bigrams, salient topics and themes, and sentiments in the collected Tweets. Popular unigrams include "virus," "lockdown," and "quarantine." Popular bigrams include "COVID-19," "stay home," "corona virus," "social distancing," and "new cases." We identify 13 discussion topics and categorize them into five different themes, such as "public health measures to slow the spread of COVID-19," "social stigma associated with COVID-19," "coronavirus news cases and deaths," "COVID-19 in the United States," and "coronavirus cases in the rest of the world". Across all identified topics, the dominant sentiments for the spread of coronavirus are anticipation that measures that can be taken, followed by a mixed feeling of trust, anger, and fear for different topics. The public reveals a significant feeling of fear when they discuss the coronavirus new cases and deaths than other topics. The study shows that Twitter data and machine learning approaches can be leveraged for infodemiology study by studying the evolving public discussions and sentiments during the COVID-19. Real-time monitoring and assessment of the Twitter discussion and concerns can be promising for public health emergency responses and planning. Already emerged pandemic fear, stigma, and mental health concerns may continue to influence public trust when there occurs a second wave of COVID-19 or a new surge of the imminent pandemic.

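Research Motivation and Objectives

  • Examine public discussions, concerns, and emotions about the COVID-19 pandemic through social media.
  • Identify the salient topics and themes in Twitter discourse during the early stage of the pandemic.
  • Assess how dominant emotions such as fear, anger, anticipation, and trust appear across discussion themes.
  • Evaluate the potential of Twitter data and machine learning for infodemiology and public health emergency response.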

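Proposed Method

  • Collected 4 million tweets from March 1 to April 21, 2020, using 25 COVID-19-related hashtags.
  • Applied Latent Dirichlet Allocation (LDA) to identify 13 distinct discussion topics, grouped into five overarching themes.
  • Identified popular unigrams (e.g., "virus," "lockdown") and bigrams (e.g., "stay home," "social distancing") in the tweet corpus.
  • Ran sentiment analysis to classify the dominant emotion in each topic, focusing on anticipation, fear, anger, and trust.
  • Grouped the topics into themes: public health measures, social stigma, case and death reports, discussions specific to the United States, and cases in the rest of the world.
  • Used machine learning to map the evolution of public emotions and concerns, with a view to real-time public health applications.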
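The n-gram counting and LDA steps listed above can be illustrated with a minimal sketch. This is not the authors' pipeline: the summary does not say which tokenizer, LDA implementation, or hyperparameters were used. The function names, the toy tweets, and the `alpha`/`beta` values below are assumptions made for illustration, and the LDA routine is a textbook collapsed Gibbs sampler written with the standard library only.

```python
import random
from collections import Counter


def ngrams(tokens, n):
    """Return the n-grams of a token list as space-joined strings."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def top_ngrams(docs, n, k):
    """Count n-grams over tokenized documents and return the k most common."""
    counts = Counter()
    for tokens in docs:
        counts.update(ngrams(tokens, n))
    return counts.most_common(k)


def lda_gibbs(docs, num_topics, iters=100, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA (hyperparameters are assumed values).

    Returns (topic_word, doc_topic): per-topic word counts and
    per-document topic counts after the final sweep.
    """
    rng = random.Random(seed)
    vocab_size = len({w for tokens in docs for w in tokens})
    topic_word = [Counter() for _ in range(num_topics)]
    topic_total = [0] * num_topics
    doc_topic = [[0] * num_topics for _ in docs]
    assignments = []

    def add(d, w, z, delta):
        # Add or remove one token's contribution to all count tables.
        topic_word[z][w] += delta
        topic_total[z] += delta
        doc_topic[d][z] += delta

    # Random initial topic for every token.
    for d, tokens in enumerate(docs):
        zs = [rng.randrange(num_topics) for _ in tokens]
        for w, z in zip(tokens, zs):
            add(d, w, z, 1)
        assignments.append(zs)

    for _ in range(iters):
        for d, tokens in enumerate(docs):
            for i, w in enumerate(tokens):
                add(d, w, assignments[d][i], -1)  # hold this token out
                # Unnormalized conditional probability of each topic.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][w] + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in range(num_topics)
                ]
                z = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = z
                add(d, w, z, 1)
    return topic_word, doc_topic


if __name__ == "__main__":
    # Toy, hand-written token lists; not data from the study.
    toy_tweets = [
        ["stay", "home", "stay", "safe"],
        ["stay", "home", "social", "distancing"],
        ["new", "cases", "new", "deaths"],
        ["new", "cases", "reported", "today"],
    ]
    print(top_ngrams(toy_tweets, 2, 3))
    topic_word, _ = lda_gibbs(toy_tweets, num_topics=2)
    for k, words in enumerate(topic_word):
        print(k, [w for w, c in words.most_common(3) if c > 0])
```

On a corpus of millions of tweets, an optimized library implementation would be the practical choice; the pure-Python sampler is only meant to make the count updates behind LDA visible.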

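Experimental Results

Research Questions

  • RQ1: What topics and themes did the public most often raise when discussing COVID-19 on Twitter early in the pandemic?
  • RQ2: How do emotions such as fear, anger, anticipation, and trust vary across discussion topics?
  • RQ3: Which emotion dominates when users discuss new cases and deaths, and how does this differ from other pandemic-related topics?
  • RQ4: To what extent can machine learning models applied to Twitter data support real-time public health surveillance and emergency planning?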

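Key Findings

  • The most common unigrams were "virus," "lockdown," and "quarantine"; the leading bigrams included "COVID-19," "stay home," and "social distancing."
  • The analysis identified 13 distinct discussion topics and grouped them into five themes, including public health measures and social stigma.
  • Fear was the dominant emotion in discussions of new cases and deaths, and it was more pronounced there than in other topics.
  • Anticipation of protective measures was the most prevalent emotion across all topics, followed by a mix of trust, anger, and fear.
  • The study shows that Twitter data and machine learning can be used to track how public emotions and concerns evolve during a public health emergency.
  • Pandemic-related fear, stigma, and mental health concerns that have already emerged may continue to influence public trust during a second wave or a new surge of cases.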
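The per-topic emotion findings above rest on assigning emotions to tweets and then comparing totals across topics. A minimal lexicon-based sketch of that idea follows. The summary does not state which sentiment tool or lexicon the authors used, so the tiny `TOY_LEXICON`, the function names, and the example tweets are all hypothetical and serve only to show the mechanics.

```python
from collections import Counter, defaultdict

# Hypothetical toy lexicon mapping words to emotions. It is invented for
# illustration and is NOT the lexicon used in the study.
TOY_LEXICON = {
    "deaths": {"fear"},
    "cases": {"fear"},
    "outbreak": {"fear"},
    "vaccine": {"anticipation", "trust"},
    "soon": {"anticipation"},
    "doctors": {"trust"},
    "blame": {"anger"},
}


def emotion_counts(tokens, lexicon=TOY_LEXICON):
    """Count how often each emotion is triggered by the tokens of one tweet."""
    counts = Counter()
    for token in tokens:
        for emotion in lexicon.get(token.lower(), ()):
            counts[emotion] += 1
    return counts


def dominant_emotion_by_topic(labeled_tweets, lexicon=TOY_LEXICON):
    """Map each topic to its most frequent emotion.

    labeled_tweets is an iterable of (topic, tokens) pairs. A topic whose
    tweets contain no lexicon words maps to None.
    """
    per_topic = defaultdict(Counter)
    for topic, tokens in labeled_tweets:
        per_topic[topic].update(emotion_counts(tokens, lexicon))
    return {
        topic: (counts.most_common(1)[0][0] if counts else None)
        for topic, counts in per_topic.items()
    }


if __name__ == "__main__":
    # Hand-written example tweets with assumed topic labels.
    tweets = [
        ("new cases and deaths", ["new", "cases", "and", "deaths", "today"]),
        ("public health measures", ["vaccine", "coming", "soon"]),
    ]
    print(dominant_emotion_by_topic(tweets))
```

In practice the topic label for each tweet would come from the fitted topic model, and raw counts would usually be normalized by topic size before topics are compared.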


This review was generated by AI and reviewed by a human editor.