QUICK REVIEW

[Paper Digest] Analyzing COVID-19 on Online Social Media: Trends, Sentiments and Emotions

Xiaoya Li, Mingxin Zhou|arXiv (Cornell University)|May 29, 2020

Misinformation and Its Impacts | References: 58 | Cited by: 34

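ONE-SENTENCE SUMMARY

This paper analyzes COVID-19-related posts on Twitter and Weibo from January 20 to May 11, 2020 to map topic trends, six basic emotions, and emotional triggers, comparing the United States and China through semi-supervised retrieval, BERT-based emotion labeling, and trigger extraction.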

ABSTRACT

At the time of writing, the ongoing pandemic of coronavirus disease (COVID-19) has caused severe impacts on society, economy and people's daily lives. People constantly express their opinions on various aspects of the pandemic on social media, making user-generated content an important source for understanding public emotions and concerns. In this paper, we perform a comprehensive analysis on the affective trajectories of the American people and the Chinese people based on Twitter and Weibo posts between January 20th, 2020 and May 11th, 2020. Specifically, by identifying people's sentiments, emotions (i.e., anger, disgust, fear, happiness, sadness, surprise) and the emotional triggers (e.g., what a user is angry/sad about) we are able to depict the dynamics of public affect in the time of COVID-19. By contrasting two very different countries, China and the United States, we reveal sharp differences in people's views on COVID-19 in different cultures. Our study provides a computational approach to unveiling public emotions and concerns on the pandemic in real-time, which would potentially help policy-makers better understand people's need and thus make optimal policy.

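MOTIVATION AND OBJECTIVES

Understand how public emotions and concerns about COVID-19 evolve over time on Twitter and Weibo.
Identify fine-grained emotions (anger, disgust, fear, happiness, sadness, surprise) and their triggers.
Compare public reactions in the United States and China to reveal cultural differences in how the pandemic is perceived.
Build a real-time computational approach for extracting public emotions and concerns that can inform policy.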

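PROPOSED METHOD

Bootstrapped semi-supervised retrieval that identifies COVID-19-related posts using seed keywords, iterative retraining, and expansion with salient keywords.
Six-way multi-label emotion classification on English tweets with BERT, using emotion descriptions as prompts (sigmoid outputs).
Chinese emotion classification on Weibo data with a description-BERT model that uses culturally relevant labels.
Emotion intensity S(t,y), computed as the daily average of P(y|x), with non-COVID texts assigned zero probability.
Emotion trigger extraction with a CRF tagger combined with BERT-MRC features, augmented with POS, dependency, and Twitter-specific features.
Unsupervised LDA clustering of the top trigger mentions to discover subcategories and topics that change over time.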
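The emotion intensity S(t,y) described above can be sketched as follows. This is a minimal illustration and not the authors' code: the function names, the input layout `(day, is_covid, logits)`, and the choice to average over all posts of a day (with non-COVID posts contributing zero) are assumptions based on the one-line description in this digest.

```python
import math
from collections import defaultdict

# The six basic emotions named in the paper's abstract.
EMOTIONS = ["anger", "disgust", "fear", "happiness", "sadness", "surprise"]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def emotion_probs(logits):
    """Multi-label head: one independent sigmoid per emotion,
    so the six probabilities need not sum to 1."""
    return {y: sigmoid(z) for y, z in zip(EMOTIONS, logits)}


def daily_intensity(posts):
    """Compute S(t, y) as the mean of P(y|x) over all posts on day t.

    posts: iterable of (day, is_covid, logits) tuples (hypothetical layout).
    Non-COVID posts count in the denominator but contribute probability 0
    (an assumed reading of "assigned zero probability").
    """
    sums = defaultdict(lambda: dict.fromkeys(EMOTIONS, 0.0))
    counts = defaultdict(int)
    for day, is_covid, logits in posts:
        counts[day] += 1
        if not is_covid:
            continue
        for y, p in emotion_probs(logits).items():
            sums[day][y] += p
    return {
        day: {y: sums[day][y] / counts[day] for y in EMOTIONS}
        for day in counts
    }


# Usage: two posts on the same day, one COVID-related and one not.
example = [
    ("2020-02-08", True, [0.0, 0.0, 0.0, 0.0, 0.0, 0.0]),
    ("2020-02-08", False, [5.0, 5.0, 5.0, 5.0, 5.0, 5.0]),
]
intensity = daily_intensity(example)
```

Under this reading, S(t,y) reflects both how many posts on a day are about COVID-19 and how strongly those posts express emotion y, which is why a day with few COVID posts scores low even if each of them is highly emotional.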

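EXPERIMENTAL RESULTS

Research Questions

RQ1: What are the temporal dynamics of topic prevalence and emotional states for COVID-19 on Twitter and Weibo?
RQ2: How do the six basic emotions fluctuate during the pandemic, and what are their semantic triggers?
RQ3: How do public emotions and triggers differ between the United States and China?
RQ4: Can a semi-supervised, real-time pipeline effectively collect and track COVID-19-related posts and their emotions?
RQ5: Among the trigger subcategories under anger and worry, which best explain public concerns over time?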

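Key Findings

On Weibo, COVID-related posting intensity peaked from late January through February and March; on Twitter, attention rose from March onward and had shown no clear decline at the time of writing.
With the BERT-based multi-label classifier for emotion labeling, English tweets reach a micro F1 of 75.2 and a macro F1 of 68.3; the BERT-description variant reaches a macro F1 of 77.0.
On Weibo, worry peaks along with overall posting intensity, and anger spikes around February 8 following the death of Li Wenliang; on Twitter, anger and worry rise with the U.S. outbreak and policy events.
The main triggers of anger include lockdowns, quarantine, public figures (Trump, Pence), and China-related issues; triggers of worry include jobs, finances, family concerns, and the spread of the virus.
Trigger clustering (LDA) reveals interpretable topics such as China-related anger, lockdowns, and hospital treatment; worry topics include finances, family, and rising death and confirmed-case counts.
Over three rounds of bootstrapping, the F1 scores for COVID-related tweet classification were 0.74, 0.82, and 0.86 (early rounds).
Overall, the study demonstrates a framework for quantifying public emotions and their drivers in real time during a global crisis.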
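The findings above report both micro and macro F1 for the multi-label emotion classifier. The sketch below shows how the two averages differ; it is a generic illustration with hypothetical function names and toy data, not the paper's evaluation script.

```python
def f1(tp, fp, fn):
    """F1 from raw counts; defined as 0.0 when there is nothing to score."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def micro_macro_f1(y_true, y_pred, n_labels):
    """y_true, y_pred: lists of sets of label indices, one set per example.

    Micro F1 pools counts over all labels before computing F1, so frequent
    labels dominate. Macro F1 averages per-label F1, so rare labels weigh
    as much as frequent ones.
    """
    tp = [0] * n_labels
    fp = [0] * n_labels
    fn = [0] * n_labels
    for truth, pred in zip(y_true, y_pred):
        for k in range(n_labels):
            if k in pred and k in truth:
                tp[k] += 1
            elif k in pred:
                fp[k] += 1
            elif k in truth:
                fn[k] += 1
    micro = f1(sum(tp), sum(fp), sum(fn))
    macro = sum(f1(tp[k], fp[k], fn[k]) for k in range(n_labels)) / n_labels
    return micro, macro


# Toy data: label 0 is frequent and always right, label 1 is rare and missed.
truth = [{0}, {0}, {0}, {1}]
pred = [{0}, {0}, {0}, set()]
micro, macro = micro_macro_f1(truth, pred, n_labels=2)
```

In this toy case the micro score stays high while the macro score drops, because the one missed rare label counts as a full label-level failure under macro averaging. A macro F1 below the micro F1 is therefore often a sign that less frequent labels are harder for the classifier.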

This digest was generated by AI and reviewed by human editors.