QUICK REVIEW

[Paper Review] COVID-19 on Social Media: Analyzing Misinformation in Twitter Conversations

Karishma Sharma, Seo S|arXiv (Cornell University)|Mar 26, 2020
Misinformation and Its Impacts | References: 29 | Cited by: 138
One-Sentence Summary

This study collects Twitter data from March 1 to June 5, 2020, identifies COVID-19 misinformation using fact-checking sources, and examines its narratives, engagements, and spread, with the results published on a public dashboard.

ABSTRACT

The ongoing Coronavirus (COVID-19) pandemic highlights the inter-connectedness of our present-day globalized world. With social distancing policies in place, virtual communication has become an important source of (mis)information. As an increasing number of people rely on social media platforms for news, identifying misinformation and uncovering the nature of online discourse around COVID-19 has emerged as a critical task. To this end, we collected streaming data related to COVID-19 using the Twitter API, starting March 1, 2020. We identified unreliable and misleading content based on fact-checking sources, and examined the narratives promoted in misinformation tweets, along with the distribution of engagements with these tweets. In addition, we provide examples of the spreading patterns of prominent misinformation tweets. The analysis is presented and updated on a publicly accessible dashboard (https://usc-melady.github.io/COVID-19-Tweet-Analysis) to track the nature of online discourse and misinformation about COVID-19 on Twitter from March 1 - June 5, 2020. The dashboard provides a daily list of identified misinformation tweets, along with topics, sentiments, and emerging trends in the COVID-19 Twitter discourse. The dashboard is provided to improve visibility into the nature and quality of information shared online, and to provide real-time access to insights and information extracted from the dataset.

Research Motivation and Objectives

  • Quantify COVID-19 misinformation on Twitter using fact-checking sources and external links.
  • Characterize the narratives and topics promoted in misinformation tweets.
  • Analyze engagement patterns and propagation cascades of misinformation across geographies.
  • Provide a publicly accessible dashboard for real-time insights into COVID-19 discourse and misinformation.

Proposed Method

  • Collect streaming Twitter data related to COVID-19 from March 1 to June 5, 2020 (85.04M tweets; 54.32M English).
  • Label misinformation tweets by linking external content to fact-checking sources (Media Bias/Fact Check, NewsGuard, Zimdars).
  • Build information cascades from the retweet/reply graph and mark source tweets as misinformation if they link to misinformation sources.
  • Perform misinformation analysis including distribution across source types, engagement analysis, and narrative extraction via TF-IDF on hashtags.
  • Conduct sentiment analysis using a lexical method (Hutto and Gilbert 2014) and aggregate country-level sentiments.
  • Apply topic modeling with character embeddings to identify 20 topics within English tweets.
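
The labeling and cascade steps in the list above can be illustrated with a minimal sketch. It is not the authors' code: the domain names, the tweet dictionary fields (`id`, `parent_id`, `urls`), and the function names are all hypothetical, and it assumes that URLs have already been expanded from shortened links. Real domain lists would come from the fact-checking sources named above.

```python
from urllib.parse import urlparse

# Hypothetical flagged domains mapped to a misinformation category.
# In practice these would be compiled from the fact-checking sources.
FLAGGED_DOMAINS = {
    "unreliable-example.com": "unreliable",
    "conspiracy-example.org": "conspiracy",
}


def domain_of(url):
    """Return the lowercased host of a URL without a leading 'www.'."""
    host = urlparse(url).hostname or ""
    return host[4:] if host.startswith("www.") else host


def label_tweet(urls):
    """Return the category of the first flagged link, or None if none match."""
    for url in urls:
        category = FLAGGED_DOMAINS.get(domain_of(url))
        if category is not None:
            return category
    return None


def build_cascades(tweets):
    """Group direct engagements (retweets/replies) under their source tweet.

    A tweet with parent_id None is treated as a source tweet (an assumption
    about the data layout); its label depends only on its own links.
    """
    cascades = {}
    for t in tweets:
        if t.get("parent_id") is None:
            cascades[t["id"]] = {
                "label": label_tweet(t.get("urls", [])),
                "engagements": [],
            }
    for t in tweets:
        parent = t.get("parent_id")
        if parent in cascades:
            cascades[parent]["engagements"].append(t["id"])
    return cascades
```

Matching on the domain rather than the full URL mirrors the idea of labeling at the source level: every tweet linking to a flagged outlet inherits that outlet's category, regardless of the specific article.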

Experimental Results

Research Questions

  • RQ1: What narratives and topics characterize COVID-19 misinformation on Twitter?
  • RQ2: How are misinformation tweets distributed across sources and engagement patterns (retweets/replies)?
  • RQ3: How does misinformation propagate geographically through cascades?
  • RQ4: What are the sentiment trends related to COVID-19 interventions and discourse?
  • RQ5: Can a public dashboard provide real-time tracking of misinformation, topics, and trends?

Key Findings

  • The dataset comprises 85.04 million tweets collected globally, with 63.88% in English and 43.02% containing geolocation data; 10.61 million user accounts are represented, of which 7.51% are verified.
  • 3.29% of source tweets with external links (150.8K) link to misinformation sources identified from fact-checking sites.
  • Misinformation cascades include large spreads, with the largest cascade exceeding 10,000 retweets across multiple countries.
  • Distinctive hashtags by misinformation type were identified via TF-IDF analysis, revealing category-specific narratives (unreliable, conspiracy, clickbait, political/biased).
  • Engagement patterns vary by category, with unreliable and conspiracy sources generally receiving fewer responses relative to their source tweet volume.
  • Sentiment and topic analyses show evolving country-level perceptions and topic clusters over time, tracked via a public dashboard.
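
The TF-IDF hashtag analysis mentioned above can be sketched as follows. This is an illustration under stated assumptions, not the paper's implementation: each misinformation category is treated as one "document" made of all hashtags from its tweets, the weighting is the plain `tf * log(N / df)` form, and the function names and any hashtags passed in are hypothetical. The exact weighting used by the authors is not given in this summary.

```python
import math
from collections import Counter


def tfidf_by_category(hashtags_by_category):
    """Score hashtags per category, treating each category as one document.

    tf  = hashtag count in the category / total hashtags in the category
    idf = log(number of categories / number of categories using the hashtag)
    """
    n = len(hashtags_by_category)
    counts = {
        category: Counter(tag.lower() for tag in tags)
        for category, tags in hashtags_by_category.items()
    }
    # Document frequency: in how many categories each hashtag appears.
    df = Counter()
    for counter in counts.values():
        df.update(counter.keys())

    scores = {}
    for category, counter in counts.items():
        total = sum(counter.values())
        scores[category] = {
            tag: (k / total) * math.log(n / df[tag])
            for tag, k in counter.items()
        }
    return scores


def top_hashtags(scores, category, k=3):
    """Return the k highest-scoring hashtags, ties broken alphabetically."""
    ranked = sorted(scores[category].items(), key=lambda kv: (-kv[1], kv[0]))
    return [tag for tag, _ in ranked[:k]]
```

With this weighting, a hashtag used in every category scores zero, so generic pandemic tags drop out and only tags that separate one category from the others rise to the top.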

This review was generated by AI and reviewed by a human editor.