QUICK REVIEW

[Paper Review] Rumor Detection and Classification for Twitter Data

Sardar Hamidian, Mona Diab | arXiv (Cornell University) | Nov 25, 2019

Misinformation and Its Impacts | Cited by 65

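One-Sentence Summary

The paper proposes a two-step approach that detects rumors in Twitter data and then classifies them, introducing new features and preprocessing strategies. It reports an F-measure above 0.82 on the rumor detection and classification (RDC) task for a mixed-rumors data set, and 84 percent on a single-rumor data set.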

ABSTRACT

With the pervasiveness of online media data as a source of information, verifying the validity of this information is becoming even more important yet quite challenging. Rumors spread a large quantity of misinformation on microblogs. In this study we address two common issues within the context of microblog social media. First, we detect rumors as a type of misinformation propagation, and next we go beyond detection to perform the task of rumor classification. We explore the problem using a standard data set. We devise novel features and study their impact on the task. We experiment with various levels of preprocessing as a precursor of the classification, as well as grouping of features. We achieve an F-measure of over 0.82 in the RDC task on a mixed-rumors data set and 84 percent on a single-rumor data set using a two-step classification approach.

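Motivation and Objectives

Motivation: online microblog data is a pervasive source of information, so its validity needs to be verified and the spread of misinformation countered.
Develop a two-step process that first detects rumors and then classifies them.
Explore feature engineering and preprocessing strategies that improve rumor-related classification performance.
Evaluate how feature grouping affects classification results.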

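Proposed Method

Define a two-step classification pipeline: rumor detection first, then classification of the detected rumors.
Propose new feature sets and study their impact on performance.
Apply different levels of text preprocessing before classification.
Group features to evaluate their combined effect on classification performance.
Evaluate performance on a standard Twitter rumor data set.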
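The two-step pipeline described above can be sketched as a cascade: a binary rumor detector runs first, and a second classifier assigns a rumor category only to tweets the detector flags. This summary does not name the paper's classifiers, features, preprocessing levels, or rumor categories, so everything below is a hypothetical illustration under stated assumptions: a small multinomial Naive Bayes stands in as the learner (not necessarily what the paper used), the preprocessing levels `none`, `light`, and `full`, the feature groups `lexical` and `twitter`, and the category labels `question` and `deny` are all made up, as are the toy tweets.

```python
# Hypothetical sketch: learner, preprocessing levels, feature groups and
# category labels are illustrative assumptions, not the paper's own.
import math
import re
from collections import Counter, defaultdict


def preprocess(text: str, level: str = "full") -> list[str]:
    """Hypothetical levels: 'none' splits raw text, 'light' lowercases first,
    'full' also masks URLs/mentions and keeps only word tokens."""
    if level == "none":
        return text.split()
    text = text.lower()
    if level == "light":
        return text.split()
    text = re.sub(r"https?://\S+", " URL ", text)  # mask links
    text = re.sub(r"@\w+", " USER ", text)         # mask user mentions
    return re.findall(r"\w+", text)                # drop punctuation


def extract_features(text: str, level: str, groups: tuple[str, ...]) -> list[str]:
    """Build string features from the selected (hypothetical) feature groups."""
    feats: list[str] = []
    if "lexical" in groups:
        feats += ["w=" + tok for tok in preprocess(text, level)]
    if "twitter" in groups:
        # Binary tweet-surface cues, checked on the raw text.
        for name, marker in [("has_url", "http"), ("has_mention", "@"),
                             ("has_hashtag", "#"), ("has_question", "?")]:
            if marker in text:
                feats.append(name)
    return feats


class NaiveBayes:
    """Multinomial Naive Bayes with add-one smoothing over string features."""

    def fit(self, X: list[list[str]], y: list) -> "NaiveBayes":
        self.priors = Counter(y)
        self.n = len(y)
        self.counts = defaultdict(Counter)
        for feats, label in zip(X, y):
            self.counts[label].update(feats)
        self.vocab = {f for feats in X for f in feats}
        return self

    def predict(self, feats: list[str]):
        def log_score(label):
            c = self.counts[label]
            denom = sum(c.values()) + len(self.vocab)
            prior = math.log(self.priors[label] / self.n)
            return prior + sum(math.log((c[f] + 1) / denom) for f in feats)
        return max(self.priors, key=log_score)


class TwoStepRumorClassifier:
    """Step 1: rumor vs. non-rumor. Step 2: rumor category, applied only
    to tweets that step 1 flags as rumors."""

    def __init__(self, level: str = "full", groups=("lexical", "twitter")):
        self.level, self.groups = level, tuple(groups)

    def _feats(self, text: str) -> list[str]:
        return extract_features(text, self.level, self.groups)

    def fit(self, texts, is_rumor, categories) -> "TwoStepRumorClassifier":
        X = [self._feats(t) for t in texts]
        self.detector = NaiveBayes().fit(X, is_rumor)
        # Step 2 is trained on rumor tweets only.
        pairs = [(x, c) for x, r, c in zip(X, is_rumor, categories) if r]
        self.classifier = NaiveBayes().fit([x for x, _ in pairs], [c for _, c in pairs])
        return self

    def predict(self, text: str) -> str:
        feats = self._feats(text)
        if not self.detector.predict(feats):
            return "non-rumor"
        return self.classifier.predict(feats)


# Toy data (made up for illustration only).
texts = [
    "is it true the bridge collapsed?",
    "the bridge collapsed? http://t.co/x",
    "the bridge did not collapse, that is false",
    "false: the bridge is fine @cityhall",
    "lovely weather in the park today",
    "coffee with friends this morning",
]
is_rumor = [True, True, True, True, False, False]
categories = ["question", "question", "deny", "deny", None, None]

clf = TwoStepRumorClassifier().fit(texts, is_rumor, categories)
print(clf.predict("is the bridge collapsed? http://t.co/y"))  # -> question
print(clf.predict("that is false, the bridge is fine"))       # -> deny
print(clf.predict("lovely weather today"))                    # -> non-rumor
```

Passing a different `level` or `groups` to `TwoStepRumorClassifier` is one way to mirror the preprocessing and feature-grouping experiments in spirit; an actual replication would need the paper's data set and its real feature definitions.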

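Experimental Results

Research Questions

RQ1: Can rumors, as a type of misinformation propagation, be detected effectively in Twitter data?
RQ2: Compared with a single-step approach, does a two-step approach improve rumor detection and the subsequent classification?
RQ3: How do the level of preprocessing and the grouping of features affect classification performance?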

Key Findings

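Dataset	Task / Stage	Metric	Value
Mixed rumors data set	RDC (rumor detection and classification)	F-measure	over 0.82
Single rumor data set	Classification (two-step approach)	F-measure (implied by the abstract's wording)	84 percent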
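Because F-measure and accuracy can diverge on the same predictions, it matters which one a reported figure refers to. The sketch below assumes "F-measure" means F1 (β = 1) for one class treated as positive, optionally macro-averaged over classes; this summary states neither β nor the averaging scheme the paper used, and the toy labels are invented for illustration.

```python
def f_measure(gold, pred, positive, beta=1.0):
    """F-beta for one class treated as positive (beta=1 gives F1)."""
    tp = sum(g == positive and p == positive for g, p in zip(gold, pred))
    fp = sum(g != positive and p == positive for g, p in zip(gold, pred))
    fn = sum(g == positive and p != positive for g, p in zip(gold, pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def macro_f1(gold, pred):
    """Unweighted mean of per-class F1 over the labels present in gold."""
    labels = sorted(set(gold))
    return sum(f_measure(gold, pred, lab) for lab in labels) / len(labels)


def accuracy(gold, pred):
    """Fraction of items whose predicted label matches the gold label."""
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


# Toy predictions (made up): 3 TP, 1 FN, 1 FP, 5 TN for the "rumor" class.
gold = ["rumor"] * 4 + ["non-rumor"] * 6
pred = ["rumor", "rumor", "rumor", "non-rumor", "rumor"] + ["non-rumor"] * 5

print(accuracy(gold, pred))            # 0.8
print(f_measure(gold, pred, "rumor"))  # 0.75
print(round(macro_f1(gold, pred), 4))  # 0.7917
```

On these toy labels, accuracy (0.8) exceeds the rumor-class F1 (0.75) because accuracy also credits the five correctly rejected non-rumors, while F1 for the rumor class ignores true negatives.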

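On the mixed-rumors data set, the two-step approach achieves an F-measure above 0.82 on the RDC task.
On the single-rumor data set, the two-step approach reaches 84 percent; the abstract states this figure in the same sentence as the F-measure rather than calling it accuracy.
The paper studies how its novel features and preprocessing levels contribute to performance.
Feature grouping affects the effectiveness of the classification model.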

This review was generated by AI and reviewed by human editors.