QUICK REVIEW

[Paper Review] TI-CNN: Convolutional Neural Networks for Fake News Detection

Yang Yang, Lei Zheng|arXiv (Cornell University)|Jun 3, 2018
Misinformation and Its Impacts|References: 28|Cited by: 234
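One-Sentence Summary

TI-CNN uses a two-branch CNN to combine explicit and latent features from text and images for fake news detection, and it outperforms baseline methods on a real-world dataset.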

ABSTRACT

With the development of social networks, fake news for various commercial and political purposes has been appearing in large numbers and has become widespread in the online world. With deceptive words, people can get infected by fake news very easily and will share it without any fact-checking. For instance, during the 2016 US presidential election, various kinds of fake news about the candidates spread widely through both official news media and online social networks. Such fake news is usually released either to smear the opponents or to support the candidate on the publisher's side. The erroneous information in fake news is usually written to stir up voters' irrational emotion and enthusiasm. Fake news of this kind can sometimes have devastating effects, and an important goal in improving the credibility of online social networks is to identify fake news in a timely manner. In this paper, we propose to study the fake news detection problem. Automatic fake news identification is extremely hard, since purely model-based fact-checking for news is still an open problem, and few existing models can be applied to solve it. Through a thorough investigation of a fake news dataset, many useful explicit features are identified from both the text and the images used in fake news. Besides the explicit features, there also exist some hidden patterns in the words and images used in fake news, which can be captured with a set of latent features extracted via the multiple convolutional layers in our model. A model named TI-CNN (Text and Image information based Convolutional Neural Network) is proposed in this paper. By projecting the explicit and latent features into a unified feature space, TI-CNN is trained with both the text and image information simultaneously. Extensive experiments carried out on real-world fake news datasets have demonstrated the effectiveness of TI-CNN.

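Research Motivation and Objectives

  • Motivate the study of fake news detection in social networks, given its bearing on information credibility.
  • Investigate whether text and image information together improve fake news detection.
  • Develop a unified model (TI-CNN) that fuses explicit and latent features from text and images.
  • Evaluate TI-CNN against baseline methods on a real-world dataset of fake and real news.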

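Proposed Method

  • Propose TI-CNN, which has two parallel branches: one for text information and one for image information.
  • Extract explicit features from the text (e.g., word counts, punctuation, capitalization) and from the images (resolution, presence of faces).
  • Use convolutional neural networks to learn latent features from the text (a CNN over word embeddings) and from the images (a CNN over image patches).
  • Project the explicit and latent features into a unified space and fuse the text and image representations for the final prediction.
  • Train end to end with a negative log-likelihood loss, RMSprop, and standard regularization (dropout, L2, early stopping).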
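
The text branch described above can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation: the function names, the particular feature set, the toy embeddings, and the single hand-written filter are all assumptions made for illustration. In TI-CNN the filters are learned and the features pass through further layers, and plain concatenation stands in here for the projection into a unified space.

```python
import string


def explicit_text_features(text: str) -> dict:
    """Hand-crafted (explicit) features of the kind listed above.

    The exact feature set is an assumption, not the paper's list.
    """
    words = text.split()
    letters = [c for c in text if c.isalpha()]
    uppercase = sum(c.isupper() for c in letters)
    return {
        "num_words": len(words),
        "num_exclamations": text.count("!"),
        "num_question_marks": text.count("?"),
        "num_punctuation": sum(c in string.punctuation for c in text),
        "uppercase_ratio": uppercase / len(letters) if letters else 0.0,
    }


def conv1d_max_pool(embeddings, kernel, bias=0.0):
    """Slide one filter over word positions, apply ReLU, then max-pool.

    embeddings: list of word vectors, one per token.
    kernel: list of weight vectors, one per position in the window.
    Returns a single latent feature (0.0 if the text is shorter than the window).
    """
    width = len(kernel)
    activations = []
    for start in range(len(embeddings) - width + 1):
        window = embeddings[start:start + width]
        total = bias + sum(
            weight * value
            for kernel_row, word_vec in zip(kernel, window)
            for weight, value in zip(kernel_row, word_vec)
        )
        activations.append(max(0.0, total))  # ReLU
    return max(activations) if activations else 0.0


def fuse(explicit_vec, latent_vec):
    """Concatenate explicit and latent features (a simplifying assumption)."""
    return list(explicit_vec) + list(latent_vec)


# Hypothetical headline and toy 2-dimensional word embeddings.
features = explicit_text_features("BREAKING!!! You WON'T believe this?")
toy_embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
toy_kernel = [[1.0, -1.0], [0.5, 0.5]]  # window of 2 words
latent = conv1d_max_pool(toy_embeddings, toy_kernel)
fused = fuse([features["num_words"], features["uppercase_ratio"]], [latent])
print(features)
print(fused)
```

The image branch would follow the same pattern, with explicit features such as resolution and a 2-D convolution in place of the 1-D one. A real model would use many filters per branch and a trained classifier on the fused vector.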

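Experimental Results

Research Questions

  • RQ1: Can explicit text and image features be combined effectively with latent features learned by CNNs for fake news detection?
  • RQ2: Does coupling text and image information outperform using either modality alone?
  • RQ3: What do CNN-learned latent features contribute relative to hand-crafted explicit features?
  • RQ4: How does TI-CNN perform on real-world data compared with traditional text models and image-only models?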

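Key Findings

  • The dataset contains 20,015 articles: 11,941 fake news articles and 8,074 real news articles.
  • TI-CNN outperforms the baseline methods when it uses text and image information together.
  • Image information alone is not sufficient to detect fake news reliably.
  • Text-based methods such as logistic regression are weak on this data, while deep text models (GRU/LSTM) are limited on long sequences.
  • The model fuses explicit and latent features from both modalities into a unified representation, which yields better performance than the baselines.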
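
The dataset counts above imply a class imbalance that is worth keeping in mind when reading any accuracy figure. The short sketch below checks the arithmetic and derives the accuracy of a trivial classifier that always predicts the majority class. This baseline is our own sanity check and is not a result reported in the review; the function name is hypothetical.

```python
# Counts as stated in the findings above.
NUM_FAKE = 11_941
NUM_REAL = 8_074


def class_balance(num_fake: int, num_real: int) -> dict:
    """Return the total, class shares, and majority-class baseline accuracy."""
    total = num_fake + num_real
    fake_share = num_fake / total
    real_share = num_real / total
    return {
        "total": total,
        "fake_share": fake_share,
        "real_share": real_share,
        # Accuracy of always predicting the larger class.
        "majority_baseline_accuracy": max(fake_share, real_share),
    }


stats = class_balance(NUM_FAKE, NUM_REAL)
print(stats["total"])                                  # 20015
print(round(stats["majority_baseline_accuracy"], 3))   # 0.597
```

Because roughly 60% of the articles are fake, a model has to clear that level of accuracy before it shows any real signal, which is one reason precision, recall, and F1 are usually more informative than accuracy on data like this.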

This review was generated by AI and reviewed by human editors.