QUICK REVIEW

[Paper Review] Graph Neural Networks with Continual Learning for Fake News Detection from Social Media

Yi Han, Shanika Karunasekera | arXiv (Cornell University) | Jul 7, 2020

Misinformation and Its Impacts | References: 60 | Citations: 77

One-Line Summary

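This paper uses a propagation-based graph neural network to detect fake news without any text content, and introduces continual learning so that the model keeps performing well on new, unseen data without full retraining.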

ABSTRACT

Although significant effort has been applied to fact-checking, the prevalence of fake news over social media, which has profound impact on justice, public trust and our society, remains a serious problem. In this work, we focus on propagation-based fake news detection, as recent studies have demonstrated that fake news and real news spread differently online. Specifically, considering the capability of graph neural networks (GNNs) in dealing with non-Euclidean data, we use GNNs to differentiate between the propagation patterns of fake and real news on social media. In particular, we concentrate on two questions: (1) Without relying on any text information, e.g., tweet content, replies and user descriptions, how accurately can GNNs identify fake news? Machine learning models are known to be vulnerable to adversarial attacks, and avoiding the dependence on text-based features can make the model less susceptible to the manipulation of advanced fake news fabricators. (2) How to deal with new, unseen data? In other words, how does a GNN trained on a given dataset perform on a new and potentially vastly different dataset? If it achieves unsatisfactory performance, how do we solve the problem without re-training the model on the entire data from scratch? We study the above questions on two datasets with thousands of labelled news items, and our results show that: (1) GNNs can achieve comparable or superior performance without any text information to state-of-the-art methods. (2) GNNs trained on a given dataset may perform poorly on new, unseen data, and direct incremental training cannot solve the problem; this issue has not been addressed in the previous work that applies GNNs for fake news detection. In order to solve the problem, we propose a method that achieves balanced performance on both existing and new datasets, by using techniques from continual learning to train GNNs incrementally.

Motivation and Objectives

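Motivate propagation-based fake news detection that relies on non-textual social-context features.
Evaluate how well GNNs distinguish fake from real news using propagation patterns alone, without tweet content.
Investigate how the model performs on unseen datasets and where naive incremental training falls short.
Propose a continual learning approach that achieves balanced performance on both existing and new datasets.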

Proposed Method

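Model each news item's propagation pattern as a graph whose nodes are tweets/users, plus an additional node representing the news item itself.
Classify the propagation graphs with DiffPool, a graph-classification GNN built on GraphSAGE layers.
Build the adjacency matrix from the propagation structure and the feature matrix from non-textual features, such as user-profile attributes and metrics derived from user timelines.
Experiment with different feature sets (user-profile features, timeline features, or both), with and without follower/followee relationships.
Evaluate on the PolitiFact and GossipCop datasets using accuracy, precision, recall, and F1 over multiple random splits.
Apply continual learning methods, GEM (Gradient Episodic Memory) and EWC (Elastic Weight Consolidation), to mitigate catastrophic forgetting when training on multiple datasets in sequence.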
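The continual-learning step above can be illustrated with a minimal sketch of the two named techniques. It assumes the GNN's parameters and gradients have already been flattened into plain Python lists; the helper names (`diagonal_fisher`, `ewc_penalty`, `ewc_penalty_grad`, `gem_project`) are hypothetical, the Fisher estimate is the common empirical squared-gradient approximation, and the GEM part covers only the case of a single previous dataset, where GEM's quadratic program reduces to a closed-form projection. This is not the authors' implementation.

```python
"""Sketch of EWC and single-constraint GEM on flattened parameter/gradient vectors.

Hypothetical helper names; illustrative only, not the paper's code.
"""
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def diagonal_fisher(per_sample_grads: List[Vector]) -> Vector:
    """Empirical diagonal Fisher: mean squared per-sample log-likelihood
    gradient, computed on the previous (old) dataset."""
    n = len(per_sample_grads)
    dim = len(per_sample_grads[0])
    return [sum(g[i] ** 2 for g in per_sample_grads) / n for i in range(dim)]


def ewc_penalty(theta: Vector, theta_old: Vector, fisher: Vector, lam: float) -> float:
    """EWC regulariser: (lam / 2) * sum_i F_i * (theta_i - theta_old_i) ** 2.

    Parameters that mattered for the old dataset (large F_i) are pulled back
    towards the values learned on it; unimportant ones remain free to change.
    """
    return 0.5 * lam * sum(
        f * (t - t0) ** 2 for t, t0, f in zip(theta, theta_old, fisher)
    )


def ewc_penalty_grad(theta: Vector, theta_old: Vector, fisher: Vector, lam: float) -> Vector:
    """Gradient of ewc_penalty with respect to theta."""
    return [lam * f * (t - t0) for t, t0, f in zip(theta, theta_old, fisher)]


def gem_project(grad: Vector, ref_grad: Vector) -> Vector:
    """GEM update direction when there is exactly one previous dataset.

    ref_grad is the loss gradient on an episodic memory of old-dataset graphs.
    If grad would increase that loss to first order (negative inner product),
    return the closest vector to grad whose inner product with ref_grad is
    zero; otherwise return grad unchanged.
    """
    d = dot(grad, ref_grad)
    norm_sq = dot(ref_grad, ref_grad)
    if d >= 0 or norm_sq == 0:
        return list(grad)
    scale = d / norm_sq
    return [g - scale * r for g, r in zip(grad, ref_grad)]


if __name__ == "__main__":
    theta_old = [1.0, 2.0]  # toy parameters after training on dataset A
    theta = [1.5, 2.0]      # toy parameters while training on dataset B
    fisher = diagonal_fisher([[1.0, 0.0], [3.0, 2.0]])
    print(fisher)                                        # [5.0, 2.0]
    print(ewc_penalty(theta, theta_old, fisher, lam=1.0))  # 0.625
    print(gem_project([1.0, -1.0], [0.0, 1.0]))          # [1.0, 0.0]
```

With EWC, the gradient used when training on the new dataset would be the new-data loss gradient plus `ewc_penalty_grad(...)`. With GEM, the new-data gradient would instead be passed through `gem_project` together with a gradient computed on a small stored sample of old-dataset graphs before the optimizer step. With more than one previous dataset, GEM must solve a small quadratic program rather than use this single projection.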

Experimental Results

Research Questions

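RQ1: Can fake news be identified from non-textual propagation patterns alone, without tweet content?
RQ2: How does a GNN trained on one dataset perform on a different, unseen dataset, and can continual learning improve cross-dataset generalization?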

Key Findings

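GNNs using non-textual propagation features achieve performance comparable to or better than text-based state-of-the-art methods on the evaluated datasets.
Models trained on a single dataset perform poorly on a different dataset, and naive incremental training fails to balance performance across datasets.
Incorporating continual learning methods (GEM, EWC) yields more balanced performance across datasets, and in the experiments GEM generally outperforms EWC.
In the settings considered, adding follower/followee relationships did not noticeably improve performance.
Training converges quickly, with many models reaching stable performance within relatively few epochs.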

This review was generated by AI and checked by a human editor.