QUICK REVIEW

[Paper Review] Some Like it Hoax: Automated Fake News Detection in Social Networks

Eugenio Tacchini, Gabriele Ballarin|arXiv (Cornell University)|Apr 25, 2017

Misinformation and Its Impacts | References: 14 | Citations: 118

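One-line summary

This paper shows that posts can be classified as hoaxes or non-hoaxes from the set of users who liked them. Using logistic regression and harmonic boolean label crowdsourcing (harmonic BLC), it reaches accuracy above 99% on a Facebook dataset with minimal labeled data.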

ABSTRACT

In recent years, the reliability of information on the Internet has emerged as a crucial issue of modern society. Social network sites (SNSs) have revolutionized the way in which information is spread by allowing users to freely share content. As a consequence, SNSs are also increasingly used as vectors for the diffusion of misinformation and hoaxes. The amount of disseminated information and the rapidity of its diffusion make it practically impossible to assess reliability in a timely manner, highlighting the need for automatic hoax detection systems. As a contribution towards this objective, we show that Facebook posts can be classified with high accuracy as hoaxes or non-hoaxes on the basis of the users who "liked" them. We present two classification techniques, one based on logistic regression, the other on a novel adaptation of boolean crowdsourcing algorithms. On a dataset consisting of 15,500 Facebook posts and 909,236 users, we obtain classification accuracies exceeding 99% even when the training set contains less than 1% of the posts. We further show that our techniques are robust: they work even when we restrict our attention to the users who like both hoax and non-hoax posts. These results suggest that mapping the diffusion pattern of information can be a useful component of automatic hoax detection systems.

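Motivation and objectives

Motivate automatic hoax detection as a response to misinformation spreading rapidly on social networks.
Investigate whether the audience that liked a post indicates whether the post is a hoax.
Develop two classifiers that operate on user-post interaction data.
Evaluate the scalability and transferability of the methods across pages and communities.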

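Proposed method

Represent each post as a binary vector over the users who liked it, and apply logistic regression to learn per-user weights for hoax/non-hoax prediction.
Adapt boolean label crowdsourcing (the harmonic algorithm) to a setting with a labeled training set: likes are modeled as positive votes, and alpha/beta parameters are updated to estimate each post's truthfulness.
Use a bipartite graph of posts and users, with iterative updates that propagate information from labeled posts to unlabeled posts.
In logistic regression, the weight w_u encodes user u's tendency to like non-hoax posts versus hoax posts.
In harmonic BLC, known-hoax and known-non-hoax posts are initialized from the training labels, and user and post beliefs are updated iteratively using alpha/beta counts.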
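
As an illustration of the logistic regression formulation, the following minimal sketch treats each post as the set of users who liked it (equivalent to a sparse binary vector) and learns one weight per user by stochastic gradient descent. The function names, the toy data, the learning rate, and the epoch count are assumptions made for this example, not details taken from the paper. Label 1 means hoax here, so a positive w_u marks a user who tends to like hoaxes.

```python
import math
from collections import defaultdict


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def train_like_logreg(posts, labels, epochs=200, lr=0.5):
    """posts: list of sets of user ids; labels: 1 = hoax, 0 = non-hoax.

    Hypothetical helper: plain SGD on the log-loss, no regularization.
    """
    weights = defaultdict(float)  # one weight w_u per user
    bias = 0.0
    for _ in range(epochs):
        for likers, y in zip(posts, labels):
            p = sigmoid(bias + sum(weights[u] for u in likers))
            grad = p - y  # gradient of the log-loss w.r.t. the logit
            bias -= lr * grad
            for u in likers:
                weights[u] -= lr * grad
    return weights, bias


def predict_hoax_probability(weights, bias, likers):
    # Users never seen in training contribute a weight of 0.
    return sigmoid(bias + sum(weights.get(u, 0.0) for u in likers))


# Toy data (invented): users a, b, c like hoaxes; d, e, f like non-hoaxes.
posts = [{"a", "b"}, {"a", "c"}, {"d", "e"}, {"e", "f"}]
labels = [1, 1, 0, 0]
weights, bias = train_like_logreg(posts, labels)
print(predict_hoax_probability(weights, bias, {"a"}))
print(predict_hoax_probability(weights, bias, {"e"}))
```

A dictionary of weights keeps the sketch sparse: with hundreds of thousands of users, only the users who liked a given post are touched in each update.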

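Experimental results

Research questions

RQ1: Can hoaxes be identified from the set of users who engage with them (the users who liked them)?
RQ2: How does classification accuracy scale as the manually labeled training set grows?
RQ3: How well does information propagation work across different Facebook pages (communities)?
RQ4: How robust are the methods when restricted to relatively mixed user communities (the intersection dataset)?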

Key findings

Experiment	One-page-out Avg accuracy	One-page-out Stdev	Half-pages-out Avg accuracy	Half-pages-out Stdev
Logistic regression	0.794	0.303	0.716	0.143
Harmonic BLC	0.991	0.023	0.993	0.002
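
The evaluation protocol behind the one-page-out columns can be sketched roughly as follows, assuming that "one-page-out" means holding out all posts of one page at a time and training on the remaining pages. The classifier `user_vote_baseline` is a hypothetical stand-in used only to make the example runnable; it is not one of the paper's two methods. The use of the population standard deviation is also an assumption.

```python
from collections import defaultdict
from statistics import mean, pstdev


def one_page_out(posts, fit_predict):
    """posts: list of (page, likers, label) with label 1 = hoax, 0 = non-hoax.

    fit_predict(train, test_likers) must return one predicted label per
    test post. Returns (mean accuracy, standard deviation) over pages.
    """
    pages = sorted({page for page, _, _ in posts})
    accuracies = []
    for held_out in pages:
        train = [(likers, y) for page, likers, y in posts if page != held_out]
        test = [(likers, y) for page, likers, y in posts if page == held_out]
        predictions = fit_predict(train, [likers for likers, _ in test])
        correct = sum(pred == y for pred, (_, y) in zip(predictions, test))
        accuracies.append(correct / len(test))
    return mean(accuracies), pstdev(accuracies)


def user_vote_baseline(train, test_likers):
    # Hypothetical baseline: each user votes +1 per hoax liked and -1 per
    # non-hoax liked; a post is predicted hoax if its likers sum above 0.
    score = defaultdict(int)
    for likers, y in train:
        for u in likers:
            score[u] += 1 if y == 1 else -1
    return [
        1 if sum(score.get(u, 0) for u in likers) > 0 else 0
        for likers in test_likers
    ]


# Toy data (invented): pages P1 and P2 post hoaxes, P3 and P4 do not.
toy_posts = [
    ("P1", {"a", "b"}, 1),
    ("P1", {"a"}, 1),
    ("P2", {"a", "c"}, 1),
    ("P3", {"d", "e"}, 0),
    ("P4", {"d", "f"}, 0),
]
avg_accuracy, stdev_accuracy = one_page_out(toy_posts, user_vote_baseline)
print(avg_accuracy, stdev_accuracy)
```

Holding out whole pages rather than random posts tests whether a classifier has learned something about users that carries over to a community it has never seen.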

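Both methods achieve high accuracy on the full dataset, exceeding 99% with surprisingly little training data.
When learning from other pages, harmonic BLC reaches near-perfect cross-page transfer accuracy (≈99%+).
On the intersection dataset, logistic regression outperforms harmonic BLC when the training set is small, reaching about 90% accuracy with 10% of the posts used for training.
Harmonic BLC can classify posts with >99% accuracy on the full dataset when only about 0.5% of the posts (≈80) are labeled.
The results indicate that the methods are robust to bias and to overlapping user communities spanning hoax and non-hoax pages.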
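
The label propagation that lets harmonic BLC work from a handful of labeled posts can be sketched as follows. This is a simplified reading of the harmonic update, not the authors' implementation: the sign convention (+1 for non-hoax, -1 for hoax), the symmetric prior, the iteration count, and the function name `harmonic_blc` are all assumptions made for this example.

```python
from collections import defaultdict


def harmonic_blc(likes, known, n_iter=5, prior=(1.0, 1.0)):
    """likes: iterable of (user, post) pairs.
    known: dict mapping labeled posts to +1 (non-hoax) or -1 (hoax).

    Returns a belief q in [-1, 1] for every post that has at least one like.
    """
    posts_of = defaultdict(set)  # user -> posts the user liked
    users_of = defaultdict(set)  # post -> users who liked it
    for user, post in likes:
        posts_of[user].add(post)
        users_of[post].add(user)

    a0, b0 = prior
    # Labeled posts start at their label; unlabeled posts start neutral.
    q_post = {p: float(known.get(p, 0.0)) for p in users_of}
    q_user = {}
    for _ in range(n_iter):
        # Users accumulate evidence from the posts they liked.
        for u, posts in posts_of.items():
            alpha = a0 + sum(q_post[p] for p in posts if q_post[p] > 0)
            beta = b0 - sum(q_post[p] for p in posts if q_post[p] < 0)
            q_user[u] = (alpha - beta) / (alpha + beta)
        # Unlabeled posts accumulate evidence from their likers.
        for p, users in users_of.items():
            if p in known:
                continue  # training labels stay fixed
            alpha = a0 + sum(q_user[u] for u in users if q_user[u] > 0)
            beta = b0 - sum(q_user[u] for u in users if q_user[u] < 0)
            q_post[p] = (alpha - beta) / (alpha + beta)
    return q_post


# Toy data (invented): one labeled post per side, one unlabeled post per side.
likes = [
    ("a", "h1"), ("a", "h2"), ("b", "h1"), ("b", "h2"),
    ("c", "n1"), ("c", "n2"), ("d", "n1"), ("d", "n2"),
]
beliefs = harmonic_blc(likes, known={"h1": -1, "n1": +1})
print(beliefs["h2"], beliefs["n2"])
```

In this sketch a post would be classified as a hoax when its belief is negative. Because beliefs flow through shared likers, a single labeled post can inform every post that its audience also liked.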


This review was generated by AI and checked by a human editor.