QUICK REVIEW

[Paper Review] Fake News Detection on Social Media using Geometric Deep Learning

Federico Monti, Fabrizio Frasca | arXiv (Cornell University) | February 10, 2019

Misinformation and Its Impacts | References: 45 | Citations: 188

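One-Line Summary

This paper proposes a propagation-based, graph-structured fake news detector for news spreading on Twitter, built with geometric deep learning, and reports high ROC AUC together with strong early-detection performance.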

ABSTRACT

Social media are nowadays one of the main news sources for millions of people around the globe due to their low cost, easy access and rapid dissemination. This however comes at the cost of dubious trustworthiness and significant risk of exposure to 'fake news', intentionally written to mislead the readers. Automatically detecting fake news poses challenges that defy existing content-based analysis approaches. One of the main reasons is that often the interpretation of the news requires the knowledge of political or social context or 'common sense', which current NLP algorithms are still missing. Recent studies have shown that fake and real news spread differently on social media, forming propagation patterns that could be harnessed for the automatic fake news detection. Propagation-based approaches have multiple advantages compared to their content-based counterparts, among which is language independence and better resilience to adversarial attacks. In this paper we show a novel automatic fake news detection model based on geometric deep learning. The underlying core algorithms are a generalization of classical CNNs to graphs, allowing the fusion of heterogeneous data such as content, user profile and activity, social graph, and news propagation. Our model was trained and tested on news stories, verified by professional fact-checking organizations, that were spread on Twitter. Our experiments indicate that social network structure and propagation are important features allowing highly accurate (92.7% ROC AUC) fake news detection. Second, we observe that fake news can be reliably detected at an early stage, after just a few hours of propagation. Third, we test the aging of our model on training and testing data separated in time. Our results point to the promise of propagation-based approaches for fake news detection as an alternative or complementary strategy to content-based approaches.

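Motivation and Objectives

Interpreting news often requires context and common sense, which motivates moving beyond content-based methods for fake news detection.
Propose a propagation-aware, graph-based model that fuses content, user, and network features.
Demonstrate on a large-scale Twitter dataset that propagation patterns provide a strong signal for fake news detection.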

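Proposed Method

Proposes a four-layer Graph CNN made of two graph convolutional layers, each using graph attention, followed by two fully connected layers.
Integrates user profile, user activity, social network structure, and news propagation into a single input graph Gu for each URL or cascade.
Builds the input graph by connecting tweets that share the same URL through follow relationships and propagation paths; edges carry multi-relational features that are updated by the attention mechanism in the convolutional layers.
Trains with hinge loss (no regularization) and SELU activations, using AMSGrad with a learning rate of 5e-4.
Evaluates on Twitter data from 2013 to 2018 in URL-wise and cascade-wise settings, using cascades with at least 6 tweets and a 24-hour propagation window.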
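The graph-construction and filtering steps above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' pipeline: the `Tweet` record, the `follows` set of (follower, followee) pairs, and the rule "connect two tweets when either author follows the other" are all hypothetical simplifications. Only the two thresholds (at least 6 tweets, 24-hour window) come from the review, and anchoring the window at the first tweet is also an assumption.

```python
from dataclasses import dataclass

WINDOW_SECONDS = 24 * 3600  # 24-hour propagation window (from the review)
MIN_CASCADE_SIZE = 6        # minimum number of tweets per cascade (from the review)


@dataclass(frozen=True)
class Tweet:
    """Hypothetical record: one tweet or retweet that shares the URL."""
    tweet_id: int
    user_id: int
    timestamp: float  # seconds since an arbitrary epoch


def build_cascade_graph(tweets, follows):
    """Return (nodes, edges) for one cascade, or None if it is too small.

    `follows` is a set of (follower_id, followee_id) pairs. As a simplifying
    assumption, an undirected edge joins two tweets when either author
    follows the other.
    """
    if not tweets:
        return None
    ordered = sorted(tweets, key=lambda t: t.timestamp)
    start = ordered[0].timestamp
    # Keep only tweets inside the window that opens at the first tweet.
    nodes = [t for t in ordered if t.timestamp - start <= WINDOW_SECONDS]
    if len(nodes) < MIN_CASCADE_SIZE:
        return None
    edges = set()
    for i, a in enumerate(nodes):
        for b in nodes[i + 1:]:
            linked = (a.user_id, b.user_id) in follows or (b.user_id, a.user_id) in follows
            if linked:
                edges.add((a.tweet_id, b.tweet_id))
    return nodes, edges
```

In a fuller version, each node would also carry the user profile, activity, and content features, and each edge the relation type, so that a graph convolution layer could consume them. Those feature definitions are not given in this review, so they are left out here.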

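Experimental Results

Research Questions

RQ1: Can fake news on Twitter be reliably detected from propagation and social network structure features alone (without content)?
RQ2: How does performance differ between URL-wise and cascade-wise classification, and how effectively can fake news be detected early?
RQ3: How robust is the model to the time gap between training data and test data?
RQ4: How does the minimum cascade size affect detection performance?
RQ5: Which feature groups (user profile, activity, network/propagation, content) contribute most to the predictions?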

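Key Results

URL-wise ROC AUC of 92.70% (±1.80) across five folds.
Cascade-wise ROC AUC of 88.30% (±2.74) across five folds.
Accuracy improves as propagation time increases, saturating at about 15 hours in the URL-wise setting and about 7 hours in the cascade-wise setting.
Ablation analysis shows that user profile and network/propagation features are the most important in both settings.
Model aging: URL-wise performance declines after about 180 days, while cascade-wise performance declines more slowly (≤4% after 260 days).
t-SNE visualization of the representations learned by the model shows a clear separation between credible and non-credible users.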
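For readers unfamiliar with how a figure such as "92.70% (±1.80) across five folds" is obtained, the sketch below computes ROC AUC per fold and then the mean and spread across folds. It is an illustration only: the function names are hypothetical, the pairwise definition of AUC is the standard one but is not necessarily how the authors computed it, and whether the reported spread is a sample or a population standard deviation is not stated in this review (the sample version is assumed here).

```python
from statistics import mean, stdev


def roc_auc(labels, scores):
    """ROC AUC as the chance that a random positive outranks a random negative.

    Ties count as one half. `labels` are 1 (fake) or 0 (true); a higher
    score means the classifier considers the item more likely to be fake.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("ROC AUC needs at least one example of each class")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def summarize_folds(fold_results):
    """Return (mean, sample standard deviation) of ROC AUC over the folds.

    `fold_results` is a list of (labels, scores) pairs, one per test fold.
    """
    aucs = [roc_auc(labels, scores) for labels, scores in fold_results]
    return mean(aucs), stdev(aucs)
```

The pairwise loop is quadratic in the number of examples, which is fine for illustration; a rank-based implementation would be preferable for large test sets.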


This review was generated by AI and reviewed by a human editor.