QUICK REVIEW

[Paper Review] Learning Hierarchical Discourse-level Structure for Fake News Detection

Hamid Reza Karimi, Jiliang Tang | arXiv (Cornell University) | 2019-02-27

Misinformation and Its Impacts | References: 38 | Citations: 35

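One-Line Summary

HDSF automatically learns a hierarchical discourse-level dependency structure for each document and uses it to build structurally rich representations for fake news classification, outperforming the baselines.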

ABSTRACT

On the one hand, nowadays, fake news articles are easily propagated through various online media platforms and have become a grand threat to the trustworthiness of information. On the other hand, our understanding of the language of fake news is still minimal. Incorporating hierarchical discourse-level structure of fake and real news articles is one crucial step toward a better understanding of how these articles are structured. Nevertheless, this has rarely been investigated in the fake news detection domain and faces tremendous challenges. First, existing methods for capturing discourse-level structure rely on annotated corpora which are not available for fake news datasets. Second, how to extract out useful information from such discovered structures is another challenge. To address these challenges, we propose Hierarchical Discourse-level Structure for Fake news detection (HDSF). HDSF learns and constructs a discourse-level structure for fake/real news articles in an automated and data-driven manner. Moreover, we identify insightful structure-related properties, which can explain the discovered structures and boost our understanding of fake news. Conducted experiments show the effectiveness of the proposed approach. Further structural analysis suggests that real and fake news present substantial differences in the hierarchical discourse-level structures.

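Motivation and Objectives

Investigate whether hierarchical discourse-level structure can distinguish fake news from real news.
Develop an end-to-end framework that learns discourse dependencies without annotated data.
Build structurally informed document representations that support effective fake news classification.
Determine how structure-related properties separate fake news from real news and how they relate to coherence.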

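Proposed Method

Represent each sentence with a BLSTM-based embedding derived from its word embeddings.
Learn inter-sentence dependency probabilities, expressed as an attention-based matrix A and root probabilities r, from which a discourse dependency tree is formed.
Using A and r, greedily construct a discourse tree for each document.
Compute structure-aware sentence representations p_j and c_j from the latent parents and children of each sentence, then derive g_j from them.
Aggregate the g_j into a structurally rich document representation x and perform binary fake/real classification with a cross-entropy loss.
Train the whole framework end-to-end with backpropagation; note that the tree construction is a post hoc step and is not differentiable.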
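
The greedy tree-construction step can be sketched as follows. This is a minimal illustration, not the authors' exact procedure: it assumes that A[i][j] is the learned probability that sentence i is the parent of sentence j, that r[j] is the probability that sentence j is the root, and that the tree grows by always adding the single most probable edge from an attached sentence to an unattached one. The function name `greedy_discourse_tree` is hypothetical.

```python
from typing import List, Optional


def greedy_discourse_tree(A: List[List[float]], r: List[float]) -> List[Optional[int]]:
    """Return parent[j] for every sentence j; the root has parent None.

    Assumed convention (not confirmed by the review): A[i][j] is the
    probability that sentence i is the parent of sentence j, and r[j] is
    the probability that sentence j is the root.
    """
    n = len(r)
    if n == 0:
        return []

    # Step 1: the sentence with the highest root probability becomes the root.
    root = max(range(n), key=lambda j: r[j])
    parent: List[Optional[int]] = [None] * n
    in_tree = {root}

    # Step 2: repeatedly add the most probable edge that links a sentence
    # already in the tree (as parent) to one not yet in it (as child).
    while len(in_tree) < n:
        best_edge = None
        best_prob = -1.0  # probabilities are >= 0, so an edge is always found
        for i in sorted(in_tree):
            for j in range(n):
                if j not in in_tree and A[i][j] > best_prob:
                    best_prob = A[i][j]
                    best_edge = (i, j)
        i, j = best_edge
        parent[j] = i
        in_tree.add(j)
    return parent


# Toy document with three sentences (values are made up for illustration).
A = [
    [0.0, 0.7, 0.2],
    [0.1, 0.0, 0.6],
    [0.3, 0.2, 0.0],
]
r = [0.8, 0.1, 0.1]
print(greedy_discourse_tree(A, r))  # [None, 0, 1]
```

Because this step only reads A and r after they have been computed, it can run outside the training loop, which is consistent with the tree construction being post hoc and non-differentiable.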

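Experimental Results

Research Questions

RQ1: Does the proposed HDSF framework improve fake news detection performance over the baselines?
RQ2: How do the structure-related properties of the discourse trees distinguish fake news from real news, and how do they relate to coherence?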

Key Results

Method	Accuracy (%)
N-gram	72.37
LIWC	70.26
RST	67.68
BiGRNN-CNN	77.06
LSTM[w+s]	80.54
LSTM[s]	73.63
HDSF	82.19

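HDSF outperforms every baseline on the combined dataset: 82.19% accuracy versus 80.54% for the strongest baseline, LSTM[w+s], a margin of 1.65 percentage points.
Structure-aware document representations provide stronger discriminative power than content-only features such as n-grams or LIWC.
The discourse dependency trees of fake and real news differ substantially on all of the proposed properties, with real news showing higher coherence.
The post hoc greedy tree construction uses the inter-sentence probabilities to select the root and assign parent-child relations.
The training and development curves show the training error decreasing and the accuracy increasing over the course of optimization.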
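
To make the structural comparison in RQ2 concrete, the sketch below computes a few simple statistics from a tree stored as a parent array. The review does not list the specific properties used in the paper, so the three statistics here (leaf count, height, mean parent-child gap) are illustrative stand-ins rather than the paper's definitions, and `tree_statistics` is a hypothetical helper.

```python
from typing import Dict, List, Optional


def tree_statistics(parent: List[Optional[int]]) -> Dict[str, float]:
    """Summarize a dependency tree given as a parent array (root has None)."""
    n = len(parent)

    # A sentence is a leaf if no other sentence names it as its parent.
    has_child = [False] * n
    for p in parent:
        if p is not None:
            has_child[p] = True

    def depth(j: int) -> int:
        # Number of edges on the path from sentence j up to the root.
        d = 0
        while parent[j] is not None:
            j = parent[j]
            d += 1
        return d

    # Distance, in sentence positions, between each child and its parent.
    gaps = [abs(j - p) for j, p in enumerate(parent) if p is not None]

    return {
        "leaves": sum(1 for c in has_child if not c),
        "height": max((depth(j) for j in range(n)), default=0),
        "mean_parent_gap": sum(gaps) / len(gaps) if gaps else 0.0,
    }


# A chain (each sentence attaches to the previous one) and a star
# (every sentence attaches to the first one).
chain = tree_statistics([None, 0, 1])
star = tree_statistics([None, 0, 0, 0])
print(chain)  # {'leaves': 1, 'height': 2, 'mean_parent_gap': 1.0}
print(star)   # {'leaves': 3, 'height': 1, 'mean_parent_gap': 2.0}
```

Statistics of this kind could then be averaged separately over fake and real articles to compare the two groups.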

This review was generated by AI and checked by a human editor.