QUICK REVIEW

[Paper Review] Two Stage Transformer Model for COVID-19 Fake News Detection and Fact Checking

Rutvik Vijjali, Prathyush Potluri | arXiv (Cornell University) | 2020-11-26

Misinformation and Its Impacts | References: 24 | Citations: 47

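One-Line Summary

A two-stage pipeline uses Transformer models to retrieve relevant COVID-19 fact explanations and then verifies claims via textual entailment, reaching a best test accuracy of 0.855 at near-real-time latency (about 2.5 seconds per claim).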

ABSTRACT

The rapid advancement of technology in online communication via social media platforms has led to a prolific rise in the spread of misinformation and fake news. Fake news is especially rampant in the current COVID-19 pandemic, leading to people believing in false and potentially harmful claims and stories. Detecting fake news quickly can alleviate the spread of panic, chaos and potential health hazards. We developed a two stage automated pipeline for COVID-19 fake news detection using state of the art machine learning models for natural language processing. The first model leverages a novel fact checking algorithm that retrieves the most relevant facts concerning user claims about particular COVID-19 claims. The second model verifies the level of truth in the claim by computing the textual entailment between the claim and the true facts retrieved from a manually curated COVID-19 dataset. The dataset is based on a publicly available knowledge source consisting of more than 5000 COVID-19 false claims and verified explanations, a subset of which was internally annotated and cross-validated to train and evaluate our models. We evaluate a series of models based on classical text-based features to more contextual Transformer based models and observe that a model pipeline based on BERT and ALBERT for the two stages respectively yields the best results.

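Motivation and Objectives

Motivate robust COVID-19 fake news detection as a way to curb the spread of misinformation during the pandemic.
Develop a dynamic knowledge base of verified explanations linked to false claims.
Design a two-stage pipeline in which Model A retrieves relevant explanations and Model B checks veracity via entailment.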

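Proposed Method

Model A: a Transformer is trained on binary claim-explanation entailment to search for candidate explanations; claim and explanation embeddings are cached and compared by cosine similarity to select the top candidates.
Model B: verification is treated as a textual entailment problem; the model is initialized with Model A's parameters, fine-tuned on the cross-validated data, and outputs a probability that the claim is true.
Baselines use TF, TF-IDF, and GloVe features paired with simple classifiers.
Transformers evaluated: MobileBERT, BERT, ALBERT, and combinations (such as BERT+ALBERT), with compute cost considered for near-real-time use.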
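
The retrieval and gating steps described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the embeddings are assumed to be precomputed and cached as plain float vectors, and the names `retrieve_top_k`, `check_claim`, `entail_prob`, and the default `threshold` of 0.5 are hypothetical. The `entail_prob` callable stands in for Model B.

```python
import math
from typing import Callable, Dict, List, Optional, Sequence, Tuple

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine similarity of two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve_top_k(
    claim_vec: Vector, cache: Dict[str, Vector], k: int = 10
) -> List[Tuple[str, float]]:
    """Stage 1 (assumed form): rank cached explanation embeddings by similarity."""
    scored = [(eid, cosine_similarity(claim_vec, vec)) for eid, vec in cache.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


def check_claim(
    claim_vec: Vector,
    cache: Dict[str, Vector],
    entail_prob: Callable[[str], float],
    threshold: float = 0.5,  # hypothetical value, not taken from the paper
    k: int = 10,
) -> Optional[Tuple[str, float]]:
    """Stage 2 (assumed form): score the best explanation only if it clears the threshold."""
    candidates = retrieve_top_k(claim_vec, cache, k)
    if not candidates or candidates[0][1] < threshold:
        return None  # no sufficiently similar explanation in the knowledge base
    best_id = candidates[0][0]
    return best_id, entail_prob(best_id)
```

Caching the explanation embeddings means only the incoming claim has to be encoded at query time, which is presumably what makes near-real-time use plausible.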

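Experimental Results

Research Questions

RQ1: Can a two-stage Transformer pipeline effectively retrieve relevant COVID-19 explanations and verify claim veracity via entailment?
RQ2: Which pretrained model (BERT, ALBERT, MobileBERT) performs best for retrieval (Model A) and for verification (Model B)?
RQ3: How does the two-stage approach compare with classical NLP baselines in accuracy and latency for real-time deployment?
RQ4: What dataset supports training and evaluating claim-explanation pairs in the COVID-19 domain?
RQ5: Which thresholds and evaluation metrics best reflect retrieval quality and verification accuracy?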

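Key Results

Transformer-based models outperform the classical NLP baselines on both retrieval and verification.
The best overall test-set performance comes from combining BERT (Model A) with ALBERT (Model B).
Model A retrieves relevant explanations with high MRR and Recall@10; Model B achieves high accuracy when it uses top-ranked explanations that exceed a cosine similarity threshold.
In Table 3, BERT+ALBERT records the best test-set accuracy of 0.855, with an MRR of 0.632 and a Recall@10 of 0.795.
Near-real-time performance is feasible: MobileBERT shows the lowest latency and ALBERT the lowest memory usage; the BERT+ALBERT combination uses 1398 MB of memory and takes about 2.471 seconds per claim.
The dataset consists of 5500 false claim-explanation pairs extracted from COVID-19 fact checks for training and 200 cross-validated pairs for testing.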
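
For readers unfamiliar with the retrieval metrics reported above, the sketch below shows how MRR and Recall@k are commonly computed. It assumes exactly one gold explanation per claim, which may differ from the paper's evaluation setup; the function names are illustrative.

```python
from typing import Sequence


def reciprocal_rank(ranked: Sequence[str], gold: str) -> float:
    """1 / rank of the gold item (1-indexed), or 0.0 if it was not retrieved."""
    for position, item in enumerate(ranked, start=1):
        if item == gold:
            return 1.0 / position
    return 0.0


def mean_reciprocal_rank(
    rankings: Sequence[Sequence[str]], golds: Sequence[str]
) -> float:
    """Average reciprocal rank over all claims."""
    if not rankings:
        return 0.0
    total = sum(reciprocal_rank(r, g) for r, g in zip(rankings, golds))
    return total / len(rankings)


def recall_at_k(
    rankings: Sequence[Sequence[str]], golds: Sequence[str], k: int = 10
) -> float:
    """Fraction of claims whose gold explanation appears in the top k results."""
    if not rankings:
        return 0.0
    hits = sum(1 for r, g in zip(rankings, golds) if g in r[:k])
    return hits / len(rankings)
```

Under this single-gold assumption, an MRR of 0.632 would mean the correct explanation tends to appear within the first two ranks on average, and a Recall@10 of 0.795 would mean it is among the top ten results for roughly four out of five claims.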

This review was generated by AI and reviewed by a human editor.