QUICK REVIEW

[Paper Review] Online Fake Review Detection Using Supervised Machine Learning And BERT Model

Abrar Qadir Mir, Furqan Yaqub Khan | arXiv (Cornell University) | 2023-01-09

Spam and Phishing Detection | Citations: 11

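One-Line Summary

The paper combines BERT-derived word embeddings with traditional classifiers and finds that SVM achieves the best accuracy (87.81%), 7.6% higher than the classifier used in the previous study.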

ABSTRACT

Online shopping stores have grown steadily over the past few years. Due to the massive growth of these businesses, the detection of fake reviews has attracted attention. Fake reviews are seriously trying to mislead customers and thereby undermine the honesty and authenticity of online shopping environments. So far, various fake review classifiers have been proposed that take into account the actual content of the review. To improve the accuracies of existing fake review classification or detection approaches, we propose to use BERT (Bidirectional Encoder Representation from Transformers) model to extract word embeddings from texts (i.e. reviews). Word embeddings are obtained in various basic methods such as SVM (Support vector machine), Random Forests, Naive Bayes, and others. The confusion matrix method was also taken into account to evaluate and graphically represent the results. The results indicate that the SVM classifiers outperform the others in terms of accuracy and f1-score with an accuracy of 87.81%, which is 7.6% higher than the classifier used in the previous study [5].

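Research Motivation and Goals

Motivate the need for reliable fake review detection as online shopping continues to grow.
Investigate how effective BERT-derived word embeddings are for fake review classification.
Compare supervised machine learning classifiers (SVM, Random Forest, Naive Bayes, and others) trained on BERT features.
Evaluate model performance with metrics derived from the confusion matrix, such as accuracy and F1-score.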

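Proposed Method

Extract word embeddings from the reviews with the BERT model.
Train supervised classifiers such as SVM, Random Forest, and Naive Bayes on the BERT features.
Evaluate the classifiers by accuracy and F1-score, and summarize the results in confusion matrices.
Compare the results with the previous study to measure the improvement.
Report that SVM outperforms the other classifiers with an accuracy of 87.81%.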
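
The steps above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the checkpoint name `bert-base-uncased`, the mean-pooling of token vectors, the linear SVM kernel, the 75/25 split, and the helper names `embed_reviews` and `compare_classifiers` are all assumptions, since this review does not state those details.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC


def embed_reviews(texts, model_name="bert-base-uncased", max_length=128):
    """Return one mean-pooled BERT vector per review.

    The checkpoint and the pooling strategy are assumptions made for this sketch.
    """
    # Imported lazily so the classifier comparison below runs without these packages.
    import torch
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModel.from_pretrained(model_name)
    model.eval()
    batch = tokenizer(list(texts), padding=True, truncation=True,
                      max_length=max_length, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**batch).last_hidden_state  # shape: (reviews, tokens, hidden)
    # Average only over real tokens, ignoring padding positions.
    mask = batch["attention_mask"].unsqueeze(-1).float()
    pooled = (hidden * mask).sum(dim=1) / mask.sum(dim=1)
    return pooled.numpy()


def compare_classifiers(features, labels, seed=0):
    """Train each classifier on the same split and return accuracy and F1 per model.

    Labels are assumed to be 0 (genuine) and 1 (fake); F1 is computed for label 1.
    """
    X_train, X_test, y_train, y_test = train_test_split(
        features, labels, test_size=0.25, random_state=seed, stratify=labels)
    classifiers = {
        "SVM": SVC(kernel="linear"),
        "Random Forest": RandomForestClassifier(n_estimators=100, random_state=seed),
        # GaussianNB is used because the embedding values are continuous and can be negative.
        "Naive Bayes": GaussianNB(),
    }
    scores = {}
    for name, clf in classifiers.items():
        clf.fit(X_train, y_train)
        predicted = clf.predict(X_test)
        scores[name] = {
            "accuracy": accuracy_score(y_test, predicted),
            "f1": f1_score(y_test, predicted),
        }
    return scores
```

With a labeled review corpus, a call such as `compare_classifiers(embed_reviews(texts), labels)` would give one accuracy and F1 value per classifier. `embed_reviews` encodes all reviews in a single batch, which is only practical for small inputs; a larger dataset would need batching.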

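Experimental Results

Research Questions

RQ1: Can BERT-based word embeddings improve fake review detection performance with standard supervised classifiers?
RQ2: Which classifier (SVM, Random Forest, Naive Bayes, and others) detects fake reviews best when given BERT features?
RQ3: How does the proposed approach compare with previous fake review detection methods in terms of accuracy and F1-score?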

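Key Results

SVM with BERT embeddings achieves the highest accuracy, 87.81%.
The proposed approach outperforms the classifier used in the previous study by 7.6%.
Word embeddings obtained from BERT are effective features for fake review classification when combined with supervised learners.
The confusion-matrix-based evaluation supports the reported gains in accuracy and F1-score.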
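
To make the evaluation concrete, the sketch below shows how accuracy and F1-score follow from the four confusion-matrix counts. The function names and the toy labels are hypothetical and are not data from the paper; label 1 is assumed to mean "fake".

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true positives, false positives, false negatives, and true negatives."""
    tp = fp = fn = tn = 0
    for truth, guess in zip(y_true, y_pred):
        if guess == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, fn, tn


def accuracy_and_f1(tp, fp, fn, tn):
    """Derive accuracy and F1 for the positive class from the confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return accuracy, f1


# Toy example with made-up labels (1 = fake, 0 = genuine).
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0, 1, 0]
counts = confusion_counts(y_true, y_pred)
accuracy, f1 = accuracy_and_f1(*counts)
```

Accuracy counts every correct decision equally, while F1 balances precision and recall for the fake class, so the two can diverge when the classes are imbalanced.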


This review was generated by AI and checked by a human editor.