QUICK REVIEW

[Paper Review] Google Scholar is manipulatable

Hazem Ibrahim, Fengyuan Liu|arXiv (Cornell University)|2024. 02. 07.

Artificial Intelligence in Healthcare and Education|Citations: 11

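One-line summary

The study shows that Google Scholar can be manipulated through purchased citations and fake profiles, demonstrating that citation counts can be bought and can therefore mislead when used for evaluation.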

ABSTRACT

Citations are widely considered in scientists' evaluation. As such, scientists may be incentivized to inflate their citation counts. While previous literature has examined self-citations and citation cartels, it remains unclear whether scientists can purchase citations. Here, we compile a dataset of ~1.6 million profiles on Google Scholar to examine instances of citation fraud on the platform. We survey faculty at highly-ranked universities, and confirm that Google Scholar is widely used when evaluating scientists. Intrigued by a citation-boosting service that we unravelled during our investigation, we contacted the service while undercover as a fictional author, and managed to purchase 50 citations. These findings provide conclusive evidence that citations can be bought in bulk, and highlight the need to look beyond citation counts.

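Research motivation and objectives

Assess how widely researchers rely on citation metrics in hiring and promotion decisions.
Quantify the importance of Google Scholar as a source of citation data among academics at top-ranked universities.
Identify patterns in suspicious Google Scholar profiles and the manipulation techniques that may lie behind them.
Demonstrate that purchasing citations is feasible and show how it affects citation metrics.
Propose an indicator (c2-index) to flag potentially suspicious citation activity.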

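Proposed method

Survey faculty at the top-10 ranked universities to determine which sources of citation data they use.
Compile a dataset of roughly 1.6 million Google Scholar profiles to analyze anomalous citation patterns.
Conduct an undercover experiment that purchases 50 citations for a fictional author to demonstrate feasibility.
Create a fictional Google Scholar profile and upload AI-generated articles to test moderation and indexing.
Purchase citations through a third-party service and analyze the citing papers for evidence of manipulation through bulk citations.
Introduce the c2-index (citation concentration index) to flag highly concentrated bulk citations.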
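The review does not spell out how the c2-index is computed. As a minimal sketch, assume an h-index-style definition: the largest n such that n citing papers each cite the author at least n squared times. That definition, the function name `c2_index`, and the sample counts mentioned afterwards are assumptions made for illustration, not details taken from the review.

```python
from typing import Iterable


def c2_index(citations_per_citing_paper: Iterable[int]) -> int:
    """Return the largest n such that n citing papers each cite the author >= n**2 times.

    ASSUMPTION: this h-index-style definition is illustrative only; the review
    itself does not define the c2-index.
    """
    # Sort so the citing papers that reference the author most often come first.
    counts = sorted(citations_per_citing_paper, reverse=True)
    n = 0
    for rank, count in enumerate(counts, start=1):
        if count >= rank ** 2:
            n = rank
        else:
            # Counts only shrink while the threshold grows, so no later rank can pass.
            break
    return n
```

Under this assumed definition, a made-up profile whose citing papers reference the author 30, 18, 9, 2, 1 and 1 times scores 3, while a profile cited 2, 1, 1, 1 and 1 times scores 1. Only a profile that receives many citations from each of a few papers reaches a high value, which is the concentration pattern the index is meant to flag.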

Figure 1: Survey responses from faculty of the top-10 ranked universities around the world. A, The percentage of faculty who consider citations when evaluating candidates (blue) and those who do not (red). B, Solid bars indicate, out of those who self-report considering citations when evaluating c

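Experimental results

Research questions

RQ1: What proportion of faculty who evaluate candidates use Google Scholar as their primary source of citation metrics?
RQ2: Can Google Scholar profiles be manipulated through citation purchases or other means?
RQ3: How many citations can be purchased, and how do such purchases show up in the citation patterns between papers?
RQ4: Can a simple metric (c2-index) help identify suspicious citation behavior on Google Scholar?
RQ5: What weaknesses do Google Scholar's moderation and indexing show when faced with artificially generated content?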

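Key results

Google Scholar is the most popular source of citation metrics among evaluators: more than 60% of respondents who consider citations use it.
The proof of concept shows that 50 citations can be purchased for a fictional author within weeks, demonstrating that bulk citation manipulation is feasible.
Suspicious profiles show a sudden surge in citations in the peak year, citations concentrated in a small number of papers, and heavy reliance on non-traditional sources (preprints).
Suspicious authors' citation counts drop sharply in Scopus relative to Google Scholar (96% on average, versus 43% for their matched counterparts), pointing to inconsistency between databases.
The c2-index identifies profiles with abnormally high citation concentration across many papers, and the adjusted c2-index highlights potential manipulation risk.
Citations can remain on Google Scholar even after the original paper is removed, suggesting that content can be indexed without robust moderation.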
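Two of the comparisons above reduce to simple arithmetic: annual citations expressed relative to the peak year (the quantity described in the Figure 2A caption) and the percentage drop between databases. The sketch below is an illustration only; the function names, the dictionary-based input format and the figures used in the note afterwards are assumptions, not the authors' code or data.

```python
from __future__ import annotations


def citations_relative_to_peak(yearly: dict[int, int], lead_years: int = 4) -> dict[int, float]:
    """Annual citations as a fraction of the peak, for the years leading up to the peak year."""
    if not yearly:
        raise ValueError("yearly citation counts are required")
    peak_year = max(yearly, key=yearly.get)
    peak = yearly[peak_year]
    if peak == 0:
        raise ValueError("peak citation count is zero")
    # Years with no recorded citations count as 0.
    return {
        year: yearly.get(year, 0) / peak
        for year in range(peak_year - lead_years, peak_year)
    }


def percent_drop(google_scholar_count: int, scopus_count: int) -> float:
    """Percentage by which the Scopus count falls below the Google Scholar count."""
    if google_scholar_count <= 0:
        raise ValueError("Google Scholar count must be positive")
    return 100.0 * (google_scholar_count - scopus_count) / google_scholar_count
```

With made-up yearly counts of 5, 6, 8 and 10 citations followed by a peak of 200, every pre-peak year sits at 5% of the peak or less, which is the kind of sudden surge described above. Likewise, `percent_drop(500, 20)` gives 96.0 for a hypothetical author with 500 citations on Google Scholar and 20 on Scopus.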

Figure 2: A comparative analysis of suspicious authors and their matches. In each plot, red lines and red dots denote suspicious authors, while blue ones denote their matches. A, For the 4 years leading up to an author’s peak citations, the annual number of citations relative to the peak. B, Discr

This review was generated by AI and reviewed by a human editor.