QUICK REVIEW

[Paper Review] Google Scholar is manipulatable

Hazem Ibrahim, Fengyuan Liu|arXiv (Cornell University)|Feb 7, 2024
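Artificial Intelligence in Healthcare and Education | Cited by 11
One-sentence summary

The study shows that Google Scholar can be manipulated through purchased citations and fabricated profiles, providing evidence that citation counts can be bought and can mislead in evaluation contexts.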

ABSTRACT

Citations are widely considered in scientists' evaluation. As such, scientists may be incentivized to inflate their citation counts. While previous literature has examined self-citations and citation cartels, it remains unclear whether scientists can purchase citations. Here, we compile a dataset of ~1.6 million profiles on Google Scholar to examine instances of citation fraud on the platform. We survey faculty at highly-ranked universities, and confirm that Google Scholar is widely used when evaluating scientists. Intrigued by a citation-boosting service that we unravelled during our investigation, we contacted the service while undercover as a fictional author, and managed to purchase 50 citations. These findings provide conclusive evidence that citations can be bought in bulk, and highlight the need to look beyond citation counts.

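Research motivation and objectives

  • Assess how widely evaluators rely on citation metrics in hiring and promotion decisions.
  • Quantify how prominent Google Scholar is as a source of citation data among scholars at top universities.
  • Identify patterns in suspicious Google Scholar profiles and the manipulation techniques that may lie behind them.
  • Demonstrate that citations can be purchased, and show how such purchases affect citation metrics.
  • Propose a metric (the c2-index) for flagging potentially suspicious citation activity.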

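Proposed method

  • Survey faculty at the top-10 ranked universities to determine which sources they use for citation data.
  • Compile a dataset of roughly 1.6 million Google Scholar profiles to analyze anomalous citation patterns.
  • Run an undercover experiment that purchases 50 citations for a fictional author to demonstrate feasibility.
  • Create a fictional Google Scholar profile and upload AI-generated articles to test moderation and indexing.
  • Purchase citations through a third-party service and analyze the citing papers for evidence of bulk manipulation.
  • Introduce the c2-index (citation concentration index) to flag highly concentrated bulk citations.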
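This summary does not spell out how the c2-index is computed. As a rough illustration of what a citation concentration measure could look like, the sketch below uses a hypothetical h-index-style rule: the largest c such that at least c citing papers each cite the author at least c² times. That rule, the function name `concentration_index`, and the input format are assumptions made for this example, not the paper's confirmed definition.

```python
from collections import Counter


def concentration_index(citing_paper_ids):
    """Hypothetical concentration measure (an assumption, not the paper's definition).

    citing_paper_ids holds one entry per citation the author received,
    naming the paper that made that citation.
    Returns the largest c such that at least c citing papers each
    cite the author at least c**2 times.
    """
    # Citations contributed by each citing paper, largest first.
    per_paper = sorted(Counter(citing_paper_ids).values(), reverse=True)
    c = 0
    # per_paper[c] is the (c + 1)-th largest count; keep growing c
    # while that paper still meets the (c + 1)**2 threshold.
    while c < len(per_paper) and per_paper[c] >= (c + 1) ** 2:
        c += 1
    return c


# 50 citations spread over 50 different citing papers.
organic = [f"paper-{i}" for i in range(50)]
# 50 citations packed into only 5 citing papers, 10 each.
bulk = [f"paper-{i % 5}" for i in range(50)]

print(concentration_index(organic), concentration_index(bulk))
```

Under this assumed rule, both toy profiles have 50 citations, but the one whose citations come from only five papers scores higher, which is the kind of signal a concentration metric is meant to surface.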
Figure 1: Survey responses from faculty of the top-10 ranked universities around the world. A , The percentage of faculty who consider citations when evaluating candidates (blue) and those who do not (red). B , Solid bars indicate, out of those who self-report considering citations when evaluating c

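Experimental results

Research questions

  • RQ1: How prevalent is the use of Google Scholar as the primary source of citation metrics among faculty who evaluate candidates?
  • RQ2: Can Google Scholar profiles be manipulated through purchased citations or other means?
  • RQ3: To what extent can citations be purchased, and how do such purchases show up in the citation patterns of different papers?
  • RQ4: Does a simple metric (the c2-index) help identify suspicious citation behavior on Google Scholar?
  • RQ5: What moderation and indexing vulnerabilities does Google Scholar have when faced with artificially generated content?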

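Key findings

  • Google Scholar is the most popular source of citation metrics among evaluators, used by more than 60% of respondents who consider citations.
  • The proof of concept shows that 50 citations can be purchased for a fictional author within weeks, demonstrating that bulk citation manipulation is feasible.
  • Suspicious profiles show sudden citation surges in their peak year, citations heavily concentrated in a small number of papers, and heavy reliance on non-traditional sources such as preprints.
  • Suspicious authors' citation counts are markedly lower on Scopus than on Google Scholar (96% versus 43% on average), pointing to discrepancies between the databases.
  • The c2-index identifies profiles that receive unusually concentrated citations from many papers; the adjusted c2-index highlights potential manipulation risk.
  • Citations can persist on Google Scholar after the source paper has been removed, indicating that indexing lacks robust moderation.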
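Two of the signals above are simple ratios, sketched here in Python. The helper names, the sample numbers, and the exact formulas are assumptions made for illustration. The first treats the Scopus comparison as a relative drop from the Google Scholar count, and the second normalizes yearly citations by the peak year over the four preceding years, as the Figure 2A caption describes.

```python
def relative_drop(google_scholar_count, scopus_count):
    """Fraction of Google Scholar citations missing from Scopus (assumed formula)."""
    if google_scholar_count <= 0:
        raise ValueError("google_scholar_count must be positive")
    return (google_scholar_count - scopus_count) / google_scholar_count


def citations_relative_to_peak(yearly_citations, years_before=4):
    """Yearly citations divided by the peak-year count, for the peak year
    and the `years_before` years leading up to it.

    yearly_citations maps year -> citations received in that year and
    must contain at least one year with a positive count.
    """
    peak_year = max(yearly_citations, key=yearly_citations.get)
    peak = yearly_citations[peak_year]
    if peak <= 0:
        raise ValueError("peak-year citation count must be positive")
    return {
        year: yearly_citations.get(year, 0) / peak
        for year in range(peak_year - years_before, peak_year + 1)
    }


# Made-up numbers: a profile whose citations jump sharply in its peak year.
spiky = {2018: 2, 2019: 3, 2020: 5, 2021: 10, 2022: 100}
profile = citations_relative_to_peak(spiky)
drop = relative_drop(google_scholar_count=100, scopus_count=4)
print(profile, drop)
```

A profile whose pre-peak years sit near zero on this relative scale, or whose count collapses in a second database, is only a prompt for closer inspection; neither ratio is evidence of manipulation on its own.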
Figure 2: A comparative analysis of suspicious authors and their matches. In each plot, red lines and red dots denote suspicious authors, while blue ones denote their matches. A , For the 4 years leading up to an author’s peak citations, the annual number of citations relative to the peak. B , Discr


This review was generated by AI and checked by human editors.