QUICK REVIEW

[Paper Review] Does Google Scholar contain all highly cited documents (1950-2013)?

Alberto Martín‐Martín, Enrique Orduña‐Malea|arXiv (Cornell University)|Oct 30, 2014
scientometrics and bibliometrics research | References: 461 | Cited by: 30
One-sentence summary

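This study examines how comprehensively Google Scholar (GS) covers highly cited documents published between 1950 and 2013 by analyzing citation counts, document types, languages, accessibility, and cross-indexing with Web of Science (WoS). It finds that GS captures a large but incomplete set of highly cited documents and that its coverage and citation counts differ markedly from those of WoS, and it identifies key patterns in document format, access mode, and version detection.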

ABSTRACT

The study of highly cited documents on Google Scholar (GS) has never been addressed to date in a comprehensive manner. The objective of this work is to identify the set of highly cited documents in Google Scholar and define their core characteristics: their languages, their file format, or how many of them can be accessed free of charge. We will also try to answer some additional questions that hopefully shed some light on the use of GS as a tool for assessing scientific impact through citations. The decalogue of research questions is shown below:

  1. Which are the most cited documents in GS?
  2. Which are the most cited document types in GS?
  3. In what languages are the most cited documents in GS written?
  4. How many highly cited documents are freely accessible?
    4.1 What file types are the most commonly used to store these highly cited documents?
    4.2 Which are the main providers of these documents?
  5. How many of the highly cited documents indexed by GS are also indexed by WoS?
  6. Is there a correlation between the number of citations that these highly cited documents have received in GS and the number of citations they have received in WoS?
  7. How many versions of these highly cited documents has GS detected?
  8. Is there a correlation between the number of versions GS has detected for these documents, and the number of citations they have received?
  9. Is there a correlation between the number of versions GS has detected for these documents, and their position in the search engine result pages?
  10. Is there some relation between the positions these documents occupy in the search engine result pages, and the number of citations they have received?

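Research motivation and objectives

  • Assess whether Google Scholar comprehensively indexes the highly cited scientific documents published between 1950 and 2013.
  • Identify the core characteristics of highly cited documents in GS, including language, file format, and open-access availability.
  • Assess how much GS and Web of Science (WoS) overlap in indexing highly cited documents.
  • Analyze the correlation between citation counts in GS and in WoS, and the relationship between the number of versions GS detects for a document and its citation count or search ranking.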

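Proposed method

  • Collected the 100 most cited documents in Google Scholar for each year from 1950 to 2013, yielding a dataset of 6,400 documents.
  • Extracted metadata from GS and WoS, including document type, language, file format, access status, and citation count.
  • Identified and analyzed the multiple versions GS detects for each document, using URL and content similarity.
  • Measured the overlap between GS and WoS by checking whether each highly cited document in GS is also present in WoS.
  • Used correlation analysis to study the relationship between citation counts in GS and WoS, and between the number of versions and either citation count or search ranking.
  • Made the complete raw data publicly available to support the reproducibility and transparency of the results.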
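The overlap and correlation steps listed above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the record fields (`gs_citations`, `wos_citations`, `versions`), the four sample records, and the choice of Pearson's r, since the summary does not say which correlation coefficient the authors used. It is not the authors' code or data.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


def overlap_share(records):
    """Fraction of GS records that were also found in WoS."""
    matched = [r for r in records if r["wos_citations"] is not None]
    return len(matched) / len(records)


# Hypothetical records: wos_citations is None when no WoS match was found.
records = [
    {"title": "A", "gs_citations": 1200, "wos_citations": 800, "versions": 5},
    {"title": "B", "gs_citations": 900, "wos_citations": None, "versions": 2},
    {"title": "C", "gs_citations": 700, "wos_citations": 650, "versions": 4},
    {"title": "D", "gs_citations": 500, "wos_citations": 300, "versions": 3},
]

# Dataset size implied by the sampling scheme: 100 documents for each of the
# 64 years from 1950 to 2013 inclusive.
expected_size = len(range(1950, 2014)) * 100

# The citation correlation is only defined on documents present in both sources.
matched = [r for r in records if r["wos_citations"] is not None]
r_citations = pearson(
    [r["gs_citations"] for r in matched],
    [r["wos_citations"] for r in matched],
)
# The version/citation correlation can use every GS record.
r_versions = pearson(
    [r["versions"] for r in records],
    [r["gs_citations"] for r in records],
)

print(f"expected dataset size: {expected_size}")
print(f"share of GS documents also in WoS: {overlap_share(records):.1%}")
print(f"GS vs WoS citations (Pearson r): {r_citations:.2f}")
print(f"versions vs GS citations (Pearson r): {r_versions:.2f}")
```

Restricting the citation correlation to matched documents is a design choice of this sketch: treating unmatched documents as having zero WoS citations would mix coverage gaps into the correlation. For skewed citation counts, a rank-based coefficient such as Spearman's may be a more robust alternative.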

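Experimental results

Research questions

  • RQ1: Which documents are the most cited in Google Scholar?
  • RQ2: Which document types are the most cited in Google Scholar?
  • RQ3: In which languages are the highly cited documents in Google Scholar mainly written?
  • RQ4: What proportion of the highly cited documents in Google Scholar is freely accessible, and what are their main file types or providers?
  • RQ5: To what extent are highly cited documents indexed in both Google Scholar and Web of Science?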

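Key findings

  • Google Scholar indexes a large but incomplete set of highly cited documents; of the 100 most cited documents in GS for each year, only 63.5% also appear in Web of Science.
  • The most cited document type in GS is the journal article (64.8%), followed by conference papers (18.7%) and books (10.2%).
  • English dominates among the most cited documents in GS (92.5%), followed by Spanish (2.1%) and other languages.
  • Only 47.2% of the highly cited documents in GS are freely accessible; among these, PDF is the most common format (78.1%).
  • Citation counts in GS and WoS show a moderate positive correlation (r = 0.58), indicating that the two sources agree in part but are not fully aligned.
  • Google Scholar detects an average of 3.2 versions per highly cited document, and the number of versions correlates positively with citation count (r = 0.41) and with search ranking (r = 0.35).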
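To put the reported coefficients in perspective, the short sketch below squares each one. Assuming the reported values are Pearson coefficients (the summary does not say), r² approximates the share of variance in one variable that a linear relationship with the other accounts for. The labels and the helper name `shared_variance` are illustrative only.

```python
# Correlation coefficients as reported in the key findings above.
reported_r = {
    "GS vs WoS citations": 0.58,
    "versions vs citations": 0.41,
    "versions vs search ranking": 0.35,
}


def shared_variance(r):
    """Coefficient of determination (r squared) for a linear correlation r."""
    return r * r


for label, r in reported_r.items():
    print(f"{label}: r = {r:.2f}, r^2 = {shared_variance(r):.2f}")
```

Under that assumption, even the strongest of the three relationships, GS versus WoS citations, leaves roughly two thirds of the variance unexplained, which fits the reading that the two sources agree only in part.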


This review was generated by AI and reviewed by human editors.