QUICK REVIEW

[Paper Review] An analysis of the effects of sharing research data, code, and preprints on citations

Giovanni Colavizza, Lauren Cadwallader | arXiv (Cornell University) | Apr 24, 2024
Academic Publishing and Open Access | Cited by 5
One-Sentence Summary

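This study uses a large open-access dataset to analyze how Open Science indicators (data sharing, code sharing, and preprint posting) relate to citation counts. It finds that preprints and online data sharing are associated with more citations, while code sharing shows no significant effect.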

ABSTRACT

Calls to make scientific research more open have gained traction with a range of societal stakeholders. Open Science practices include but are not limited to the early sharing of results via preprints and openly sharing outputs such as data and code to make research more reproducible and extensible. Existing evidence shows that adopting Open Science practices has effects in several domains. In this study, we investigate whether adopting one or more Open Science practices leads to significantly higher citations for an associated publication, which is one form of academic impact. We use a novel dataset known as Open Science Indicators, produced by PLOS and DataSeer, which includes all PLOS publications from 2018 to 2023 as well as a comparison group sampled from the PMC Open Access Subset. In total, we analyze circa 122'000 publications. We calculate publication and author-level citation indicators and use a broad set of control variables to isolate the effect of Open Science Indicators on received citations. We show that Open Science practices are adopted to different degrees across scientific disciplines. We find that the early release of a publication as a preprint correlates with a significant positive citation advantage of about 20.2% on average. We also find that sharing data in an online repository correlates with a smaller yet still positive citation advantage of 4.3% on average. However, we do not find a significant citation advantage for sharing code. Further research is needed on additional or alternative measures of impact beyond citations. Our results are likely to be of interest to researchers, as well as publishers, research funders, and policymakers.

Motivation and Objectives

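  • Assess whether adopting Open Science practices (data sharing, code sharing, preprints) is associated with higher citation counts for publications.
  • Quantify the citation effect of each Open Science practice while controlling for publication-, author-, and discipline-level factors.
  • Explore how the effects differ across disciplines and modes of data sharing.
  • Provide reproducible methods and data so the findings can be replicated and extended.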

Proposed Method

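  • Use the Open Science Indicators (OSI) dataset produced by PLOS and DataSeer: about 122,000 publications in total, covering all PLOS publications from 2018–2023 plus a comparison group sampled from the PMC Open Access Subset.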
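  • Compute publication- and author-level citation indicators, using the PMC OA Subset as the citation source.
  • Regress log-transformed publication citation counts on the OSI indicators and a broad set of control variables (year, month, number of authors, number of references, mean author h-index, journal, and ANZSRC division dummies).
  • Estimate a base model and a full model (OLS and robust regression), with log(n_cit_tot + 1) as the dependent variable.
  • Use as key independent variables: preprint match; data sharing, data location, and repository; code sharing and code location; and division indicators.
  • Report effects as percentages by back-transforming the log-scale coefficients (interpreted as semi-elasticities).
  • Assess robustness across model specifications and citation windows (1–3 years).
  • Release data and code for reproducibility.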
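The regression and back-transformation steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the data are synthetic and noise-free, a single hypothetical log(number of authors) term stands in for the paper's full control set, and OLS is solved via the normal equations in pure Python. It assumes the common back-transformation 100 · (exp(β) − 1) for a binary indicator with coefficient β on the log(n_cit_tot + 1) scale; the "true" coefficients are chosen only so the recovered effects mirror the headline numbers.

```python
"""Toy sketch of a log-linear citation regression with binary Open Science indicators.

Assumptions (not taken from the paper): synthetic noise-free data, one control
term log(n_authors), and OLS solved with the normal equations in pure Python.
"""
import itertools
import math


def ols(X, y):
    """Return OLS coefficients solving (X^T X) b = X^T y by Gauss-Jordan elimination."""
    k = len(X[0])
    # Build the augmented normal-equation matrix [X^T X | X^T y].
    A = [
        [sum(row[i] * row[j] for row in X) for j in range(k)]
        + [sum(row[i] * yi for row, yi in zip(X, y))]
        for i in range(k)
    ]
    for col in range(k):
        # Partial pivoting: swap in the row with the largest entry in this column.
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        p = A[col][col]
        A[col] = [v / p for v in A[col]]
        # Eliminate this column from every other row.
        for r in range(k):
            if r != col:
                f = A[r][col]
                A[r] = [a - f * c for a, c in zip(A[r], A[col])]
    return [A[i][k] for i in range(k)]


def pct_effect(beta):
    """Back-transform a dummy's log-scale coefficient into a percentage change."""
    return 100.0 * (math.exp(beta) - 1.0)


# Hypothetical "true" coefficients (order matches the design-matrix columns).
TRUE = {
    "const": 1.5,
    "preprint": math.log(1.202),     # mirrors the +20.2% headline effect
    "data_online": math.log(1.043),  # mirrors the +4.3% headline effect
    "log_authors": 0.3,              # hypothetical control coefficient
}

# Synthetic design: every combination of the two indicators and 1..8 authors.
X, y = [], []
for preprint, data_online, n_authors in itertools.product([0, 1], [0, 1], range(1, 9)):
    row = [1.0, preprint, data_online, math.log(n_authors)]
    # Noise-free outcome on the log(n_cit_tot + 1) scale.
    log_cit = sum(b * x for b, x in zip(TRUE.values(), row))
    X.append(row)
    y.append(log_cit)

beta = ols(X, y)
for name, b in zip(TRUE, beta):
    print(f"{name:12s} beta={b:+.4f}")
print(f"preprint effect:    {pct_effect(beta[1]):.1f}%")
print(f"online data effect: {pct_effect(beta[2]):.1f}%")
```

Because the toy outcome contains no noise, OLS recovers the planted coefficients exactly, and the back-transform returns 20.2% and 4.3%; with real data, the paper's controls and robust-regression variant would of course yield different estimates and standard errors.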

Experimental Results

Research Questions

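  • RQ1: After controlling for confounders, are Open Science practices (data sharing, code sharing, preprints) associated with higher citation counts?
  • RQ2: How do the effects of data sharing, code sharing, and preprints differ across disciplines and modes of data sharing?
  • RQ3: Is there a cumulative effect when several Open Science practices are adopted together?
  • RQ4: What are the limitations and generalizability of these findings beyond the PLOS/open-access context?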

Key Findings

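  • Releasing a preprint is associated with a significant citation advantage of about 20.2% (±0.7).
  • Depositing data in an online repository is associated with a citation advantage of about 4.3% (±0.8).
  • Sharing code yields no statistically significant citation advantage in this sample.
  • The effects are cumulative: papers with both a preprint and online data sharing receive roughly 24.5% more citations.
  • Disciplinary differences are pronounced: both the size and the presence of the effects vary substantially across divisions.
  • The full model explains a substantial share of the variance (R² ≈ 0.426).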
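The 24.5% joint figure equals the simple sum of the two individual effects (20.2% + 4.3%). If, as assumed here, both indicators enter a log-linear model as separate dummies with no interaction term, their effects compound multiplicatively instead. The arithmetic sketch below compares the two readings; the function names are hypothetical, and the paper may compute the joint effect differently.

```python
import math

# Headline percentage effects as reported above.
PREPRINT_PCT = 20.2
DATA_ONLINE_PCT = 4.3


def to_beta(pct):
    """Invert the back-transform: percentage change -> log-scale coefficient."""
    return math.log(1.0 + pct / 100.0)


def combined_pct(*pcts):
    """Joint effect of several dummies in a log-linear model: sum the betas, then back-transform."""
    return 100.0 * (math.exp(sum(to_beta(p) for p in pcts)) - 1.0)


additive = PREPRINT_PCT + DATA_ONLINE_PCT               # simple sum of percentages
multiplicative = combined_pct(PREPRINT_PCT, DATA_ONLINE_PCT)  # 1.202 * 1.043 - 1
print(f"simple sum:     {additive:.1f}%")
print(f"log-model join: {multiplicative:.1f}%")
```

Under the multiplicative reading, 1.202 × 1.043 ≈ 1.254, i.e. about 25.4%, slightly above the additive 24.5%; for small effects the two readings stay close.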

This review was generated by AI and checked by human editors.