QUICK REVIEW

[Paper Review] Popularity of arXiv.org within Computer Science

Charles Sutton, Linan Gong|arXiv (Cornell University)|Oct 14, 2017

Data Quality and Management | References: 8 | Cited by: 24

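One-Sentence Summary

This study investigates how widely computer science researchers have adopted arXiv.org by analyzing ten years of metadata from 63 top computer science conferences. It finds that arXiv usage rose from 1% in 2007 to 23% in 2017, with adoption above 60% in machine learning and theoretical computer science, indicating that the field is increasingly shifting toward preprint sharing and centralized e-print repositories.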

ABSTRACT

It may seem surprising that, out of all areas of science, computer scientists have been slow to post electronic versions of papers on sites like arXiv.org. Instead, computer scientists have tended to place papers on our individual home pages, but this loses the benefits of aggregation, namely notification and browsing. But this is changing. More and more computer scientists are now using the arXiv. At the same time, there is ongoing discussion and controversy about how prepublication affects peer review, especially for double-blind conferences. This discussion is often carried out with precious little evidence of how popular prepublication is. We measure what percentage of papers in computer science are placed on the arXiv, by cross-referencing published papers in DBLP with e-prints on arXiv. We found:
* Usage of arXiv.org has risen dramatically among the most selective conferences in computer science. In 2017, fully 23% of papers had e-prints on arXiv, compared to only 1% ten years ago.
* Areas of computer science vary widely in e-print prevalence. In theoretical computer science and machine learning, over 60% of published papers are on arXiv, while other areas are essentially zero. In most areas, arXiv usage is rising.
* Many researchers use arXiv for posting preprints. Of the 2017 published papers with arXiv e-prints, 56% were preprints that were posted before or during peer review.

Our paper describes these results as well as policy implications for researchers and practitioners.

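Research Motivation and Objectives

Quantify how extensively computer science researchers in different subfields use arXiv.org.
Determine whether papers are posted as preprints (before peer review) or as e-prints after acceptance.
Assess how the rise of preprint culture affects double-blind review and the norms of scholarly communication.
Provide evidence for the ongoing debate about open peer review, prepublication, and the role of centralized repositories in computer science.
Offer data-driven insight into how the computer science publishing ecosystem is evolving.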

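Proposed Method

The authors collected metadata for the 63 most selective computer science conferences from 2007 to 2017.
The metadata of each published paper was matched against arXiv.org via DOI and other identifiers to determine whether an e-print exists.
An e-print was classified as a preprint if it was posted before or during peer review, and as a postprint if it was posted after acceptance.
Statistical summaries of the data were used to analyze trends in arXiv adoption across subfields and over time.
The impact of preprint prevalence on double-blind review was examined, in particular the risk that authors can be identified.
Existing data from arXiv, DBLP, and conference proceedings made a large-scale, longitudinal analysis possible.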
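
The matching and classification steps above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' actual pipeline: the `Paper` and `EPrint` records, the `notification_date` field (used here as a stand-in for the end of peer review), and the fallback to a normalized title when no DOI is available are all hypothetical choices made for the example.

```python
import re
from dataclasses import dataclass
from datetime import date
from typing import Dict, Iterable, List, Optional


@dataclass
class Paper:
    title: str
    doi: Optional[str]
    notification_date: date  # assumed proxy for the end of peer review


@dataclass
class EPrint:
    title: str
    doi: Optional[str]
    first_posted: date


def normalize(title: str) -> str:
    """Lowercase the title and collapse every non-alphanumeric run to one space."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()


def lookup_keys(title: str, doi: Optional[str]) -> List[str]:
    """Keys tried in order: the DOI if present, then the normalized title."""
    keys = []
    if doi:
        keys.append("doi:" + doi.lower())
    keys.append("title:" + normalize(title))
    return keys


def build_index(eprints: Iterable[EPrint]) -> Dict[str, EPrint]:
    """Index e-prints under all of their lookup keys (first occurrence wins)."""
    index: Dict[str, EPrint] = {}
    for eprint in eprints:
        for key in lookup_keys(eprint.title, eprint.doi):
            index.setdefault(key, eprint)
    return index


def match_eprint(paper: Paper, index: Dict[str, EPrint]) -> Optional[EPrint]:
    """Return the e-print matching the paper, or None if no key matches."""
    for key in lookup_keys(paper.title, paper.doi):
        if key in index:
            return index[key]
    return None


def classify(paper: Paper, eprint: Optional[EPrint]) -> str:
    """Label a paper as 'preprint', 'postprint', or 'none'."""
    if eprint is None:
        return "none"
    # Posted before the decision was announced: before or during peer review.
    if eprint.first_posted < paper.notification_date:
        return "preprint"
    return "postprint"
```

Exact matching on a normalized title is a deliberately simple stand-in; a real cross-reference between DBLP and arXiv would likely also need author comparison or fuzzy matching to handle retitled papers.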

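Experimental Results

Research Questions

RQ1: What proportion of papers published at top computer science conferences have been posted as e-prints on arXiv.org?
RQ2: How has arXiv adoption changed across computer science subfields over the past decade?
RQ3: What proportion of papers were posted as preprints before peer review rather than after acceptance?
RQ4: How does the prevalence of preprints affect the feasibility of double-blind review in computer science?
RQ5: What effect do centralized e-print repositories have on scholarly communication and academic norms?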

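Key Findings

In 2017, 23% of papers published at the most selective computer science conferences had e-prints on arXiv.org, a marked rise from 1% in 2007.
In theoretical computer science and machine learning, over 60% of published papers have arXiv e-prints, indicating very high adoption in these areas.
Of the 2017 papers with arXiv e-prints, 56% were preprints, that is, posted before or during peer review.
arXiv usage continues to rise in most computer science subfields, although some areas remain at near-zero adoption.
The results suggest that preprints have become a primary channel of dissemination in many areas of computer science.
The study stresses the need to revise review norms to address the risk that public preprints expose author identities.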
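
Percentages like those in the findings above are per-year shares of published papers that have a matching e-print. A minimal sketch of that arithmetic follows; the `(year, has_eprint)` record format and the function name `adoption_by_year` are assumptions made for illustration, and the example does not use the study's data.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def adoption_by_year(records: Iterable[Tuple[int, bool]]) -> Dict[int, float]:
    """Map each year to the percentage of its papers that have an e-print.

    `records` is an iterable of (publication_year, has_eprint) pairs,
    a hypothetical format chosen for this example.
    """
    totals: Dict[int, int] = defaultdict(int)
    with_eprint: Dict[int, int] = defaultdict(int)
    for year, has_eprint in records:
        totals[year] += 1
        if has_eprint:
            with_eprint[year] += 1
    # Every year in `totals` has at least one paper, so division is safe.
    return {year: 100.0 * with_eprint[year] / totals[year] for year in sorted(totals)}
```

The same function gives per-area shares if it is called once per subfield on that subfield's records.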


This review was generated by AI and checked by human editors.