QUICK REVIEW

[Paper Review] Mapping the Increasing Use of LLMs in Scientific Papers

Weixin Liang, Yaohui Zhang|arXiv (Cornell University)|Apr 1, 2024

Library Science and Information Systems|Cited by 38

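One-Sentence Summary

Covering 2020–2024, the paper estimates the population-level fraction of LLM-modified content in the abstracts and introductions of arXiv, bioRxiv, and Nature portfolio papers. It finds a rapid rise after the release of ChatGPT, with Computer Science leading and Mathematics and the Nature portfolio lagging behind.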

ABSTRACT

Scientific publishing lays the foundation of science by disseminating research findings, fostering collaboration, encouraging reproducibility, and ensuring that scientific knowledge is accessible, verifiable, and built upon over time. Recently, there has been immense speculation about how many people are using large language models (LLMs) like ChatGPT in their academic writing, and to what extent this tool might have an effect on global scientific practices. However, we lack a precise measure of the proportion of academic writing substantially modified or produced by LLMs. To address this gap, we conduct the first systematic, large-scale analysis across 950,965 papers published between January 2020 and February 2024 on the arXiv, bioRxiv, and Nature portfolio journals, using a population-level statistical framework to measure the prevalence of LLM-modified content over time. Our statistical estimation operates on the corpus level and is more robust than inference on individual instances. Our findings reveal a steady increase in LLM usage, with the largest and fastest growth observed in Computer Science papers (up to 17.5%). In comparison, Mathematics papers and the Nature portfolio showed the least LLM modification (up to 6.3%). Moreover, at an aggregate level, our analysis reveals that higher levels of LLM-modification are associated with papers whose first authors post preprints more frequently, papers in more crowded research areas, and papers of shorter lengths. Our findings suggest that LLMs are being broadly used in scientific writings.

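Research Motivation and Objectives

Quantify, at the population level and across multiple platforms, how prevalent LLM-modified content is in scientific writing.
Track the trend in LLM usage from 2020 to 2024 to understand field-level and venue-level dynamics.
Identify factors associated with higher LLM usage, such as preprint posting frequency, how crowded a research area is, and paper length.
Develop and validate a scalable, population-level framework for estimating LLM modification that does not rely on classifying individual documents.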

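Proposed Method

Apply the distributional GPT quantification framework to estimate the fraction of LLM-modified content in abstracts and introductions.
Model the token-level distributions of human-written and LLM-modified text using a token set T with occurrence probabilities p_t (human-written) and q_t (LLM-modified).
Estimate p_t and q_t from documents known to be human-written and documents known to be LLM-modified.
Infer the LLM-modified fraction alpha by maximizing the log-likelihood of the observed text under the mixture model D_alpha, parameterized by hat{P}_T and hat{Q}_T.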
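
The steps above can be illustrated with a minimal sketch. It assumes the mixture takes the form D_alpha(x) = (1 - alpha) * P(x) + alpha * Q(x) and that tokens occur independently within a document. All function names, the add-one smoothing, and the ternary search are illustrative choices made here and are not taken from the paper's code.

```python
import math


def estimate_occurrence_probs(docs, vocab):
    """Estimate per-token occurrence probabilities from reference documents.

    Each document is a set of tokens. Add-one smoothing (an assumption of
    this sketch) keeps every probability strictly between 0 and 1.
    """
    n = len(docs)
    return {t: (sum(t in d for d in docs) + 1) / (n + 2) for t in vocab}


def doc_log_prob(doc, probs):
    """Log-probability of one document under an independent occurrence model."""
    total = 0.0
    for token, p in probs.items():
        total += math.log(p) if token in doc else math.log(1.0 - p)
    return total


def mixture_log_likelihood(alpha, log_p, log_q):
    """Sum over documents of log((1 - alpha) * P(x) + alpha * Q(x))."""
    total = 0.0
    for lp, lq in zip(log_p, log_q):
        m = max(lp, lq)  # factor out the larger term for numerical stability
        total += m + math.log(
            (1.0 - alpha) * math.exp(lp - m) + alpha * math.exp(lq - m)
        )
    return total


def estimate_alpha(docs, p_hat, q_hat, tol=1e-6):
    """Maximize the mixture log-likelihood over alpha in [0, 1].

    The objective is concave in alpha, so a ternary search is sufficient.
    """
    log_p = [doc_log_prob(d, p_hat) for d in docs]
    log_q = [doc_log_prob(d, q_hat) for d in docs]
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if mixture_log_likelihood(m1, log_p, log_q) < mixture_log_likelihood(
            m2, log_p, log_q
        ):
            lo = m1
        else:
            hi = m2
    return (lo + hi) / 2.0


if __name__ == "__main__":
    # Toy corpus with made-up numbers: one token that occurs in 10% of
    # human-written documents and 90% of LLM-modified documents.
    corpus = [{"delve"}] * 26 + [set()] * 74
    alpha = estimate_alpha(corpus, {"delve": 0.1}, {"delve": 0.9})
    print(round(alpha, 3))
```

In the toy corpus, the token appears in 26% of documents. With a single token the maximum-likelihood solution has a closed form, alpha = (0.26 - 0.1) / (0.9 - 0.1) = 0.2, which the search should recover. The estimate is a property of the corpus as a whole; no individual document is labeled as LLM-modified.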

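Experimental Results

Research Questions

RQ1: Between 2020 and 2024, how prevalent is LLM-modified content overall in the abstracts and introductions of arXiv, bioRxiv, and Nature portfolio papers?
RQ2: How does the prevalence of LLM modification evolve over time across disciplines, and which venues show the strongest growth?
RQ3: Which author-level, field-level, and paper-level factors are associated with higher LLM usage in scientific writing?
RQ4: Without relying on per-document labels, can a population-level estimation framework robustly detect LLM-modified content under distribution shift over time?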

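Key Findings

LLM-modified content increases steadily, with the largest growth in Computer Science (by February 2024, alpha reaches up to 17.5% for abstracts and 15.3% for introductions).
Mathematics papers and the Nature portfolio show the smallest increases (abstracts: up to 4.9% and 6.3%, respectively; introductions: up to 3.5% and 6.4%).
Papers whose first authors post more preprints show higher LLM modification (e.g., CS abstracts: 19.3% for authors with >=3 preprints versus 15.6% for authors with <=2).
Papers more similar to their closest peers show higher LLM usage (CS abstracts: 22.2% for more-similar papers versus 14.7% for less-similar papers).
Shorter papers show higher LLM usage than longer papers (CS abstracts: 17.7% versus 13.6%).
Pre-ChatGPT estimates (November 2022) are consistent with a low baseline (CS abstracts 2.3%, EE&SS 2.9%, Math 2.4%, Nature 3.1%).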

This review was generated by AI and reviewed by human editors.