QUICK REVIEW

[Paper Review] Mapping the Increasing Use of LLMs in Scientific Papers

Weixin Liang, Yaohui Zhang|arXiv (Cornell University)|Apr 1, 2024

Library Science and Information Systems|Citations: 38

One-Line Summary

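This paper estimates the population-level fraction of LLM-modified content in the abstracts and introductions of arXiv, bioRxiv, and Nature portfolio papers published from 2020 to 2024, and shows that this fraction rose rapidly after the release of ChatGPT. Computer Science leads, while Mathematics and the Nature portfolio lag behind.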

ABSTRACT

Scientific publishing lays the foundation of science by disseminating research findings, fostering collaboration, encouraging reproducibility, and ensuring that scientific knowledge is accessible, verifiable, and built upon over time. Recently, there has been immense speculation about how many people are using large language models (LLMs) like ChatGPT in their academic writing, and to what extent this tool might affect global scientific practices. However, we lack a precise measure of the proportion of academic writing substantially modified or produced by LLMs. To address this gap, we conduct the first systematic, large-scale analysis across 950,965 papers published between January 2020 and February 2024 on arXiv, bioRxiv, and Nature portfolio journals, using a population-level statistical framework to measure the prevalence of LLM-modified content over time. Our statistical estimation operates at the corpus level and is more robust than inference on individual instances. Our findings reveal a steady increase in LLM usage, with the largest and fastest growth observed in Computer Science papers (up to 17.5%). In comparison, Mathematics papers and the Nature portfolio showed the least LLM modification (up to 6.3%). Moreover, at an aggregate level, our analysis reveals that higher levels of LLM modification are associated with papers whose first authors post preprints more frequently, papers in more crowded research areas, and shorter papers. Our findings suggest that LLMs are being broadly used in scientific writing.

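Motivation and Objectives

Quantify the population-level prevalence of AI-modified (LLM-modified) content in scientific writing across multiple platforms.
Track temporal trends in LLM use from 2020 to 2024 and understand field- and venue-specific dynamics.
Identify factors associated with higher LLM use, such as preprint-posting activity, research-area crowdedness, and paper length.
Develop and validate a scalable framework that estimates LLM modification at the population level without relying on per-document classification.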

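Proposed Method

To estimate the fraction of LLM-modified content in the sentences of abstracts and introductions, the authors apply a distributional GPT quantification framework.
The token-level distributions of human-written and LLM-modified text are modeled with a token set $T$ and the occurrence probabilities $p_t$ (human) and $q_t$ (LLM) of each token $t \in T$.
The probabilities $p_t$ and $q_t$ are estimated from collections of documents known to be human-written and LLM-modified, respectively.
The AI-modified fraction $\alpha$ is estimated by maximizing the log-likelihood under the mixture model $D_\alpha = (1-\alpha)P + \alpha Q$ with the fitted $\hat{P}_T$ and $\hat{Q}_T$: $\hat{\alpha} = \arg\max_{\alpha \in [0,1]} \sum_i \log\big((1-\alpha)\hat{P}_T(x_i) + \alpha\hat{Q}_T(x_i)\big)$.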
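The estimation steps above can be sketched in Python. This is a minimal illustration, not the authors' implementation: it assumes that each token in $T$ appears in a sentence independently with probability $p_t$ (human) or $q_t$ (LLM), adds Laplace smoothing when fitting those probabilities, and finds $\hat{\alpha}$ by ternary search. The function names, the smoothing choice, and the synthetic data generator are all hypothetical.

```python
import math
import random
from typing import Dict, Iterable, List, Set

# Hypothetical sketch of population-level (distributional) estimation.
# Not the authors' code. Sentences are modeled as sets of tokens.


def fit_occurrence_probs(docs: List[Set[str]], vocab: Iterable[str],
                         smoothing: float = 1.0) -> Dict[str, float]:
    """Estimate, for each token t, the probability that t occurs in a sentence.

    Laplace smoothing keeps every probability strictly inside (0, 1).
    """
    n = len(docs)
    return {t: (sum(t in d for d in docs) + smoothing) / (n + 2 * smoothing)
            for t in vocab}


def log_likelihood(sentence: Set[str], probs: Dict[str, float]) -> float:
    """Log-probability of a sentence under independent per-token occurrence."""
    return sum(math.log(p) if t in sentence else math.log1p(-p)
               for t, p in probs.items())


def _log_mix(log_p: float, log_q: float, a: float) -> float:
    """log((1 - a) * P(x) + a * Q(x)) via log-sum-exp, for 0 < a < 1."""
    x = math.log1p(-a) + log_p
    y = math.log(a) + log_q
    m = max(x, y)
    return m + math.log(math.exp(x - m) + math.exp(y - m))


def estimate_alpha(sentences: List[Set[str]], p_hat: Dict[str, float],
                   q_hat: Dict[str, float], iters: int = 100) -> float:
    """MLE of alpha in D_alpha = (1 - alpha) P + alpha Q.

    The log-likelihood is concave in alpha, so ternary search finds its maximum.
    """
    lp = [log_likelihood(s, p_hat) for s in sentences]
    lq = [log_likelihood(s, q_hat) for s in sentences]

    def objective(a: float) -> float:
        return sum(_log_mix(x, y, a) for x, y in zip(lp, lq))

    lo, hi = 1e-9, 1.0 - 1e-9
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if objective(m1) < objective(m2):
            lo = m1  # the maximum lies to the right of m1
        else:
            hi = m2  # the maximum lies to the left of m2
    return (lo + hi) / 2


def sample_sentences(probs: Dict[str, float], n: int,
                     rng: random.Random) -> List[Set[str]]:
    """Draw n synthetic sentences; token t appears independently w.p. probs[t]."""
    return [{t for t, p in probs.items() if rng.random() < p} for _ in range(n)]


if __name__ == "__main__":
    rng = random.Random(0)
    vocab = [f"w{i}" for i in range(20)]
    p_true = {t: 0.2 for t in vocab}  # synthetic "human" distribution
    q_true = {t: (0.6 if i < 10 else 0.1) for i, t in enumerate(vocab)}  # "LLM"
    p_hat = fit_occurrence_probs(sample_sentences(p_true, 3000, rng), vocab)
    q_hat = fit_occurrence_probs(sample_sentences(q_true, 3000, rng), vocab)
    corpus = sample_sentences(p_true, 1400, rng) + sample_sentences(q_true, 600, rng)
    # The true mixing fraction in this synthetic corpus is 600 / 2000 = 0.3.
    print(round(estimate_alpha(corpus, p_hat, q_hat), 3))
```

Because $\log((1-\alpha)a + \alpha b)$ is concave in $\alpha$ for fixed $a, b > 0$, the summed log-likelihood is concave as well, so a simple one-dimensional search is enough. Working in log space with log-sum-exp avoids underflow when sentence probabilities are tiny.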

Experimental Results

Research Questions

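RQ1: What is the population-level prevalence of LLM-modified content in the abstracts and introductions of arXiv, bioRxiv, and Nature portfolio papers between 2020 and 2024?
RQ2: How does the prevalence of LLM modification change over time across fields, and which venues show the most pronounced growth?
RQ3: How are author-, field-, and paper-level factors associated with higher LLM use in scientific writing?
RQ4: Can a population-level estimation framework robustly detect LLM-modified content under temporal distribution shift, without relying on document-level labeling?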

Key Findings

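LLM-modified content increased steadily, with the largest growth in Computer Science (as of February 2024, α reached up to 17.5% for abstracts and 15.3% for introductions).
Mathematics papers and the Nature portfolio show the smallest increases (abstracts: up to 4.9% and 6.3%; introductions: up to 3.5% and 6.4%, respectively).
Papers whose first authors post more preprints show more LLM modification (e.g., CS abstracts: 19.3% for first authors with ≥3 preprints vs. 15.6% for ≤2).
Papers more similar to their recent peers show higher LLM use (CS abstracts: 22.2% for more similar vs. 14.7% for less similar).
Shorter papers show higher LLM use than longer ones (CS abstracts: 17.7% vs. 13.6%).
Pre-ChatGPT estimates (November 2022) are consistent with a low baseline (CS abstracts 2.3%, EE&SS 2.9%, Mathematics 2.4%, Nature 3.1%).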
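The subgroup comparisons above (preprint count, similarity to peers, paper length) presumably amount to running the same population-level estimator separately on each stratum of papers. A minimal sketch, assuming per-sentence log-likelihoods under the fitted $\hat{P}_T$ and $\hat{Q}_T$ are already available; the record fields (`log_p`, `log_q`, `first_author_preprints`) and all function names are hypothetical, not taken from the paper.

```python
import math
from collections import defaultdict
from typing import Callable, Dict, Hashable, Iterable, List, Tuple

# Hypothetical sketch: estimate alpha separately for each stratum of papers.
# Each record holds one sentence's log-likelihood under the fitted human
# model (log_p) and LLM model (log_q), plus paper metadata (made-up fields).


def _log_mix(log_p: float, log_q: float, a: float) -> float:
    """log((1 - a) * P(x) + a * Q(x)) via log-sum-exp, for 0 < a < 1."""
    x = math.log1p(-a) + log_p
    y = math.log(a) + log_q
    m = max(x, y)
    return m + math.log(math.exp(x - m) + math.exp(y - m))


def estimate_alpha(pairs: List[Tuple[float, float]], iters: int = 100) -> float:
    """Ternary search for the alpha maximizing the concave mixture log-likelihood."""
    def objective(a: float) -> float:
        return sum(_log_mix(lp, lq, a) for lp, lq in pairs)

    lo, hi = 1e-9, 1.0 - 1e-9
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if objective(m1) < objective(m2):
            lo = m1
        else:
            hi = m2
    return (lo + hi) / 2


def alpha_by_group(records: Iterable[dict],
                   key: Callable[[dict], Hashable]) -> Dict[Hashable, float]:
    """Split records by key(record) and estimate alpha within each group."""
    groups: Dict[Hashable, List[Tuple[float, float]]] = defaultdict(list)
    for r in records:
        groups[key(r)].append((r["log_p"], r["log_q"]))
    return {g: estimate_alpha(pairs) for g, pairs in groups.items()}


def preprint_bucket(record: dict) -> str:
    """Hypothetical stratifier mirroring the '>=3 vs <=2 preprints' split."""
    return ">=3" if record["first_author_preprints"] >= 3 else "<=2"
```

Calling `alpha_by_group(records, preprint_bucket)` returns one point estimate per bucket; other splits (similarity to peers, paper length) only need a different `key` function. The sketch reports point estimates only; uncertainty, for example from bootstrap resampling of papers, would have to be added separately.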


This review was generated by AI and checked by a human editor.