QUICK REVIEW

[Paper Review] Delving into LLM-assisted writing in biomedical publications through excess vocabulary

Dmitry Kobak, Rita González Márquez|arXiv (Cornell University)|Jun 11, 2024

Artificial Intelligence in Healthcare and Education | Cited by 34

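One-Line Summary

This paper introduces an unbiased, data-driven approach based on excess vocabulary usage to quantify LLM-assisted writing in biomedical abstracts. It estimates that at least 10% of 2024 PubMed abstracts were processed with an LLM such as ChatGPT, with higher shares in some subcorpora.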

ABSTRACT

Large language models (LLMs) like ChatGPT can generate and revise text with human-level performance. These models come with clear limitations: they can produce inaccurate information, reinforce existing biases, and be easily misused. Yet, many scientists use them for their scholarly writing. But how wide-spread is such LLM usage in the academic literature? To answer this question for the field of biomedical research, we present an unbiased, large-scale approach: we study vocabulary changes in over 15 million biomedical abstracts from 2010–2024 indexed by PubMed, and show how the appearance of LLMs led to an abrupt increase in the frequency of certain style words. This excess word analysis suggests that at least 13.5% of 2024 abstracts were processed with LLMs. This lower bound differed across disciplines, countries, and journals, reaching 40% for some subcorpora. We show that LLMs have had an unprecedented impact on scientific writing in biomedical research, surpassing the effect of major world events such as the Covid pandemic.

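Motivation and Objectives

Measure the influence of LLMs on scientific writing without relying on reference prompts or detectors.
Identify patterns of excess vocabulary usage across 14.4 million PubMed abstracts from 2010–2024.
Quantify how ChatGPT-like writing tools changed style and vocabulary in 2024.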

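Proposed Method

Build a 14.4M × 2.4M word occurrence matrix from PubMed abstracts.
Define excess words by comparing each word's observed 2024 frequency p with a counterfactual frequency q extrapolated from 2021–22, using the frequency gap delta = p − q and the frequency ratio r = p/q.
Annotate the 829 excess words as content words or style words and classify their part of speech.
Analyze variation across subgroups by field, country, and journal.
Compute a lower bound on LLM usage from the frequency gaps of word groups.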
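
The excess-word step above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`word_frequencies`, `excess_stats`), the regex tokenizer, the plain linear extrapolation of the 2021–22 trend, and the toy abstracts are all hypothetical. It only shows how p, q, delta, and r relate to one another.

```python
from collections import Counter
import re

def word_frequencies(abstracts):
    """Fraction of abstracts that contain each word at least once."""
    counts = Counter()
    for text in abstracts:
        # Binary occurrence: each word counts once per abstract.
        counts.update(set(re.findall(r"[a-z]+", text.lower())))
    n = len(abstracts)
    return {word: c / n for word, c in counts.items()}

def excess_stats(f2021, f2022, f2024, eps=1e-6):
    """Return {word: (p, q, delta, r)} for every word seen in 2024."""
    stats = {}
    for word, p in f2024.items():
        a = f2021.get(word, 0.0)
        b = f2022.get(word, 0.0)
        # Assumed counterfactual: continue the 2021->2022 trend for two
        # more years, clipped so q stays a valid, non-zero frequency.
        q = min(max(b + 2 * (b - a), eps), 1.0)
        stats[word] = (p, q, p - q, p / q)
    return stats

# Toy corpus, for illustration only.
toy = {
    2021: ["we study cells", "we study mice"],
    2022: ["we study cells", "we study proteins"],
    2024: ["we delve into cells", "we study mice"],
}
freqs = {year: word_frequencies(texts) for year, texts in toy.items()}
stats = excess_stats(freqs[2021], freqs[2022], freqs[2024])
```

In this toy corpus, "delve" is absent before 2024 and so gets a large gap and ratio, while "study" falls below its projected frequency. On real data, a word would count as an excess word only if delta or r exceeds a chosen threshold, which this sketch leaves to the caller.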

Figure 1: Frequencies of PubMed abstracts containing certain words. Black lines show counterfactual extrapolations from 2021–22 to 2023–24. The first six words are affected by ChatGPT; the last three relate to major events that influenced scientific writing and are shown for comparison.

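Experimental Results

Research Questions

RQ1: Can excess word usage in scientific abstracts reveal LLM-assisted writing without ground-truth labels?
RQ2: How large is the effect of excess words in 2024 across fields, countries, and journals?
RQ3: Do style words show different patterns from content words in LLM-influenced writing?
RQ4: How does LLM-induced writing compare with earlier vocabulary shifts such as the Covid-19 vocabulary surge?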

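Key Findings

The excess words that emerged in 2024 were predominantly style words (verbs and adjectives), whereas the Covid-era excess words were mostly content words.
The study estimates that at least 10% of 2024 abstracts were processed with LLMs, and that the lower bound may reach 30% in some subcorpora.
Two word groups (all excess words and a non-overlapping set of 10 words) yield similar lower bounds on LLM usage of roughly 11–12%.
Heterogeneity across fields and countries is pronounced, with higher lower bounds in computational fields and in some non-English-speaking countries.
Journals and publishers with many detections (e.g., MDPI, Frontiers) show greater excess word usage, while Nature, Science, and Cell show low lower bounds.
The analysis positions LLM-influenced writing as unprecedented in both quality and quantity compared with earlier vocabulary shifts.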
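
The lower-bound estimate from a word group can be sketched as follows. The assumption here is that the bound is the gap between the observed 2024 share of abstracts containing at least one marker word and the share projected from the 2021–22 trend. The marker words, the toy abstracts, and the helper names (`group_frequency`, `llm_lower_bound`) are hypothetical and chosen only for illustration.

```python
import re

def group_frequency(abstracts, words):
    """Fraction of abstracts containing at least one word from `words`."""
    words = set(words)
    hits = sum(
        1 for text in abstracts
        if words & set(re.findall(r"[a-z]+", text.lower()))
    )
    return hits / len(abstracts)

def llm_lower_bound(by_year, words):
    """Gap between observed and extrapolated 2024 group frequency."""
    a = group_frequency(by_year[2021], words)
    b = group_frequency(by_year[2022], words)
    observed = group_frequency(by_year[2024], words)
    # Assumed counterfactual: linear continuation of the 2021->2022 trend.
    counterfactual = min(max(b + 2 * (b - a), 0.0), 1.0)
    return max(observed - counterfactual, 0.0)

# Hypothetical marker words and toy abstracts, for illustration only.
markers = {"delve", "pivotal"}
corpus = {
    2021: ["we study cells", "we measure growth", "a pivotal trial", "we test mice"],
    2022: ["we study cells", "we measure growth", "a pivotal trial", "we test mice"],
    2024: ["we delve into cells", "a pivotal role", "we delve deeper", "we test mice"],
}
bound = llm_lower_bound(corpus, markers)
```

The gap is a lower bound rather than an estimate of total usage because LLM-processed abstracts that happen to contain none of the marker words are not counted.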

Figure 2: Words showing increased frequency in 2024. (a) Frequencies in 2024 and frequency ratios ($r$). Both axes are on log-scale. Only a subset of points are labeled for visual clarity. The dashed line shows the threshold defining excess words (see text). Words with $r>90$ are shown at $r=90$.


This review was generated by AI and checked by a human editor.