QUICK REVIEW

[Paper Review] Delving into LLM-assisted writing in biomedical publications through excess vocabulary

Dmitry Kobak, Rita González Márquez|arXiv (Cornell University)|2024. 06. 11.

Artificial Intelligence in Healthcare and Education | Citations: 34

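One-Line Summary

Using a data-driven, unbiased approach based on excess word usage, this paper quantifies LLM-assisted writing in biomedical abstracts and estimates that at least 10% of 2024 PubMed abstracts (more in some subcorpora) were processed with an LLM such as ChatGPT.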

ABSTRACT

Large language models (LLMs) like ChatGPT can generate and revise text with human-level performance. These models come with clear limitations: they can produce inaccurate information, reinforce existing biases, and be easily misused. Yet, many scientists use them for their scholarly writing. But how wide-spread is such LLM usage in the academic literature? To answer this question for the field of biomedical research, we present an unbiased, large-scale approach: we study vocabulary changes in over 15 million biomedical abstracts from 2010–2024 indexed by PubMed, and show how the appearance of LLMs led to an abrupt increase in the frequency of certain style words. This excess word analysis suggests that at least 13.5% of 2024 abstracts were processed with LLMs. This lower bound differed across disciplines, countries, and journals, reaching 40% for some subcorpora. We show that LLMs have had an unprecedented impact on scientific writing in biomedical research, surpassing the effect of major world events such as the Covid pandemic.

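Motivation and Objectives

Measure the influence of LLMs without relying on ground-truth prompts or LLM detectors.
Identify patterns of excess vocabulary usage in 14.4 million PubMed abstracts from 2010–2024.
Quantify how ChatGPT-like writing tools changed writing style and vocabulary in 2024.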

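Proposed Method

Build a 14.4M × 2.4M word occurrence matrix (abstracts × words) from PubMed abstracts.
Define excess words by comparing the observed 2024 frequency $p$ with a counterfactual frequency $q$ extrapolated from 2021–22, using the gap $\delta = p - q$ and the ratio $r = p/q$.
Annotate 829 excess words as content or style words and classify them by part of speech.
Analyze variation across subgroups by field, country, and journal.
Compute a lower bound on LLM usage from the frequency gap of word groups.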
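
The first two steps above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: it assumes that $\delta$ is the gap $p - q$ and $r$ is the ratio $p/q$, that the counterfactual $q$ is a straight-line continuation of the 2021–22 trend, and that tokenization is a simple lowercase letter match. The function names and the cutoffs in `is_excess` are hypothetical placeholders.

```python
import re
from collections import Counter


def word_frequencies(abstracts):
    """Map each word to the fraction of abstracts that contain it at least once."""
    counts = Counter()
    for text in abstracts:
        # Binary occurrence: a word counts once per abstract, however often it repeats.
        counts.update(set(re.findall(r"[a-z]+", text.lower())))
    return {word: n / len(abstracts) for word, n in counts.items()}


def excess_stats(f_2021, f_2022, p_2024):
    """Return (q, delta, r) for one word from its yearly frequencies."""
    # Assumption: q continues the 2021-22 trend linearly for two more years,
    # floored at zero. The review does not spell out the extrapolation rule.
    q = max(f_2022 + 2 * (f_2022 - f_2021), 0.0)
    delta = p_2024 - q                                  # excess frequency gap
    r = p_2024 / q if q > 0 else float("inf")           # excess frequency ratio
    return q, delta, r


def is_excess(delta, r, min_delta=0.01, min_r=2.0):
    """Hypothetical threshold rule; the cutoffs are placeholders, not the paper's."""
    return delta > min_delta or r > min_r
```

For a word found in 1.0% of abstracts in 2021, 1.1% in 2022, and 3.0% in 2024, `excess_stats(0.010, 0.011, 0.030)` gives a counterfactual of 1.3%, a gap of 1.7 percentage points, and a ratio of about 2.3. Using both the gap and the ratio lets common words qualify through a large gap and rare words through a large ratio.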

Figure 1: Frequencies of PubMed abstracts containing certain words. Black lines show counterfactual extrapolations from 2021–22 to 2023–24. The first six words are affected by ChatGPT; the last three relate to major events that influenced scientific writing and are shown for comparison.

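Experimental Results

Research Questions

RQ1: Can excess vocabulary usage reveal LLM-assisted writing without ground-truth labeling?
RQ2: How large is the 2024 excess vocabulary footprint across disciplines, countries, and journals?
RQ3: Do style words show a different pattern from content words in LLM-influenced writing?
RQ4: How does LLM-driven writing compare with historical shifts such as the COVID-19 vocabulary surge?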

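Key Results

Excess words appeared in 2024, with style words (mostly verbs and adjectives) rising sharply, unlike the Covid era, when the excess words were content words.
The study estimates that at least 10% of 2024 abstracts were processed with LLMs, with the lower bound reaching up to 30% in some subcorpora.
Two word groups (all excess words, and a separate non-overlapping set of 10 words) yield similar lower bounds on LLM usage, roughly 11–12%.
Differences across fields and countries are pronounced, with higher lower bounds in computational fields and in some non-English-speaking countries.
Some journals and publishers (e.g., MDPI, Frontiers) show markedly higher excess word usage, whereas Nature, Science, and Cell have lower estimated bounds.
The analysis suggests that the effect of LLMs on scientific writing is unprecedented in both quality and quantity compared with earlier vocabulary shifts.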
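
The lower-bound figures above can be read as follows: take a group of marker words, measure the share of 2024 abstracts containing at least one of them, and subtract the share expected without LLMs. The sketch below illustrates that reading under stated assumptions. The function names, the toy abstracts, and the marker words are hypothetical, and the counterfactual share is passed in as a plain number rather than estimated.

```python
import re


def group_frequency(abstracts, words):
    """Fraction of abstracts that contain at least one word from `words`."""
    markers = {w.lower() for w in words}
    hits = sum(
        1
        for text in abstracts
        if markers & set(re.findall(r"[a-z]+", text.lower()))
    )
    return hits / len(abstracts)


def usage_lower_bound(observed, counterfactual):
    """Gap between observed and expected group frequency, floored at zero."""
    # This is only a lower bound: LLM-processed abstracts that happen to
    # contain none of the marker words are not counted.
    return max(observed - counterfactual, 0.0)


# Toy data, invented purely for illustration.
abstracts_2024 = [
    "We delve into intricate mechanisms.",
    "Mice were treated daily.",
    "These findings underscore the role of sleep.",
    "Patients were enrolled at two sites.",
]
observed = group_frequency(abstracts_2024, ["delve", "underscore"])
bound = usage_lower_bound(observed, counterfactual=0.25)
```

In this toy case two of the four abstracts contain a marker word, so the observed share is 0.5; with an assumed counterfactual share of 0.25, the bound is 0.25. That different word groups give similar bounds, as reported above, is what makes the estimate less dependent on the choice of markers.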

Figure 2: Words showing increased frequency in 2024. (a) Frequencies in 2024 and frequency ratios ($r$). Both axes are on log-scale. Only a subset of points are labeled for visual clarity. The dashed line shows the threshold defining excess words (see text). Words with $r>90$ are shown at $r=90$.

This review was generated by AI and checked by a human editor.