QUICK REVIEW

[Paper Review] The AI Review Lottery: Widespread AI-Assisted Peer Reviews Boost Paper Scores and Acceptance Rates

Giuseppe Russo Latona, Antônio H. Ribeiro|arXiv (Cornell University)|May 3, 2024

Artificial Intelligence in Healthcare and Education|Cited by 10

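One-sentence summary

This study quantifies how widespread AI-assisted peer reviews were at ICLR 2024 and concludes that AI-assisted reviews tend to raise submission scores and acceptance rates, most noticeably for borderline papers.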

ABSTRACT

Journals and conferences worry that peer reviews assisted by artificial intelligence (AI), in particular, large language models (LLMs), may negatively influence the validity and fairness of the peer-review system, a cornerstone of modern science. In this work, we address this concern with a quasi-experimental study of the prevalence and impact of AI-assisted peer reviews in the context of the 2024 International Conference on Learning Representations (ICLR), a large and prestigious machine-learning conference. Our contributions are threefold. Firstly, we obtain a lower bound for the prevalence of AI-assisted reviews at ICLR 2024 using the GPTZero LLM detector, estimating that at least $15.8\%$ of reviews were written with AI assistance. Secondly, we estimate the impact of AI-assisted reviews on submission scores. Considering pairs of reviews with different scores assigned to the same paper, we find that in $53.4\%$ of pairs the AI-assisted review scores higher than the human review ($p = 0.002$; relative difference in probability of scoring higher: $+14.4\%$ in favor of AI-assisted reviews). Thirdly, we assess the impact of receiving an AI-assisted peer review on submission acceptance. In a matched study, submissions near the acceptance threshold that received an AI-assisted peer review were $4.9$ percentage points ($p = 0.024$) more likely to be accepted than submissions that did not. Overall, we show that AI-assisted reviews are consequential to the peer-review process and offer a discussion on future implications of current trends.

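Motivation and objectives

Assess how widespread AI-assisted peer reviews were at ICLR 2024.
Estimate the causal effect of AI-assisted reviews on submission scores.
Estimate the causal effect of AI-assisted reviews on paper acceptance rates.
Provide public data and code for reproducing and extending the analysis.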

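Proposed method

A quasi-experimental design comprising three studies on ICLR 2024 OpenReview data (n = 7,404 submissions; n = 28,028 reviews).
Study 1: Detect AI-assisted reviews with GPTZero and estimate their prevalence (a lower bound).
Study 2: Compare the scores of AI-assisted and human reviews of the same submission to estimate the effect on scores.
Study 3: Estimate the effect on acceptance with a matched-pair analysis that uses content-based matching and logistic/linear regression.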
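
The within-submission comparison in Study 2 can be illustrated with an exact two-sided sign test on (AI-assisted score, human score) pairs. This is a minimal sketch under stated assumptions: the summary does not say which test the authors used, so the sign test is a stand-in, and the function name `sign_test` and the toy data are hypothetical.

```python
from math import comb


def sign_test(pairs):
    """Exact two-sided sign test on (ai_score, human_score) pairs.

    Tied pairs are dropped, mirroring the restriction to pairs of
    reviews with different scores. Returns the share of untied pairs
    in which the AI-assisted review scored higher, and the p-value.
    NOTE: the choice of test is an assumption, not the paper's method.
    """
    untied = [(a, h) for a, h in pairs if a != h]
    n = len(untied)
    if n == 0:
        raise ValueError("no pairs with differing scores")
    wins = sum(1 for a, h in untied if a > h)
    # Under the null hypothesis, wins ~ Binomial(n, 0.5). The
    # distribution is symmetric, so the two-sided p-value is twice
    # the smaller tail, capped at 1.
    k = min(wins, n - wins)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return wins / n, min(1.0, 2 * tail)


# Toy scores invented for illustration only (not data from the paper).
toy_pairs = [(6, 5), (8, 6), (5, 5), (3, 5), (6, 3), (8, 5)]
share, p_value = sign_test(toy_pairs)
print(f"share={share:.2f}, p={p_value:.3f}")
```

Comparing reviews of the same paper holds paper quality fixed, which is why a paired test fits this design better than comparing average scores across all reviews.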

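Experimental results

Research questions

RQ1: According to an LLM detector, what share of ICLR 2024 reviews were AI-assisted?
RQ2: For the same paper, do AI-assisted reviews assign systematically higher scores than human reviews?
RQ3: After accounting for other factors, do AI-assisted reviews increase a submission's probability of acceptance?
RQ4: Are there heterogeneous effects, particularly for borderline submissions?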

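Key findings

AI-assisted reviews were widespread: at least 15.8% of reviews were classified as AI-assisted.
In within-submission comparisons of review pairs with different scores, the AI-assisted review scored higher than the human review in 53.4% of pairs (p = 0.002; relative difference in probability of scoring higher: +14.4%).
Submissions that received an AI-assisted review had 13.8% higher odds of acceptance (p = 0.024), and their average acceptance rate was 3.1 percentage points higher.
Borderline submissions (mean human score 5–6) showed the largest effect: acceptance rose by 4.9 percentage points (p = 0.024; odds up 31.1%).
Across all analyses, AI-assisted reviews influenced both scores and acceptance, heightening concerns about the reliability and fairness of peer review.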
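
The findings mix three kinds of effect size: relative differences, changes in odds, and percentage-point changes. The sketch below shows how they relate arithmetically. The helper names and the baseline acceptance rates are hypothetical, chosen only for illustration; the summary does not report the baselines behind the paper's estimates, so nothing here reconstructs the authors' calculations.

```python
def odds(p):
    """Convert a probability into odds."""
    return p / (1 - p)


def prob_from_odds(o):
    """Convert odds back into a probability."""
    return o / (1 + o)


def point_change(baseline, odds_increase):
    """Percentage-point change in probability implied by a relative
    increase in odds, starting from a given baseline probability."""
    new_p = prob_from_odds(odds(baseline) * (1 + odds_increase))
    return 100 * (new_p - baseline)


def relative_difference(share):
    """Relative difference between a share and its complement."""
    return share / (1 - share) - 1


# 53.4% versus 46.6%: the rounded share gives roughly +14.6%, close to
# the reported +14.4%; the gap is plausibly due to rounding of 53.4%.
print(f"{relative_difference(0.534):+.1%}")

# Hypothetical baseline acceptance rates (NOT taken from the paper).
for baseline in (0.2, 0.3, 0.5):
    print(baseline, round(point_change(baseline, 0.311), 1))
```

The same relative increase in odds corresponds to different percentage-point changes depending on the baseline, which is why the two kinds of figures cannot be converted into each other without knowing the baseline acceptance rate.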


This review was generated by AI and checked by a human editor.