QUICK REVIEW

[Paper Explainer] The AI Review Lottery: Widespread AI-Assisted Peer Reviews Boost Paper Scores and Acceptance Rates

Giuseppe Russo Latona, Antônio H. Ribeiro|arXiv (Cornell University)|May 3, 2024

Artificial Intelligence in Healthcare and Education|Cited by 10

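One-Sentence Summary

This study quantifies how common AI-assisted peer reviews were at ICLR 2024 and finds that AI-assisted reviews tend to raise submission scores and acceptance rates, especially for borderline papers.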

ABSTRACT

Journals and conferences worry that peer reviews assisted by artificial intelligence (AI), in particular, large language models (LLMs), may negatively influence the validity and fairness of the peer-review system, a cornerstone of modern science. In this work, we address this concern with a quasi-experimental study of the prevalence and impact of AI-assisted peer reviews in the context of the 2024 International Conference on Learning Representations (ICLR), a large and prestigious machine-learning conference. Our contributions are threefold. Firstly, we obtain a lower bound for the prevalence of AI-assisted reviews at ICLR 2024 using the GPTZero LLM detector, estimating that at least $15.8\%$ of reviews were written with AI assistance. Secondly, we estimate the impact of AI-assisted reviews on submission scores. Considering pairs of reviews with different scores assigned to the same paper, we find that in $53.4\%$ of pairs the AI-assisted review scores higher than the human review ($p = 0.002$; relative difference in probability of scoring higher: $+14.4\%$ in favor of AI-assisted reviews). Thirdly, we assess the impact of receiving an AI-assisted peer review on submission acceptance. In a matched study, submissions near the acceptance threshold that received an AI-assisted peer review were $4.9$ percentage points ($p = 0.024$) more likely to be accepted than submissions that did not. Overall, we show that AI-assisted reviews are consequential to the peer-review process and offer a discussion on future implications of current trends.

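Motivation and Objectives

Assess how prevalent AI-assisted peer reviews were at ICLR 2024.
Estimate the causal effect of AI-assisted reviews on submission scores.
Estimate the causal effect of AI-assisted reviews on paper acceptance.
Release open data and code so the analyses can be replicated and extended.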

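Proposed Method

Three quasi-experimental studies built on OpenReview data from ICLR 2024 (n = 7,404 submissions; n = 28,028 reviews).
Study 1: Flag AI-assisted reviews with the GPTZero detector to estimate prevalence (a lower bound).
Study 2: Compare the scores of AI-assisted and human reviews of the same submission to estimate the effect on scores.
Study 3: Run a matched-pair analysis, using content-based matching with logistic and linear regression, to estimate the effect on acceptance.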
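
The within-paper comparison in Study 2 can be sketched as follows. This is a minimal illustration, not the authors' code: the data layout, the function names `count_discordant_pairs` and `sign_test_p_value`, and the toy scores are all hypothetical, and the exact two-sided sign test is an assumption, since this summary does not say which test produced the reported p-value. The sketch also treats pairs as independent, which pairs drawn from the same paper are not.

```python
from itertools import product
from math import comb


def count_discordant_pairs(reviews_by_paper):
    """Count AI/human review pairs on the same paper whose scores differ.

    reviews_by_paper maps a paper id to a list of (score, is_ai) tuples.
    Returns (pairs where the AI-assisted review scored higher,
             pairs where the human review scored higher).
    """
    ai_higher = human_higher = 0
    for reviews in reviews_by_paper.values():
        ai_scores = [score for score, is_ai in reviews if is_ai]
        human_scores = [score for score, is_ai in reviews if not is_ai]
        for ai_score, human_score in product(ai_scores, human_scores):
            if ai_score > human_score:
                ai_higher += 1
            elif human_score > ai_score:
                human_higher += 1
            # Pairs with equal scores are skipped.
    return ai_higher, human_higher


def sign_test_p_value(k, n):
    """Exact two-sided p-value for k successes in n fair coin flips."""
    tail = sum(comb(n, i) for i in range(max(k, n - k), n + 1))
    return min(1.0, 2 * tail / 2**n)


# Hypothetical toy data: paper id -> [(score, flagged_as_ai), ...]
toy_reviews = {
    "paper_a": [(6, True), (5, False), (5, False)],
    "paper_b": [(3, True), (5, False)],
    "paper_c": [(8, True), (6, False), (8, False)],
}

ai_higher, human_higher = count_discordant_pairs(toy_reviews)
n_pairs = ai_higher + human_higher
share_ai_higher = ai_higher / n_pairs
p_value = sign_test_p_value(ai_higher, n_pairs)
print(f"AI higher in {share_ai_higher:.1%} of {n_pairs} pairs, p = {p_value:.3f}")
```

With real data, the share printed here would correspond to the proportion of differing pairs in which the AI-assisted review scored higher. SciPy's `scipy.stats.binomtest` offers a library alternative to the hand-written test.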

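Experimental Results

Research Questions

RQ1: According to an LLM detector, what share of ICLR 2024 reviews were AI-assisted?
RQ2: For the same paper, do AI-assisted reviews systematically give higher scores than human reviews?
RQ3: After controlling for other factors, does an AI-assisted review raise the probability that a submission is accepted?
RQ4: Are there heterogeneous effects, particularly for borderline submissions?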
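
The quantity behind the last two questions can be illustrated on pairs that have already been matched. This is a rough sketch under stated assumptions: the tuple layout, the function name `acceptance_gap_pp`, and the toy values are hypothetical, and it reproduces neither the content-based matching nor the regression models used in the study. It only shows the acceptance-rate gap in percentage points, overall and within a borderline band of mean human scores from 5 to 6.

```python
def acceptance_gap_pp(pairs):
    """Acceptance-rate gap in percentage points: treated minus control.

    Each pair is (treated_accepted, control_accepted, mean_human_score),
    where "treated" is the submission that received an AI-assisted review.
    """
    n = len(pairs)
    treated_rate = sum(treated for treated, _control, _score in pairs) / n
    control_rate = sum(control for _treated, control, _score in pairs) / n
    return 100 * (treated_rate - control_rate)


# Hypothetical matched pairs; 1 = accepted, 0 = rejected.
toy_pairs = [
    (1, 0, 5.5),
    (1, 1, 5.0),
    (0, 0, 5.8),
    (1, 1, 7.0),
    (0, 0, 3.0),
    (1, 1, 8.0),
    (0, 0, 4.0),
    (0, 1, 6.5),
]

# Borderline band: mean human score between 5 and 6, inclusive (an assumption
# about how the band's endpoints are treated).
borderline_pairs = [pair for pair in toy_pairs if 5.0 <= pair[2] <= 6.0]

overall_gap = acceptance_gap_pp(toy_pairs)
borderline_gap = acceptance_gap_pp(borderline_pairs)
print(f"overall gap: {overall_gap:+.1f} pp, borderline gap: {borderline_gap:+.1f} pp")
```

Splitting the pairs by score band is one simple way to look for heterogeneous effects; a regression with covariates, as the study describes, would additionally adjust for remaining differences within pairs.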

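Key Findings

AI-assisted reviews are widespread: at least 15.8% of reviews were classified as AI-assisted.
Among pairs of reviews of the same submission with different scores, the AI-assisted review scored higher than the human review in 53.4% of pairs (p=0.002; relative difference in probability of scoring higher: +14.4%).
Submissions with an AI-assisted review had 13.8% higher odds of acceptance (p=0.024), or an average acceptance rate 3.1 percentage points higher.
Borderline submissions (mean human score 5 to 6) show the strongest effect: their acceptance rate is 4.9 percentage points higher (p=0.024; odds ratio up 31.1%).
Across the analyses, AI-assisted reviews are shown to affect both scores and acceptance rates, raising concerns about trust and fairness in peer review.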
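
As a quick check on how the 53.4% share relates to the relative difference, the arithmetic can be worked through directly. The reading below is an assumption: it takes "relative difference" to mean the gap between the probability that the AI-assisted review scores higher and the probability that the human review scores higher, divided by the latter. The function name is hypothetical.

```python
def relative_difference(p_ai_higher):
    """Relative gap between P(AI review higher) and P(human review higher)."""
    p_human_higher = 1.0 - p_ai_higher
    return (p_ai_higher - p_human_higher) / p_human_higher


rel = relative_difference(0.534)
print(f"{rel:+.1%}")  # prints "+14.6%"
```

Using the rounded 53.4% gives about +14.6%, close to the reported +14.4%; the small gap is presumably due to rounding in the published share.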

This explainer was generated by AI and reviewed by a human editor.