QUICK REVIEW

[Paper Review] Semi-Autonomous Mathematics Discovery with Gemini: A Case Study on the Erdős Problems

Tony Feng, Trieu Trinh | arXiv (Cornell University) | Jan 29, 2026

Mathematics, Computing, and Information Processing | Cited by 2

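One-Sentence Summary

The paper reports a case study in which Aletheia, an AI-driven mathematics research agent built on Gemini Deep Think, is applied to 700 open Erdős problems. The output is a mix of autonomous solutions, partial solutions, and literature identifications, all subjected to careful human verification, and the paper discusses the pitfalls that arise along the way.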

ABSTRACT

We present a case study in semi-autonomous mathematics discovery, using Gemini to systematically evaluate 700 conjectures labeled 'Open' in Bloom's Erdős Problems database. We employ a hybrid methodology: AI-driven natural language verification to narrow the search space, followed by human expert evaluation to gauge correctness and novelty. We address 13 problems that were marked 'Open' in the database: 5 through seemingly novel autonomous solutions, and 8 through identification of previous solutions in the existing literature. Our findings suggest that the 'Open' status of the problems was through obscurity rather than difficulty. We also identify and discuss issues arising in applying AI to math conjectures at scale, highlighting the difficulty of literature identification and the risk of "subconscious plagiarism" by AI. We reflect on the takeaways from AI-assisted efforts on the Erdős Problems.

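Motivation and Goals

Demonstrate semi-autonomous mathematics discovery at scale on Bloom's Erdős Problems database.
Use human evaluation to assess the correctness, novelty, and provenance of AI-generated solutions.
Identify the limitations, risks, and best practices of AI-assisted mathematics research.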

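Proposed Method

Use Aletheia, a mathematics research agent built on Gemini Deep Think, to generate candidate solutions to open problems.
Apply a natural-language verifier to the responses to the 700 prompts to screen for potentially correct answers, yielding 212 candidates.
Have human experts assess correctness and novelty, consulting outside specialists when needed.
Classify each result as an autonomous solution, a partial AI solution, an independent rediscovery, or a literature identification.
Document and reflect on cases of subconscious plagiarism and on the challenges of literature identification.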
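The method steps above form a funnel: generate, screen automatically, review by hand, then label. The sketch below is a minimal Python illustration under stated assumptions: `funnel`, `Candidate`, `Outcome`, and the three callables are hypothetical names, and the toy stubs stand in for Aletheia, the natural-language verifier, and the human experts, whose actual interfaces the review does not describe.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Iterable, Optional


class Outcome(Enum):
    """Outcome labels from the review; the enum itself is hypothetical."""
    AUTONOMOUS = "autonomous solution"
    PARTIAL = "partial AI solution"
    REDISCOVERY = "independent rediscovery"
    LITERATURE = "literature identification"


@dataclass
class Candidate:
    problem_id: int   # Erdős problem number, e.g. 652
    response: str     # natural-language solution text from the agent


def funnel(
    problems: Iterable[int],
    generate: Callable[[int], str],
    verifier_accepts: Callable[[Candidate], bool],
    expert_review: Callable[[Candidate], Optional[Outcome]],
) -> dict[int, Outcome]:
    """Hypothetical pipeline: generate -> automatic screening -> expert labels."""
    # Stage 1: the agent proposes one candidate per open problem.
    candidates = [Candidate(p, generate(p)) for p in problems]
    # Stage 2: the natural-language verifier narrows the search space.
    shortlisted = [c for c in candidates if verifier_accepts(c)]
    # Stage 3: experts label the meaningfully correct results;
    # None means the candidate was rejected (wrong or not meaningful).
    results: dict[int, Outcome] = {}
    for c in shortlisted:
        label = expert_review(c)
        if label is not None:
            results[c.problem_id] = label
    return results


if __name__ == "__main__":
    # Toy stubs only; they do not reproduce the paper's data.
    toy = funnel(
        problems=[652, 654, 700],
        generate=lambda p: f"proof sketch for Erdős-{p}",
        verifier_accepts=lambda c: c.problem_id != 700,
        expert_review=lambda c: Outcome.AUTONOMOUS if c.problem_id == 652 else None,
    )
    print({k: v.name for k, v in toy.items()})  # {652: 'AUTONOMOUS'}
```

Keeping the verifier and the expert step as separate callables mirrors the hybrid design described in the abstract: the cheap automatic filter runs on everything, and the expensive human step runs only on the shortlist.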

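Experimental Results

Research Questions

RQ1: Can AI-driven natural-language verification reliably narrow a large space of open problems down to a set small enough for expert review?
RQ2: What fraction of AI-generated candidates are correct, novel, or merely restatements of existing literature?
RQ3: What are the main obstacles to large-scale AI-assisted mathematics discovery (interpreting problem statements, searching the literature, plagiarism risk)?
RQ4: Do the autonomous AI solutions to Erdős problems achieve meaningful mathematical novelty, or do they reflect only lower-level insights?
RQ5: How does AI-assisted work on these problems compare with formal verification approaches (e.g., Lean)?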

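Key Findings

Of the 200 AI-generated responses evaluated, only 0–2 were meaningfully correct and novel; overall, 13 meaningfully correct solutions were identified.
Autonomous solutions covered five problems (Erdős-652, Erdős-654, Erdős-935, Erdős-1040, Erdős-1051).
Eight further problems received partial AI solutions, or the AI contributed only part of the solution to a multi-part problem.
Independent rediscovery occurred for three problems (Erdős-397, Erdős-659, Erdős-1089): correct solutions to these already existed in the literature.
Literature identification showed that five problems (Erdős-333, Erdős-591, Erdős-705, Erdős-992, Erdős-1105) already had solutions in the literature.
The study highlights the risks of applying AI to mathematical conjectures at scale, such as subconscious plagiarism and the difficulty of literature identification.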
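The abstract's headline count can be cross-checked against the problem IDs listed above. This is a small consistency sketch that rests on one assumption the review does not state explicitly: that the abstract's 8 problems "addressed through identification of previous solutions in the existing literature" are the 3 independent rediscoveries plus the 5 literature identifications. The partial-solution problems are not listed by ID here, so they are left out.

```python
# Problem IDs copied from the Key Findings list.
AUTONOMOUS = [652, 654, 935, 1040, 1051]
INDEPENDENT_REDISCOVERY = [397, 659, 1089]
LITERATURE_IDENTIFICATION = [333, 591, 705, 992, 1105]

# Assumption: the abstract groups rediscoveries and literature
# identifications together as "previous solutions in the existing literature".
previously_solved = INDEPENDENT_REDISCOVERY + LITERATURE_IDENTIFICATION

# The two groups should not share any problem.
assert not set(AUTONOMOUS) & set(previously_solved)

total = len(AUTONOMOUS) + len(previously_solved)
print(len(AUTONOMOUS), len(previously_solved), total)  # 5 8 13
```

Under that assumption, the listed IDs reproduce the abstract's split of 13 = 5 + 8.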


This review was generated by AI and checked by human editors.