QUICK REVIEW

[Paper Digest] Towards Autonomous Mathematics Research

Tony Feng, Trieu Trinh|arXiv (Cornell University)|Feb 10, 2026
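Advanced Graph Neural Networks | Cited by 2
One-Sentence Summary

This paper introduces Aletheia, a math research agent that iteratively generates, verifies, and revises proofs in natural language, and presents results of autonomous AI in mathematics along with a taxonomy of autonomy levels intended to improve transparency.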

ABSTRACT

Recent advances in foundational models have yielded reasoning systems capable of achieving a gold-medal standard at the International Mathematical Olympiad. The transition from competition-level problem-solving to professional research, however, requires navigating vast literature and constructing long-horizon proofs. In this work, we introduce Aletheia, a math research agent that iteratively generates, verifies, and revises solutions end-to-end in natural language. Specifically, Aletheia is powered by an advanced version of Gemini Deep Think for challenging reasoning problems, a novel inference-time scaling law that extends beyond Olympiad-level problems, and intensive tool use to navigate the complexities of mathematical research. We demonstrate the capability of Aletheia from Olympiad problems to PhD-level exercises and most notably, through several distinct milestones in AI-assisted mathematics research: (a) a research paper (Feng26) generated by AI without any human intervention in calculating certain structure constants in arithmetic geometry called eigenweights; (b) a research paper (LeeSeo26) demonstrating human-AI collaboration in proving bounds on systems of interacting particles called independent sets; and (c) an extensive semi-autonomous evaluation (Feng et al., 2026a) of 700 open problems on Bloom's Erdos Conjectures database, including autonomous solutions to four open questions. In order to help the public better understand the developments pertaining to AI and mathematics, we suggest quantifying standard levels of autonomy and novelty of AI-assisted results, as well as propose a novel concept of human-AI interaction cards for transparency. We conclude with reflections on human-AI collaboration in mathematics and share all prompts as well as model outputs at https://github.com/google-deepmind/superhuman/tree/main/aletheia.

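Research Motivation and Goals

  • Bridge the gap between high-level competition problem solving and professional mathematics research.
  • Develop an end-to-end math research agent (Generator-Verifier-Reviser) on top of Gemini Deep Think.
  • Use inference-time scaling and intensive tool use to tackle PhD-level mathematics problems.
  • Demonstrate milestones in AI-assisted mathematics: an autonomously AI-generated paper, human-AI collaboration, and an evaluation on Erdős problems.
  • Propose metrics and a taxonomy for quantifying autonomy and novelty in AI-generated mathematics.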

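Proposed Method

  • Build Aletheia on top of Gemini Deep Think with three sub-agents: Generator, Verifier, and Reviser.
  • Operate end-to-end in natural language rather than a formal language.
  • Attempt to establish an inference-time scaling law for Olympiad-level and PhD-level problems.
  • Make heavy use of tools (Google Search, web browsing) to navigate the literature and citations.
  • Combine grading of outputs by human experts with the development of an autonomy/novelty taxonomy.
  • Record prompts and model outputs for transparency.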
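
The generate, verify, and revise cycle described above can be pictured with a minimal Python sketch. This is an illustration under stated assumptions, not the paper's implementation: the names `solve`, `Verdict`, and `max_rounds`, the callable signatures, and the choice to return `None` when the revision budget runs out are all hypothetical, and the toy callables merely stand in for model calls.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Verdict:
    accepted: bool
    feedback: str  # natural-language critique (hypothetical field)


def solve(
    problem: str,
    generate: Callable[[str], str],
    verify: Callable[[str, str], Verdict],
    revise: Callable[[str, str, str], str],
    max_rounds: int = 3,  # assumed cap on the number of revisions
) -> Optional[str]:
    """Return a verifier-accepted solution, or None if the budget runs out."""
    candidate = generate(problem)
    for _ in range(max_rounds):
        verdict = verify(problem, candidate)
        if verdict.accepted:
            return candidate
        # Feed the critique back so the next draft can address it.
        candidate = revise(problem, candidate, verdict.feedback)
    # Check the last revision once more before giving up.
    return candidate if verify(problem, candidate).accepted else None


# Toy stand-ins for the three sub-agents (assumptions for demonstration only).
def toy_generate(problem: str) -> str:
    return "draft 0"


def toy_verify(problem: str, candidate: str) -> Verdict:
    ok = candidate == "draft 2"
    return Verdict(ok, "" if ok else "gap in step 1")


def toy_revise(problem: str, candidate: str, feedback: str) -> str:
    n = int(candidate.split()[-1])
    return f"draft {n + 1}"


accepted = solve("toy problem", toy_generate, toy_verify, toy_revise)
gave_up = solve("toy problem", toy_generate, toy_verify, toy_revise, max_rounds=1)
```

In this toy run, `accepted` is the third draft, while `gave_up` is `None` because a single revision is not enough. Keeping the verifier separate from the generator is the design point the sketch tries to show: a draft is only returned after an independent check accepts it.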

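Experimental Results

Research Questions

  • RQ1: Can AI autonomously discover and prove new mathematical theorems at research scale?
  • RQ2: To what extent can inference-time scaling combined with tool use extend Olympiad-level reasoning to PhD-level mathematics?
  • RQ3: How reliable, novel, and transparent are AI-generated mathematical results?
  • RQ4: How can levels of autonomy and human-AI interaction in mathematics research be quantified?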

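Key Findings

  • Aletheia achieves an overall score of 95.1% on IMO-ProofBench Advanced, with a conditional accuracy of 98.3% on the problems for which it returned a solution.
  • On FutureMath Basic (PhD level), Aletheia outperforms the baselines under comparable compute, but makes more errors and hallucinations on longer reasoning tasks.
  • The AI generated a paper on eigenweights (Feng26) entirely on its own, and contributed to an AI-guided collaboration on bounds for independent sets (LeeSeo26).
  • An extensive study of Erdős problems shows that the AI can produce autonomous, partial, or literature-identification results, with 13 meaningfully correct solutions among the 200 candidates evaluated.
  • Tool use (web search) reduced citation hallucinations, while a standard Python tool provided limited additional benefit.
  • A taxonomy is proposed for characterizing AI contribution and autonomy levels, helping to put AI-assisted mathematics in context.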
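
To make the difference between an overall score and a conditional accuracy concrete, here is a small Python sketch. It assumes a simple correct/incorrect/abstained outcome per problem, which may differ from the benchmark's actual point-based grading; the function name `overall_and_conditional` and the toy data are hypothetical and are not figures from the paper.

```python
from typing import Optional, Sequence, Tuple


def overall_and_conditional(
    outcomes: Sequence[Optional[bool]],
) -> Tuple[float, float]:
    """Compute (overall, conditional) accuracy.

    Each outcome is True (correct), False (incorrect), or None
    (the agent returned no solution). This encoding is an assumption.
    """
    total = len(outcomes)
    attempted = [o for o in outcomes if o is not None]
    correct = sum(1 for o in attempted if o)
    # Overall: correct answers over all problems, abstentions count against it.
    overall = correct / total if total else 0.0
    # Conditional: correct answers over only the problems with a returned solution.
    conditional = correct / len(attempted) if attempted else 0.0
    return overall, conditional


# Toy data: 8 correct, 1 incorrect, 1 abstention (not the paper's numbers).
toy_outcomes = [True] * 8 + [False] + [None]
overall, conditional = overall_and_conditional(toy_outcomes)
```

On the toy data the overall value is 8/10 and the conditional value is 8/9. Under this reading, the conditional figure is higher whenever the agent declines to answer some problems it would otherwise get wrong.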


This digest was generated by AI and reviewed by a human editor.