QUICK REVIEW

[Paper Review] Towards Autonomous Mathematics Research

Tony Feng, Trieu Trinh | arXiv (Cornell University) | 2026-02-10

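One-Line Summary

This paper introduces Aletheia, a math research agent that iteratively generates, verifies, and revises proofs in natural language, and presents autonomous AI mathematics results together with a taxonomy for autonomy and transparency.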

ABSTRACT

Recent advances in foundational models have yielded reasoning systems capable of achieving a gold-medal standard at the International Mathematical Olympiad. The transition from competition-level problem-solving to professional research, however, requires navigating vast literature and constructing long-horizon proofs. In this work, we introduce Aletheia, a math research agent that iteratively generates, verifies, and revises solutions end-to-end in natural language. Specifically, Aletheia is powered by an advanced version of Gemini Deep Think for challenging reasoning problems, a novel inference-time scaling law that extends beyond Olympiad-level problems, and intensive tool use to navigate the complexities of mathematical research. We demonstrate the capability of Aletheia from Olympiad problems to PhD-level exercises and most notably, through several distinct milestones in AI-assisted mathematics research: (a) a research paper (Feng26) generated by AI without any human intervention in calculating certain structure constants in arithmetic geometry called eigenweights; (b) a research paper (LeeSeo26) demonstrating human-AI collaboration in proving bounds on systems of interacting particles called independent sets; and (c) an extensive semi-autonomous evaluation (Feng et al., 2026a) of 700 open problems on Bloom's Erdos Conjectures database, including autonomous solutions to four open questions. In order to help the public better understand the developments pertaining to AI and mathematics, we suggest quantifying standard levels of autonomy and novelty of AI-assisted results, as well as propose a novel concept of human-AI interaction cards for transparency. We conclude with reflections on human-AI collaboration in mathematics and share all prompts as well as model outputs at https://github.com/google-deepmind/superhuman/tree/main/aletheia.

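Research Motivation and Goals

Bridge the gap between competition-level problem solving and professional mathematics research.
Develop an end-to-end math research agent (Generator-Verifier-Reviser) on top of Gemini Deep Think.
Tackle PhD-level mathematics problems using inference-time scaling and extensive tool use.
Demonstrate AI-assisted milestones: an autonomously generated AI paper, human-AI collaboration, and an evaluation on Erdős problems.
Propose metrics and a taxonomy for quantifying autonomy and novelty in AI-generated mathematics.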

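Proposed Method

Build Aletheia on top of Gemini Deep Think from three subagents: Generator, Verifier, and Reviser.
Operate end-to-end in natural language rather than in a formal language.
Study inference-time scaling laws on Olympiad-level and PhD-level problems.
Use extensive tool use (Google Search, web browsing) to navigate the literature and citations.
Introduce expert human grading of outputs and develop an autonomy/novelty taxonomy.
Document prompts and model outputs for transparency.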
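
The Generator-Verifier-Reviser loop described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the names `solve`, `Verdict`, and `max_rounds`, the choice to return `None` when no draft is accepted, and the toy string-based subagents are all hypothetical stand-ins for calls to the underlying model.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Verdict:
    """Hypothetical verifier output: accept/reject plus natural-language feedback."""
    accepted: bool
    feedback: str = ""


def solve(
    problem: str,
    generate: Callable[[str], str],
    verify: Callable[[str, str], Verdict],
    revise: Callable[[str, str, str], str],
    max_rounds: int = 3,
) -> Optional[str]:
    """Generate a draft, then alternate verification and revision.

    Returns the first draft the verifier accepts, or None if none is
    accepted after max_rounds revisions (an assumed abstention policy).
    """
    draft = generate(problem)
    for round_index in range(max_rounds + 1):
        verdict = verify(problem, draft)
        if verdict.accepted:
            return draft
        if round_index < max_rounds:
            draft = revise(problem, draft, verdict.feedback)
    return None


# Toy stand-ins for the three subagents (assumptions, for illustration only).
def toy_generate(problem: str) -> str:
    return "Let x = 2a and y = 2b. Hence x + y is even."


def toy_verify(problem: str, draft: str) -> Verdict:
    # Accept only drafts that exhibit the factor of 2 explicitly.
    if "2(a + b)" in draft:
        return Verdict(accepted=True)
    return Verdict(accepted=False, feedback="Show the factorization explicitly.")


def toy_revise(problem: str, draft: str, feedback: str) -> str:
    return draft.replace("Hence", "Then x + y = 2(a + b). Hence")


if __name__ == "__main__":
    problem = "Show that the sum of two even integers is even."
    print(solve(problem, toy_generate, toy_verify, toy_revise))
```

Passing the three subagents in as plain callables keeps the control flow separate from whatever model sits behind each role, so the same loop can be exercised with stubs like the ones above.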

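Experimental Results

Research Questions

RQ1: Can AI autonomously discover and prove new mathematical theorems at research scale?
RQ2: How far can inference-time scaling and tool use extend Olympiad-level reasoning to PhD-level mathematics?
RQ3: How reliable, novel, and transparent are AI-generated mathematical results?
RQ4: How can levels of autonomy and human-AI interaction in mathematics research be quantified?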

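Key Results

Aletheia achieves 95.1% overall on IMO-ProofBench Advanced, with a conditional accuracy of 98.3% on solved problems.
On FutureMath Basic (PhD level), Aletheia outperforms baselines at comparable compute, but makes more mistakes and hallucinations on longer reasoning tasks.
AI produced a fully AI-generated paper on eigenweights (Feng26) and enabled human-AI collaboration on bounds for independent sets (LeeSeo26).
The large-scale study of Erdős problems shows that AI can produce results that are autonomous, partial, or already identified in the literature; of the 200 candidates evaluated, 13 gave meaningfully correct solutions.
Tool use (web search) reduces citation hallucinations, while standard Python tools provide only limited additional benefit.
A taxonomy of AI contribution and autonomy levels is proposed to contextualize AI-assisted mathematics.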

This review was generated by AI and reviewed by a human editor.