QUICK REVIEW

[Paper Review] M2F: Automated Formalization of Mathematical Literature at Scale

Zichen Wang, Wanli Ma|arXiv (Cornell University)|Feb 19, 2026

Mathematics, Computing, and Information Processing | Citations: 0

One-line summary

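M2F automates end-to-end formalization of long-form mathematical text into Lean through a two-stage pipeline built on VeriRefine, achieving high buildability at project scale and robust proof-repair performance.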

ABSTRACT

Automated formalization of mathematics enables mechanical verification but remains limited to isolated theorems and short snippets. Scaling to textbooks and research papers is largely unaddressed, as it requires managing cross-file dependencies, resolving imports, and ensuring that entire projects compile end-to-end. We present M2F (Math-to-Formal), the first agentic framework for end-to-end, project-scale autoformalization in Lean. The framework operates in two stages. The statement compilation stage splits the document into atomic blocks, orders them via inferred dependencies, and repairs declaration skeletons until the project compiles, allowing placeholders in proofs. The proof repair stage closes these holes under fixed signatures using goal-conditioned local edits. Throughout both stages, M2F keeps the verifier in the loop, committing edits only when toolchain feedback confirms improvement. In approximately three weeks, M2F converts long-form mathematical sources into a project-scale Lean library of 153,853 lines from 479 pages of textbooks on real analysis and convex analysis, fully formalized as Lean declarations with accompanying proofs. This represents textbook-scale formalization at a pace that would typically require months or years of expert effort. On FATE-H, we achieve 96% proof success (vs. 80% for a strong baseline). Together, these results demonstrate that practical, large-scale automated formalization of mathematical literature is within reach. The full generated Lean code from our runs is available at https://github.com/optsuite/ReasBook.git.
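The abstract says the statement compilation stage orders atomic blocks "via inferred dependencies." One plausible reading of that step is a topological sort, sketched below under the assumption that dependencies have already been inferred as a map from each block to the blocks it uses. The block names and the `order_blocks` helper are hypothetical; this is an illustration, not M2F's code.

```python
from graphlib import TopologicalSorter, CycleError

# Hypothetical atomic blocks, each mapped to the blocks it depends on.
deps = {
    "def:limit": set(),
    "thm:limit_unique": {"def:limit"},
    "def:continuous": {"def:limit"},
    "thm:ivt": {"def:continuous", "thm:limit_unique"},
}

def order_blocks(deps: dict[str, set[str]]) -> list[str]:
    """Return a compile order in which every block comes after its dependencies."""
    try:
        return list(TopologicalSorter(deps).static_order())
    except CycleError as err:
        # A cycle means the inferred dependencies are inconsistent and need repair.
        raise ValueError(f"dependency cycle: {err.args[1]}") from err

order = order_blocks(deps)
print(order)
```

Any order that `static_order()` produces is valid as long as each block follows its dependencies. A cycle is surfaced as an error instead of being silently broken, since it signals a mis-inferred dependency.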

Motivation and objectives

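Make textbook-scale or paper-scale mathematical content buildable as a Lean project under a fixed environment.
Enable end-to-end formalization with provenance links from each declaration back to its source span.
Develop a verifier-certified refinement loop (VeriRefine) in which progress is committed only when the toolchain confirms an improvement.
Demonstrate scalability to sources of several hundred pages and to large Lean libraries while preserving proof integrity.
Provide quantitative evaluation against baselines and external provers on real-world sources and benchmark datasets.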

Proposed method

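Two-stage pipeline: Stage 1 (statement compilation) converts the content into Lean skeletons, whose proofs may still be placeholders, and repairs them until the whole project compiles.
Stage 2 (proof repair) closes the remaining holes with goal-conditioned local edits while keeping statement signatures fixed.
VeriRefine: an accept/revert primitive that commits an edit only when feedback from the Lean toolchain shows an objective improvement.
A pinned Lean environment guarantees reproducible builds and diagnostics-driven acceptance.
Patch proposals may come from an LLM or from other agents, but acceptance is decided strictly by the toolchain.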
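The accept/revert rule of VeriRefine described above can be sketched as a greedy loop. This is a minimal illustration under stated assumptions, not the authors' implementation. `propose` stands in for an LLM or agent that suggests candidate patches, and `check` stands in for building the project in the pinned Lean environment and returning a badness score, such as errors plus remaining holes. Both callables, the scoring rule, and the toy string "project" are hypothetical.

```python
from typing import Callable

def verirefine(
    state: str,
    propose: Callable[[str], list[str]],
    check: Callable[[str], int],
    max_rounds: int = 10,
) -> tuple[str, int]:
    """Greedy accept/revert loop: keep a candidate only if the verifier score drops.

    `check` (hypothetical) returns a badness score; 0 means the project is done.
    """
    best = check(state)
    for _ in range(max_rounds):
        if best == 0:
            break
        improved = False
        for candidate in propose(state):
            score = check(candidate)
            if score < best:  # commit: the toolchain confirms a strict improvement
                state, best = candidate, score
                improved = True
                break
            # otherwise revert: the candidate is discarded and `state` is unchanged
        if not improved:
            break  # no edit was accepted in this round, so stop
    return state, best

# Toy stand-ins (hypothetical): the "project" is text and each "sorry" is one hole.
def toy_check(src: str) -> int:
    return src.count("sorry")

def toy_propose(src: str) -> list[str]:
    bad = src.replace("sorry", "sorry sorry", 1)  # a regression that must be rejected
    good = src.replace("sorry", "exact h", 1)     # closes one hole
    return [bad, good]

final, score = verirefine("by sorry\nby sorry", toy_propose, toy_check)
print(final, score)
```

The loop only ever moves to states with a strictly lower score, so it cannot regress. In the toy run, the regressing patch is rejected in each round and both holes end up closed. A real system would also need to pin down what "improvement" means; the review does not specify that metric.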

Figure 1 : The M2F pipeline for project-scale automated formalization.

Experimental results

Research questions

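RQ1: Can long-form mathematical sources be converted into coherent, end-to-end Lean projects under a fixed environment?
RQ2: How effective is a project-scale, verifier-driven refinement loop at closing all proof holes without regressions?
RQ3: On matched statements, how does proof success compare with that of baseline provers?
RQ4: How does Stage 2 perform against state-of-the-art proving systems on standardized benchmarks (e.g., FATE-H)?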

Key findings

Source	Blocks	PB	Files	Decls	LoC	Holes	Closed	PSR (%)
Real Analysis	416	Yes	49	1195	34327	339	339	100
Convex Analysis (Sec. 1–15)	560	Yes	164	2620	105682	499	499	100
Paper	67	Yes	28	301	13844	37	37	100
Total (long-form)	1043	Yes	241	4116	153853	875	875	100
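As a quick consistency check on the table, the per-source rows sum to the long-form totals. The sketch below also assumes that PSR is closed holes divided by total holes, expressed in percent. The review does not state this definition explicitly, and the 100% entries are read as meaning that every hole was closed.

```python
# Per-source rows from the table: (blocks, files, decls, loc, holes)
rows = {
    "Real Analysis": (416, 49, 1195, 34327, 339),
    "Convex Analysis (Sec. 1-15)": (560, 164, 2620, 105682, 499),
    "Paper": (67, 28, 301, 13844, 37),
}

# Column-wise sums should reproduce the "Total (long-form)" row.
totals = tuple(sum(col) for col in zip(*rows.values()))
print(totals)  # (1043, 241, 4116, 153853, 875)

def psr(closed: int, holes: int) -> float:
    """Proof success rate in percent, assuming PSR = closed / holes."""
    return 100.0 * closed / holes

# With every hole closed (closed == holes), each row's PSR is 100%.
print([psr(h, h) for *_, h in rows.values()])
```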

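From 479 pages of textbook material, M2F produces a buildable Lean library comprising 241 files, 4,116 declarations, and 153,853 lines of Lean code.
Stage 1 achieves 100% statement-compile coverage (SCC) on the Real Analysis, Convex Analysis, and Paper sources, with a low average number of repair rounds (ARR of roughly 0.08–0.42).
Stage 2 achieves a 100% proof success rate (PSR = 100%) on matched statements across the entire long-form corpus.
On the external FATE-H benchmark, Stage 2 reaches 96% PSR on matched statements, outperforming Seed-Prover 1.5 (80%).
With light supervision (a lemma map adding 31 declarations), Stage 2 reaches 97% PSR on FATE-H.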

Figure 3 : FATE-H per-problem code length (non-empty lines) and outcome category. Colors indicate outcome: green = solved automatically, yellow = solved with lemma-map supervision, red = unsolved; the single wrong-statement instance is shown with a distinct style (see § 6.4).

This review was generated by AI and checked by a human editor.