QUICK REVIEW

[Paper Review] The AI Scientist-v2: Workshop-Level Automated Scientific Discovery via Agentic Tree Search

Yutaro Yamada, Robert Tjarko Lange | ArXiv.org | Apr 10, 2025

Scientific Computing and Data Management | Cited by 14

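One-Sentence Summary

The AI Scientist-v2 autonomously generates research ideas, designs and runs experiments within an agentic tree-search framework, writes the manuscript, and produced an AI-generated paper that was accepted through peer review at a workshop. It removes the dependence on code templates and uses VLM feedback to refine figures and content.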

ABSTRACT

AI is increasingly playing a pivotal role in transforming how scientific discoveries are made. We introduce The AI Scientist-v2, an end-to-end agentic system capable of producing the first entirely AI generated peer-review-accepted workshop paper. This system iteratively formulates scientific hypotheses, designs and executes experiments, analyzes and visualizes data, and autonomously authors scientific manuscripts. Compared to its predecessor (v1, Lu et al., 2024 arXiv:2408.06292), The AI Scientist-v2 eliminates the reliance on human-authored code templates, generalizes effectively across diverse machine learning domains, and leverages a novel progressive agentic tree-search methodology managed by a dedicated experiment manager agent. Additionally, we enhance the AI reviewer component by integrating a Vision-Language Model (VLM) feedback loop for iterative refinement of content and aesthetics of the figures. We evaluated The AI Scientist-v2 by submitting three fully autonomous manuscripts to a peer-reviewed ICLR workshop. Notably, one manuscript achieved high enough scores to exceed the average human acceptance threshold, marking the first instance of a fully AI-generated paper successfully navigating a peer review. This accomplishment highlights the growing capability of AI in conducting all aspects of scientific research. We anticipate that further advancements in autonomous scientific discovery technologies will profoundly impact human knowledge generation, enabling unprecedented scalability in research productivity and significantly accelerating scientific breakthroughs, greatly benefiting society at large. We have open-sourced the code at https://github.com/SakanaAI/AI-Scientist-v2 to foster the future development of this transformative technology. We also discuss the role of AI in science, including AI safety.

Research Motivation and Objectives

Demonstrate fully autonomous, end-to-end AI-driven scientific discovery from hypothesis to manuscript.
Eliminate reliance on human-authored code templates to enable domain-general deployment.
Introduce an experiment progress manager and agentic tree-search to deepen exploration of hypotheses.
Incorporate Vision-Language Models for feedback on experiments and manuscript figures/text.
Evaluate the system by submitting AI-generated manuscripts to an ICLR workshop and analyze limitations.

Proposed Method

Proposes a domain-general, tree-based exploration that generates and refines Python experiment code without human templates.
Implements an Experiment Progress Manager coordinating four stages: Preliminary Investigation, Hyperparameter Tuning, Research Agenda Execution, and Ablation Studies.
Uses parallelized agentic tree search to generate, execute, and critique multiple nodes, with buggy/non-buggy classifications guiding refinement.
Integrates Vision-Language Models to critique generated figures and captions during experiments and in the manuscript review stage.
Leverages Hugging Face datasets and literature tools (e.g., Semantic Scholar) for dataset loading and literature grounding.
Generates the manuscript in a single pass, followed by a reflection stage powered by reasoning models and VLM-aided refinement of figures and text.
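
The staged tree search described above can be illustrated with a minimal sketch. Only the four stage names and the buggy/non-buggy distinction come from this summary; the `Node` class, `toy_execute`, `expand`, `run_stage`, `run_all_stages`, the 30% failure rate, and the `debug_prob` parameter are hypothetical stand-ins, not the authors' implementation. The sketch is also sequential, whereas the paper describes a parallelized search, and a random number replaces real code execution and LLM critique.

```python
import random
from dataclasses import dataclass, field
from typing import List, Optional

# Stage names are taken from the summary; everything else here is hypothetical.
STAGES = [
    "Preliminary Investigation",
    "Hyperparameter Tuning",
    "Research Agenda Execution",
    "Ablation Studies",
]


@dataclass
class Node:
    plan: str                       # stand-in for generated experiment code
    parent: Optional["Node"] = None
    is_buggy: bool = False          # set when the node is "executed"
    score: float = float("-inf")    # higher is better; only meaningful if not buggy
    children: List["Node"] = field(default_factory=list)


def toy_execute(node: Node, rng: random.Random) -> None:
    """Placeholder for running an experiment: mark the node buggy or score it."""
    node.is_buggy = rng.random() < 0.3  # assumed failure rate, for illustration only
    node.score = float("-inf") if node.is_buggy else rng.random()


def expand(parent: Node, rng: random.Random) -> Node:
    """Create a child: a debug attempt for a buggy parent, a refinement otherwise."""
    action = "debug" if parent.is_buggy else "refine"
    child = Node(plan=f"{action}({parent.plan})", parent=parent)
    parent.children.append(child)
    toy_execute(child, rng)
    return child


def run_stage(root: Node, steps: int, rng: random.Random,
              debug_prob: float = 0.5) -> Node:
    """Grow the tree for `steps` expansions and return the best non-buggy node."""
    nodes = [root]
    for _ in range(steps):
        buggy = [n for n in nodes if n.is_buggy]
        good = [n for n in nodes if not n.is_buggy]
        if buggy and (not good or rng.random() < debug_prob):
            parent = rng.choice(buggy)                  # try to repair a failure
        else:
            parent = max(good, key=lambda n: n.score)   # refine the best result
        nodes.append(expand(parent, rng))
    good = [n for n in nodes if not n.is_buggy]
    # Fall back to the stage root if every node in this stage failed.
    return max(good, key=lambda n: n.score) if good else root


def run_all_stages(seed: int = 0, steps_per_stage: int = 5) -> Node:
    """Run the four stages in order, seeding each stage with the previous best node."""
    rng = random.Random(seed)
    best = Node(plan="initial_idea")
    toy_execute(best, rng)
    for _stage in STAGES:
        best = run_stage(best, steps_per_stage, rng)
    return best
```

Calling `run_all_stages(seed=0)` returns the node carried out of the final stage. Seeding each stage with the previous stage's best node is one simple way to model a manager that hands results from stage to stage; the actual selection criteria used by the system may differ.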

Experimental Results

Research Questions

RQ1: Can a fully autonomous AI system generate research hypotheses and execute experiments across machine learning domains without human-coded templates?
RQ2: Does agentic tree search enable deeper exploration of complex hypotheses compared to linear, template-based workflows?
RQ3: To what extent can an AI-generated manuscript pass peer review in a workshop setting, and what are the limitations?
RQ4: How does Vision-Language Model feedback improve the quality and clarity of figures and manuscript content?

Key Findings

Three autonomous manuscripts were submitted to an ICLR workshop; one achieved an average reviewer score of 6.33, placing it in the top ~45% of submissions.
The AI-generated workshop paper on compositional regularization received reviewer scores of 6, 7, and 6, and would have been accepted after meta-review.
The study demonstrates that fully AI-generated manuscripts can reach workshop-level acceptance, marking a milestone in autonomous scientific discovery.
Internal evaluation noted limitations such as occasional citation hallucinations and lack of main-conference rigor.
The authors open-sourced the code and dataset for community exploration and safety discussion.
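
As a quick consistency check, the average of 6.33 reported above follows directly from the three individual reviewer scores. The variable names below are illustrative only.

```python
from statistics import mean

scores = [6, 7, 6]        # individual reviewer scores reported in the findings
average = mean(scores)    # 19 / 3
print(f"{average:.2f}")   # prints 6.33
```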

This review was generated by AI and checked by a human editor.