QUICK REVIEW

[Paper Review] The AI Scientist-v2: Workshop-Level Automated Scientific Discovery via Agentic Tree Search

Yutaro Yamada, Robert Tjarko Lange | arXiv | 2025-04-10

Scientific Computing and Data Management | Citations: 14

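One-Line Summary

The AI Scientist-v2 autonomously generates ideas, designs and runs experiments through an agentic tree-search framework, writes the manuscript, and achieves a peer-reviewed workshop acceptance with an AI-generated paper. It removes the dependence on code templates and uses VLM feedback to refine figures and content.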

ABSTRACT

AI is increasingly playing a pivotal role in transforming how scientific discoveries are made. We introduce The AI Scientist-v2, an end-to-end agentic system capable of producing the first entirely AI generated peer-review-accepted workshop paper. This system iteratively formulates scientific hypotheses, designs and executes experiments, analyzes and visualizes data, and autonomously authors scientific manuscripts. Compared to its predecessor (v1, Lu et al., 2024 arXiv:2408.06292), The AI Scientist-v2 eliminates the reliance on human-authored code templates, generalizes effectively across diverse machine learning domains, and leverages a novel progressive agentic tree-search methodology managed by a dedicated experiment manager agent. Additionally, we enhance the AI reviewer component by integrating a Vision-Language Model (VLM) feedback loop for iterative refinement of content and aesthetics of the figures. We evaluated The AI Scientist-v2 by submitting three fully autonomous manuscripts to a peer-reviewed ICLR workshop. Notably, one manuscript achieved high enough scores to exceed the average human acceptance threshold, marking the first instance of a fully AI-generated paper successfully navigating a peer review. This accomplishment highlights the growing capability of AI in conducting all aspects of scientific research. We anticipate that further advancements in autonomous scientific discovery technologies will profoundly impact human knowledge generation, enabling unprecedented scalability in research productivity and significantly accelerating scientific breakthroughs, greatly benefiting society at large. We have open-sourced the code at https://github.com/SakanaAI/AI-Scientist-v2 to foster the future development of this transformative technology. We also discuss the role of AI in science, including AI safety.

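Motivation and Objectives

Demonstrate fully autonomous, end-to-end, AI-driven scientific discovery, from hypothesis to manuscript.
Remove the dependence on human-authored code templates so that the system can be deployed across domains.
Introduce an experiment progress manager and agentic tree search to explore hypotheses in greater depth.
Introduce Vision-Language Models (VLMs) to give feedback on figures and text, both in experiments and in the manuscript.
Evaluate the system by submitting AI-generated manuscripts to an ICLR workshop, and analyze its limitations.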

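Proposed Method

Proposes a domain-general, tree-based search that generates and refines Python experiment code without human-written templates.
Implements an Experiment Progress Manager that coordinates four stages: preliminary investigation, hyperparameter tuning, research agenda execution, and ablation studies.
Uses parallelized agentic tree search to generate, execute, and critique many nodes; classifying each node as buggy or non-buggy guides the subsequent refinement.
Integrates Vision-Language Models to critique generated figures and captions, both during experimentation and at the manuscript review stage.
Uses Hugging Face datasets for data loading and literature tools (e.g., Semantic Scholar) for grounding in prior work.
Generates the manuscript in a single pass, followed by a reflection step driven by a reasoning model and VLM-assisted refinement of figures and text.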
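The stage-wise tree search described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `Node`, `expand`, `select`, `run_stage`, and `tree_search` are hypothetical, the greedy selection rule is an assumption, nodes are expanded sequentially rather than in parallel, and `toy_propose` / `toy_run` are toy stand-ins for the LLM call and for executing generated code.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple

# Stage names follow the four stages listed in the review.
STAGES = [
    "preliminary investigation",
    "hyperparameter tuning",
    "research agenda execution",
    "ablation studies",
]

Propose = Callable[[str, str], str]        # (parent_code, mode) -> new code
Run = Callable[[str], Tuple[bool, float]]  # code -> (is_buggy, score)


@dataclass
class Node:
    code: str
    stage: str
    buggy: bool
    score: float
    parent: Optional["Node"] = None
    children: List["Node"] = field(default_factory=list)


def expand(parent: Node, stage: str, propose: Propose, run: Run) -> Node:
    """Create one child: debug a buggy parent, refine a non-buggy one."""
    mode = "debug" if parent.buggy else "refine"
    code = propose(parent.code, mode)
    buggy, score = run(code)
    child = Node(code, stage, buggy, score, parent)
    parent.children.append(child)
    return child


def select(nodes: List[Node]) -> Node:
    """Assumed rule: best non-buggy node, else the newest buggy node."""
    good = [n for n in nodes if not n.buggy]
    return max(good, key=lambda n: n.score) if good else nodes[-1]


def run_stage(root: Node, stage: str, propose: Propose, run: Run,
              steps: int = 4, width: int = 2) -> Node:
    nodes = [root]
    for _ in range(steps):
        parent = select(nodes)
        for _ in range(width):  # sequential here; the paper parallelizes this
            nodes.append(expand(parent, stage, propose, run))
    return select(nodes)


def tree_search(seed_code: str, propose: Propose, run: Run) -> Node:
    buggy, score = run(seed_code)
    best = Node(seed_code, STAGES[0], buggy, score)
    for stage in STAGES:  # the manager advances one stage at a time
        best = run_stage(best, stage, propose, run)
    return best


def toy_propose(code: str, mode: str) -> str:
    # Toy stand-in for an LLM edit: "F" fixes one bug, "+" is one improvement.
    return code + ("F" if mode == "debug" else "+")


def toy_run(code: str) -> Tuple[bool, float]:
    # Toy stand-in for running a script: buggy while "!" outnumbers "F".
    buggy = code.count("!") > code.count("F")
    return buggy, (0.0 if buggy else float(code.count("+")))


best = tree_search("!", toy_propose, toy_run)
print(best.stage, best.buggy, best.score)
```

The point of the sketch is the branching on the buggy flag: a failing node is sent back for debugging, while a working node is refined further, and the best node of each stage seeds the next one. VLM feedback on figures is left out to keep the example short.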

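Experimental Results

Research Questions

RQ1: Can a fully autonomous AI system generate research hypotheses and run experiments across machine learning domains without human-authored templates?
RQ2: Does agentic tree search enable deeper exploration of complex hypotheses than linear, template-based workflows?
RQ3: To what extent can AI-generated manuscripts pass peer review in a workshop setting, and what are the limitations?
RQ4: How does Vision-Language Model feedback improve the quality and clarity of figures and manuscript content?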

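Key Results

Three autonomously generated manuscripts were submitted to an ICLR workshop; one of them received an average reviewer score of 6.33, placing it in roughly the top 45% of submissions.
The AI-generated workshop paper on compositional regularization received peer-review scores of 6, 7, and 6 and would have been accepted after meta-review.
The work demonstrates that a fully AI-generated manuscript can reach workshop-level acceptance, a notable milestone for autonomous scientific discovery.
An internal evaluation noted limitations such as occasional citation errors and rigor that falls short of main-conference standards.
The authors released the code and datasets as open source to support community exploration and safety discussions.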
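As a quick sanity check on the numbers above, the reported average of 6.33 is consistent with the three reviewer scores of 6, 7, and 6. The snippet below only reproduces that arithmetic; the variable names are illustrative.

```python
from statistics import mean

scores = [6, 7, 6]  # reviewer scores quoted in the review
average = round(mean(scores), 2)
print(average)
```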

This review was generated by AI and reviewed by a human editor.