QUICK REVIEW

[Paper Review] AIDE: AI-Driven Exploration in the Space of Code

Zonglin Jiang, David A. Schmidt | ArXiv.org | Feb 18, 2025
AI-based Problem Solving and Planning | Cited by 3
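One-Sentence Summary

AIDE is an agent powered by large language models that treats machine learning engineering as optimization over the space of code, carried out as a tree search over candidate code solutions, and it achieves strong results on Kaggle and related benchmarks. On tabular Kaggle tasks it outperforms several baselines and reaches the human median level in many cases.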

ABSTRACT

Machine learning, the foundation of modern artificial intelligence, has driven innovations that have fundamentally transformed the world. Yet, behind advancements lies a complex and often tedious process requiring labor and compute intensive iteration and experimentation. Engineers and scientists developing machine learning models spend much of their time on trial-and-error tasks instead of conceptualizing innovative solutions or research hypotheses. To address this challenge, we introduce AI-Driven Exploration (AIDE), a machine learning engineering agent powered by large language models (LLMs). AIDE frames machine learning engineering as a code optimization problem, and formulates trial-and-error as a tree search in the space of potential solutions. By strategically reusing and refining promising solutions, AIDE effectively trades computational resources for enhanced performance, achieving state-of-the-art results on multiple machine learning engineering benchmarks, including our Kaggle evaluations, OpenAI MLE-Bench and METR's RE-Bench.

Research Motivation and Objectives

  • Motivate automating machine learning engineering to reduce laborious trial-and-error.
  • Frame ML engineering as code-space optimization to leverage LLMs for targeted improvements.
  • Develop a tree-based exploration strategy that reuses and refines promising solutions.
  • Provide a concrete instantiation for ML tasks and evaluate against Kaggle-based benchmarks.

Proposed Method

  • Model the search as optimizing over a space of code scripts with a stateless objective h(s).
  • Maintain a solution tree T where edges denote improvements and nodes are scripts.
  • Use a hard-coded search policy π to decide which node to refine next.
  • Apply a three-way coding operator f that can draft, debug, or improve code with LLMs.
  • Utilize a summarization operator Σ to keep prompts concise by condensing past context.
  • In the ML instantiation, incorporate data previews and prompts tailored to dataset characteristics.
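
The loop described in the bullets above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the `Node` class, the `select` policy rules, and the parameters `n_drafts` and `debug_prob` are hypothetical names chosen here, the LLM-backed coding operator f is replaced by a deterministic toy function, and the summarization operator Σ is omitted for brevity.

```python
import random
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass
class Node:
    """One script in the solution tree T (hypothetical structure)."""
    script: str
    score: Optional[float] = None          # None marks a buggy script
    parent: Optional["Node"] = None
    children: List["Node"] = field(default_factory=list)


def select(nodes: List[Node], n_drafts: int, debug_prob: float,
           rng: random.Random) -> Tuple[str, Optional[Node]]:
    """Hard-coded policy pi (assumed rules): draft, debug, or improve."""
    roots = [n for n in nodes if n.parent is None]
    if len(roots) < n_drafts:
        return "draft", None                # start a fresh solution
    buggy_leaves = [n for n in nodes if n.score is None and not n.children]
    if buggy_leaves and rng.random() < debug_prob:
        return "debug", rng.choice(buggy_leaves)
    valid = [n for n in nodes if n.score is not None]
    if not valid:
        return "draft", None
    return "improve", max(valid, key=lambda n: n.score)  # greedy on h(s)


def tree_search(coder: Callable[[str, Optional[str]], str],
                evaluate: Callable[[str], Optional[float]],
                steps: int, n_drafts: int = 3, debug_prob: float = 0.5,
                seed: int = 0) -> Optional[Node]:
    """Grow the tree for `steps` iterations and return the best valid node."""
    rng = random.Random(seed)
    nodes: List[Node] = []
    for _ in range(steps):
        action, parent = select(nodes, n_drafts, debug_prob, rng)
        script = coder(action, parent.script if parent else None)
        child = Node(script=script, score=evaluate(script), parent=parent)
        if parent is not None:
            parent.children.append(child)   # edge = attempted refinement
        nodes.append(child)
    valid = [n for n in nodes if n.score is not None]
    return max(valid, key=lambda n: n.score) if valid else None


# Toy stand-ins: a "script" is just a number written as text.
def toy_coder(action: str, parent_script: Optional[str]) -> str:
    if action == "draft":
        return "1"
    if action == "debug":
        return parent_script.replace("bug", "")
    return str(int(parent_script) + 1)      # "improve" raises the score


def toy_evaluate(script: str) -> Optional[float]:
    """Stateless objective h(s): depends only on the script text."""
    return float(script) if script.isdigit() else None


best = tree_search(toy_coder, toy_evaluate, steps=6)
```

In a real setting `toy_coder` would call an LLM with the parent script and a task description, and `toy_evaluate` would execute the script and read a validation metric; both are placeholders here.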

Experimental Results

Research Questions

  • RQ1: Can AIDE reliably search the space of code to improve ML model performance within practical compute budgets?
  • RQ2: Does a tree-structured, incremental improvement approach outperform monolithic, parallel prompting strategies for ML engineering tasks?
  • RQ3: How does AIDE perform on real-world Kaggle-style tasks compared to AutoML baselines and human experts?
  • RQ4: To what extent can LLM-driven code space search generalize to other AI R&D tasks beyond tabular ML?

Key Findings

Agent               | Model       | Exceeds % of humans ↑ | Above Median (%) ↑
AIDE                | GPT-4 Turbo | 51.38                 | 50.00
AutoML (H2O)        | N/A         | 35.34                 | 18.75
AutoGPT (Langchain) | GPT-4 Turbo | 32.34                 | 0.00
Human with ChatGPT  | GPT-4 Turbo | 41.17                 | 18.75

  • On 16 tabular Kaggle tasks (Weco-Kaggle Lite), AIDE with GPT-4 Turbo achieves Exceeds % of humans = 51.38% and Above Median = 50.00%.
  • Across full Weco-Kaggle, AIDE averages Exceeds % of Humans = 48.23% and Above Median = 49.21%.
  • AIDE generally outperforms H2O AutoML and LangChain AutoGPT on Exceeds % of humans in the Lite benchmark.
  • Independent evaluations (MLE-Bench) show AIDE achieving higher medal and valid-submission rates with iterative refinement, outperforming several baseline agents.
  • METR (RE-Bench) tasks indicate AIDE can surpass human experts in short time windows and in some kernel optimization tasks.
  • The results demonstrate the effectiveness of a solution-tree, code-space search approach for ML engineering tasks and related AI R&D challenges.
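
This summary does not define the two metrics it reports, so the following sketch shows only one plausible reading of their names and should be treated as an assumption: "Exceeds % of humans" as the share of human leaderboard entries that the agent beats on a task, averaged over tasks, and "Above Median" as the share of tasks where the agent beats the median human score. The function names and the leaderboard numbers are hypothetical, and higher scores are assumed to be better.

```python
from statistics import mean, median
from typing import Dict, List, Tuple


def exceeds_pct_of_humans(agent_score: float, human_scores: List[float]) -> float:
    """Percentage of human entries strictly beaten by the agent (assumed definition)."""
    beaten = sum(1 for h in human_scores if agent_score > h)
    return 100.0 * beaten / len(human_scores)


def above_median(agent_score: float, human_scores: List[float]) -> bool:
    """True if the agent beats the median human score (assumed definition)."""
    return agent_score > median(human_scores)


def summarize(results: Dict[str, Tuple[float, List[float]]]) -> Tuple[float, float]:
    """Average both metrics over tasks; returns (exceeds %, above-median %)."""
    exceeds = [exceeds_pct_of_humans(a, hs) for a, hs in results.values()]
    medians = [above_median(a, hs) for a, hs in results.values()]
    return mean(exceeds), 100.0 * sum(medians) / len(medians)


# Made-up leaderboards for illustration only; not data from the paper.
toy_results = {
    "task_a": (0.90, [0.50, 0.60, 0.70, 0.95]),
    "task_b": (0.40, [0.50, 0.60, 0.30, 0.80]),
}
avg_exceeds, pct_above_median = summarize(toy_results)
```

For metrics where lower is better (for example, RMSE), the comparisons would need to be flipped.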


This review was generated by AI and reviewed by human editors.