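[Paper Explained] AIDE: AI-Driven Exploration in the Space of Code
AIDE is an agent powered by large language models that treats machine learning engineering as optimization over the space of code, carried out as a tree search over candidate code solutions. It achieves solid results on Kaggle and related benchmarks: on tabular Kaggle tasks it outperforms several baselines and in many cases reaches the human median.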
Machine learning, the foundation of modern artificial intelligence, has driven innovations that have fundamentally transformed the world. Yet behind these advancements lies a complex and often tedious process requiring labor- and compute-intensive iteration and experimentation. Engineers and scientists developing machine learning models spend much of their time on trial-and-error tasks instead of conceptualizing innovative solutions or research hypotheses. To address this challenge, we introduce AI-Driven Exploration (AIDE), a machine learning engineering agent powered by large language models (LLMs). AIDE frames machine learning engineering as a code optimization problem and formulates trial-and-error as a tree search in the space of potential solutions. By strategically reusing and refining promising solutions, AIDE effectively trades computational resources for enhanced performance, achieving state-of-the-art results on multiple machine learning engineering benchmarks, including our Kaggle evaluations, OpenAI's MLE-Bench, and METR's RE-Bench.
Research Motivation and Goals
- Motivate automating machine learning engineering to reduce laborious trial-and-error.
- Frame ML engineering as code-space optimization to leverage LLMs for targeted improvements.
- Develop a tree-based exploration strategy that reuses and refines promising solutions.
- Provide a concrete instantiation for ML tasks and evaluate against Kaggle-based benchmarks.
Proposed Method
- Model the search as optimizing over a space of code scripts with a stateless objective h(s).
- Maintain a solution tree T where edges denote improvements and nodes are scripts.
- Use a hard-coded search policy π to decide which node to refine next.
- Apply a three-way coding operator f that can draft, debug, or improve code with LLMs.
- Utilize a summarization operator Σ to keep prompts concise by condensing past context.
- In ML, incorporate data previews and prompts tailored to dataset characteristics.
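The search loop described in the list above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the names `Node`, `search_policy`, `aide_search`, `propose`, and `evaluate`, the parameters `n_drafts` and `debug_prob`, and the specific policy rules are all assumptions made for this sketch. The LLM-backed coding operator f is replaced by a toy stub, the summarization operator Σ is omitted, and a higher h(s) is assumed to be better.

```python
import random
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass
class Node:
    """One script in the solution tree T (hypothetical structure)."""
    script: str
    score: Optional[float] = None          # h(s); None means the script failed
    parent: Optional["Node"] = None
    children: List["Node"] = field(default_factory=list)

    @property
    def is_buggy(self) -> bool:
        return self.score is None


def search_policy(nodes: List[Node], n_drafts: int, debug_prob: float,
                  rng: random.Random) -> Tuple[str, Optional[Node]]:
    """Hard-coded policy pi (assumed rules): pick an action and a node."""
    drafts = [n for n in nodes if n.parent is None]
    if len(drafts) < n_drafts:
        return "draft", None               # start a fresh solution
    buggy_leaves = [n for n in nodes if n.is_buggy and not n.children]
    if buggy_leaves and rng.random() < debug_prob:
        return "debug", rng.choice(buggy_leaves)
    valid = [n for n in nodes if not n.is_buggy]
    if not valid:
        return "draft", None               # nothing runnable to build on
    return "improve", max(valid, key=lambda n: n.score)


def aide_search(propose: Callable[[str, Optional[str]], str],
                evaluate: Callable[[str], Optional[float]],
                steps: int, n_drafts: int = 3, debug_prob: float = 0.5,
                seed: int = 0) -> Optional[Node]:
    """Grow the tree for `steps` iterations and return the best valid node."""
    rng = random.Random(seed)
    nodes: List[Node] = []
    for _ in range(steps):
        action, parent = search_policy(nodes, n_drafts, debug_prob, rng)
        # Coding operator f: draft, debug, or improve (an LLM call in practice).
        script = propose(action, parent.script if parent else None)
        child = Node(script=script, score=evaluate(script), parent=parent)
        if parent is not None:
            parent.children.append(child)
        nodes.append(child)
    valid = [n for n in nodes if not n.is_buggy]
    return max(valid, key=lambda n: n.score) if valid else None


# Toy stand-ins: a "script" is just an integer string, and its score is its value.
def toy_propose(action: str, parent_script: Optional[str]) -> str:
    if action in ("draft", "debug"):
        return "1"
    return str(int(parent_script) + 1)     # "improve" adds one to the parent


def toy_evaluate(script: str) -> Optional[float]:
    try:
        return float(int(script))
    except ValueError:
        return None                        # treat unparsable scripts as buggy


if __name__ == "__main__":
    best = aide_search(toy_propose, toy_evaluate, steps=6)
    print(best.script, best.score)
```

Because the objective is stateless, each node's score depends only on its script, which is what lets the policy compare nodes from different branches and return to an older branch at any step.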
Experimental Results
Research Questions
- RQ1: Can AIDE reliably search the space of code to improve ML model performance within practical compute budgets?
- RQ2: Does a tree-structured, incremental improvement approach outperform monolithic, parallel prompting strategies for ML engineering tasks?
- RQ3: How does AIDE perform on real-world Kaggle-style tasks compared to AutoML baselines and human experts?
- RQ4: To what extent can LLM-driven code space search generalize to other AI R&D tasks beyond tabular ML?
Key Findings
| Agent | Model | Exceeds % of humans ↑ | Above Median (%) ↑ |
|---|---|---|---|
| AIDE | GPT-4 Turbo | 51.38 | 50.00 |
| AutoML (H2O) | N/A | 35.34 | 18.75 |
| AutoGPT (Langchain) | GPT-4 Turbo | 32.34 | 0.00 |
| Human with ChatGPT | GPT-4 Turbo | 41.17 | 18.75 |
- On 16 tabular Kaggle tasks (Weco-Kaggle Lite), AIDE with GPT-4 Turbo achieves Exceeds % of humans = 51.38% and Above Median = 50.00%.
- Across the full Weco-Kaggle benchmark, AIDE averages Exceeds % of humans = 48.23% and Above Median = 49.21%.
- On the Lite benchmark, AIDE's Exceeds % of humans score (51.38%) is higher than that of H2O AutoML (35.34%), LangChain AutoGPT (32.34%), and the Human with ChatGPT baseline (41.17%).
- Independent evaluations on MLE-Bench show AIDE earning more medals and producing more valid submissions through iterative refinement than several baseline agents.
- Results on METR's RE-Bench indicate that AIDE can surpass human experts within short time windows and on some kernel optimization tasks.
- The results demonstrate the effectiveness of a solution-tree, code-space search approach for ML engineering tasks and related AI R&D challenges.
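For readers who want to see how the two table metrics could be computed, here is a minimal sketch. The definitions are assumptions, since this summary does not spell them out: "Exceeds % of humans" is taken as the percentage of human leaderboard entries the agent beats on one task, and "Above Median" as the percentage of tasks on which the agent beats the human median. The function names and the sample scores are hypothetical.

```python
from statistics import median
from typing import List, Sequence, Tuple


def exceeds_pct_of_humans(agent_score: float, human_scores: Sequence[float],
                          higher_is_better: bool = True) -> float:
    """Percentage of human entries the agent strictly beats on one task."""
    if higher_is_better:
        beaten = sum(1 for h in human_scores if agent_score > h)
    else:
        beaten = sum(1 for h in human_scores if agent_score < h)
    return 100.0 * beaten / len(human_scores)


def above_median_pct(tasks: List[Tuple[float, Sequence[float], bool]]) -> float:
    """Percentage of tasks where the agent beats the human median score.

    Each task is (agent_score, human_scores, higher_is_better).
    """
    wins = 0
    for agent_score, human_scores, higher_is_better in tasks:
        m = median(human_scores)
        if higher_is_better:
            wins += agent_score > m
        else:
            wins += agent_score < m
    return 100.0 * wins / len(tasks)


# Hypothetical scores for two tasks: an accuracy-style metric (higher is
# better) and an error-style metric (lower is better).
sample_tasks = [
    (0.8, [0.5, 0.6, 0.7, 0.9], True),
    (0.3, [0.1, 0.2, 0.4], False),
]
```

Averaging `exceeds_pct_of_humans` over tasks would give a benchmark-level figure comparable in form to the first metric column of the table, under the assumed definition.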
This summary was generated by AI and reviewed by a human editor.