QUICK REVIEW
[Paper Review] Learning to Learn without Gradient Descent by Gradient Descent
Yutian Chen, Matthew W. Hoffman | arXiv (Cornell University) | Nov 11, 2016
Higher Education Learning Practices | Cited by 162
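One-Sentence Summary
This paper trains recurrent neural network optimizers on synthetic functions to achieve fast, transferable black-box optimization. Across a range of settings, including hyper-parameter tuning and control tasks, the learned optimizers rival Bayesian optimization methods and in some cases outperform them.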
ABSTRACT
We learn recurrent neural network optimizers trained on simple synthetic functions by gradient descent. We show that these learned optimizers exhibit a remarkable degree of transfer in that they can be used to efficiently optimize a broad range of derivative-free black-box functions, including Gaussian process bandits, simple control objectives, global optimization benchmarks and hyper-parameter tuning tasks. Up to the training horizon, the learned optimizers learn to trade-off exploration and exploitation, and compare favourably with heavily engineered Bayesian optimization packages for hyper-parameter tuning.
Research Motivation and Goals
- Motivate fast, general-purpose black-box optimization beyond Bayesian methods.
- Develop meta-learned optimizers that learn exploration-exploitation trade-offs.
- Demonstrate transfer of learned optimizers to derivative-free problems across domains.
- Show computational gains over standard Bayesian optimization (BO) packages within the training horizon.
Proposed Method
- Model a black-box optimizer as an RNN with shared parameters that updates its hidden state and proposes the next query point.
- Train the RNN by backpropagating through time using a loss that sums objective values over a finite horizon (L_sum).
- Experiment with losses that encourage exploration, such as expected improvement (EI) and observed improvement (OI).
- Sample the training functions from Gaussian process (GP) priors; because these functions are differentiable, they provide gradient signals for training the optimizer.
- Extend the framework to parallel evaluations by augmenting inputs with a feedback flag and simulating out-of-order completions.
- Compare learned optimizers to Spearmint, TPE, and SMAC, and evaluate on transfer tasks including GP bandits, control, and hyper-parameter tuning.
- Use LSTM and differentiable neural computer (DNC) architectures for the optimizer and assess their speed at test time.
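The query loop and two of the losses listed above can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the authors' implementation: the recurrent cell `rnn_step`, the weights in `toy_params`, and the quadratic `toy_objective` (standing in for a function sampled from a GP prior) are all hypothetical, and the observed-improvement formula is one plausible reading of that loss. Training would differentiate these losses with respect to the weights by backpropagation through time, which needs an autodiff framework and is omitted here.

```python
import math
from typing import Callable, Dict, List, Tuple


def rnn_step(h: List[float], x_prev: float, y_prev: float,
             params: Dict) -> Tuple[List[float], float]:
    """One step of a toy recurrent optimizer (hypothetical cell, 1-D queries).

    The hidden state is updated from the previous query and its observed
    value; the next query point is a linear readout of the new state.
    """
    n = len(h)
    new_h = []
    for i in range(n):
        pre = params["b"][i] + params["wx"][i] * x_prev + params["wy"][i] * y_prev
        pre += sum(params["wh"][i][j] * h[j] for j in range(n))
        new_h.append(math.tanh(pre))
    x_next = sum(params["wo"][i] * new_h[i] for i in range(n))
    return new_h, x_next


def rollout(f: Callable[[float], float], params: Dict, horizon: int) -> List[float]:
    """Unroll the optimizer for `horizon` queries and return the observed values."""
    h = [0.0] * len(params["b"])
    x, y = 0.0, 0.0  # dummy initial inputs (an assumption of this sketch)
    ys = []
    for _ in range(horizon):
        h, x = rnn_step(h, x, y, params)  # same parameters at every step
        y = f(x)                          # black-box evaluation
        ys.append(y)
    return ys


def loss_sum(ys: List[float]) -> float:
    """Summed loss: total of the objective values over the horizon."""
    return sum(ys)


def loss_oi(ys: List[float]) -> float:
    """Observed improvement (assumed form): sum of min(y_t - best_so_far, 0).

    The first observation has no predecessor and contributes 0 here.
    """
    if not ys:
        return 0.0
    total, best = 0.0, ys[0]
    for y in ys[1:]:
        total += min(y - best, 0.0)
        best = min(best, y)
    return total


# Hypothetical, untrained weights for a 2-unit hidden state.
toy_params = {
    "wh": [[0.1, -0.2], [0.3, 0.1]],
    "wx": [0.5, -0.4],
    "wy": [-0.3, 0.2],
    "b": [0.1, -0.1],
    "wo": [0.8, -0.6],
}


def toy_objective(x: float) -> float:
    """Stand-in for a GP sample: a quadratic with its minimum at x = 0.3."""
    return (x - 0.3) ** 2


observed = rollout(toy_objective, toy_params, horizon=5)
print("L_sum:", loss_sum(observed), "L_OI:", loss_oi(observed))
```

In this sketch, `loss_sum` penalizes a high value at every step, whereas `loss_oi` only credits steps that improve on the best value seen so far, so a query that fails to improve costs nothing. That difference is one way to read the claim that such losses encourage exploration.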
Experimental Results
Research Questions
- RQ1: Can a learned RNN-based optimizer, trained on simple synthetic functions, effectively optimize a wide range of black-box functions?
- RQ2: Do learned optimizers transfer to derivative-free optimization domains beyond their training distribution?
- RQ3: How do different meta-learning losses (sum, EI, OI) influence exploration-exploitation balance and performance?
- RQ4: What are the computational advantages of learned optimizers relative to established Bayesian optimization packages?
- RQ5: Can parallel evaluation be integrated into the learned optimization framework without performance loss?
Key Findings
Test-time runtime by optimizer (row labels and units are not given):
| Spearmint | TPE | SMAC | DNC | LSTM |
|---|---|---|---|---|
| 1239 | 16.3 | 16.3 | 0.1 | 0.02 |
| 1238 | 16.2 | 16.2 | 0.1 | 0.02 |
| 1524 | 19.3 | 19.3 | 0.1 | 0.02 |
| 2768 | 20.8 | 20.8 | 0.1 | 0.02 |
- Learned RNN optimizers transfer to GP bandits, control objectives, global optimization benchmarks, and ML hyper-parameter tuning.
- DNC-based optimizers trained with EI or OI losses outperform direct-observation DNCs and are competitive with, and often faster than, Spearmint, SMAC, and TPE within a 100-step horizon.
- Learned optimizers are orders of magnitude faster than traditional BO methods at test time, with runtime improvements of up to roughly 10^4× in the reported cases.
- At higher input dimensions, learned optimizers outperform the baseline BO methods within the training horizon.
- Parallel proposal schemes maintain performance while offering substantial speedups in hyper-parameter tuning scenarios.
- The approach achieves competitive results on standard benchmarks and simple control problems, often matching engineered optimizers.
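The speedup claim can be checked against the table above with a few lines of arithmetic. The sketch below assumes that every value in a row is a runtime in the same unit, so the ratios are unit-free; the `runtimes` dictionary and the `speedups` helper are names introduced here for illustration only.

```python
# Values copied column by column from the table above (units not stated).
runtimes = {
    "Spearmint": [1239, 1238, 1524, 2768],
    "TPE": [16.3, 16.2, 19.3, 20.8],
    "SMAC": [16.3, 16.2, 19.3, 20.8],
    "DNC": [0.1, 0.1, 0.1, 0.1],
    "LSTM": [0.02, 0.02, 0.02, 0.02],
}


def speedups(baseline: str, learned: str) -> list:
    """Row-wise ratio of baseline runtime to learned-optimizer runtime."""
    return [b / l for b, l in zip(runtimes[baseline], runtimes[learned])]


for baseline in ("Spearmint", "TPE", "SMAC"):
    for learned in ("DNC", "LSTM"):
        ratios = speedups(baseline, learned)
        print(f"{baseline} / {learned}: "
              f"min {min(ratios):.0f}x, max {max(ratios):.0f}x")
```

Under that assumption, the Spearmint-to-DNC ratios all exceed 10^4 and the Spearmint-to-LSTM ratios are larger still, while the ratios against TPE and SMAC are in the hundreds to roughly a thousand.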
This review was generated by AI and reviewed by human editors.