QUICK REVIEW

[Paper Review] Scalable Global Optimization via Local Bayesian Optimization

David Eriksson, Michael Pearce|arXiv (Cornell University)|Oct 3, 2019

Advanced Bandit Algorithms Research | Cited by 144

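One-Sentence Summary

This paper proposes TuRBO, a Bayesian optimization framework built on local models: it maintains multiple trust regions, each with an independent local Gaussian process, and uses an implicit bandit strategy to allocate samples globally, improving performance on high-dimensional, expensive black-box functions.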

ABSTRACT

Bayesian optimization has recently emerged as a popular method for the sample-efficient optimization of expensive black-box functions. However, the application to high-dimensional problems with several thousand observations remains challenging, and on difficult problems Bayesian optimization is often not competitive with other paradigms. In this paper we take the view that this is due to the implicit homogeneity of the global probabilistic models and an overemphasized exploration that results from global acquisition. This motivates the design of a local probabilistic approach for global optimization of large-scale high-dimensional problems. We propose the $\texttt{TuRBO}$ algorithm that fits a collection of local models and performs a principled global allocation of samples across these models via an implicit bandit approach. A comprehensive evaluation demonstrates that $\texttt{TuRBO}$ outperforms state-of-the-art methods from machine learning and operations research on problems spanning reinforcement learning, robotics, and the natural sciences.

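Motivation and Objectives

Motivation: globally optimize high-dimensional, expensive black-box functions while addressing the limitations of global surrogate models.
Propose a scalable local Bayesian optimization framework that handles heterogeneity and high dimensionality while avoiding over-exploration.
Demonstrate the empirical advantage of TuRBO on robotics, reinforcement learning, cosmology, and synthetic benchmarks.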

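Proposed Method

Maintain multiple local Gaussian process surrogates, each operating within its own trust region (TR).
Dynamically adjust the TR size using success/failure counts to balance exploration and exploitation.
Use Thompson sampling to select batch candidates both within and across TRs, yielding an implicit multi-armed-bandit-style global allocation.
Treat each TR as an independent bandit arm, steering sample allocation toward promising regions.
Compare against a large set of baselines, including BO variants, CMA-ES, and random search, across diverse tasks.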
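The success/failure counting used to resize a trust region can be illustrated with a minimal sketch. It assumes the region grows after a run of consecutive improvements and shrinks after a run of consecutive non-improvements. The class name `TrustRegion`, its method names, and every numeric default below are hypothetical placeholders chosen for illustration, not values stated in this review.

```python
from dataclasses import dataclass


@dataclass
class TrustRegion:
    """Toy trust-region state; all defaults are illustrative assumptions."""
    length: float = 0.8          # current side length of the region
    length_min: float = 0.5 ** 7 # below this the region is considered exhausted
    length_max: float = 1.6      # cap on how far the region may grow
    success_tol: int = 3         # consecutive successes needed to expand
    failure_tol: int = 4         # consecutive failures needed to shrink
    successes: int = 0
    failures: int = 0

    def update(self, improved: bool) -> None:
        """Record one evaluation round and resize the region if a counter trips."""
        if improved:
            self.successes += 1
            self.failures = 0
        else:
            self.failures += 1
            self.successes = 0

        if self.successes >= self.success_tol:
            # Enough consecutive improvements: double the region, up to the cap.
            self.length = min(2.0 * self.length, self.length_max)
            self.successes = 0
        elif self.failures >= self.failure_tol:
            # Enough consecutive misses: halve the region to focus the search.
            self.length /= 2.0
            self.failures = 0

    @property
    def needs_restart(self) -> bool:
        """True once the region has shrunk below the minimum length."""
        return self.length < self.length_min
```

In a loop, `update(improved)` would be called once per batch, with `improved` set when the batch beat the region's best value so far; a region for which `needs_restart` is true would be discarded and re-initialized elsewhere.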

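Experimental Results

Research Questions

RQ1: Can a collection of local probabilistic models combined with implicit bandit allocation outperform global surrogate models on high-dimensional, expensive functions?
RQ2: Can dynamic trust-region resizing and parallel local search deliver scalable, robust global optimization in practice?
RQ3: How does TuRBO compare with state-of-the-art Bayesian optimization, evolutionary strategies, and stochastic optimization on real-world tasks?
RQ4: How does batch size affect TuRBO's wall-clock efficiency and solution quality?
RQ5: Do local models outperform a single global model in predictive accuracy and hyperparameter learnability?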

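Key Findings

TuRBO consistently finds good solutions on robotics, reinforcement learning, and natural-science problems, and generally outperforms the baselines.
As the batch size increases, a linear speedup is observed without sacrificing solution quality.
Local GPs give better predictive performance and more flexible hyperparameter settings than a single global GP.
Multiple small TRs capture multimodality and diverse optima, enabling effective global exploration through bandit-like allocation.
TuRBO with multiple regions (m>1) generally outperforms the single-region variant, especially on high-dimensional problems.
Large-batch experiments achieve near-linear wall-clock speedups while maintaining solution quality.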
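The bandit-like allocation across trust regions can be sketched as a toy Thompson-sampling step for a minimization problem. For each batch slot, one value is drawn per candidate in every region, and the candidate with the lowest draw overall wins, so regions whose models look more promising receive more samples without any explicit arm-selection rule. This is a simplified sketch: it assumes each region's posterior is summarized by independent per-candidate means and standard deviations, whereas a real implementation would draw joint samples from each local Gaussian process. The function name `thompson_select` and its arguments are hypothetical.

```python
import random


def thompson_select(means, stds, batch_size, rng):
    """Choose batch_size (region, candidate) pairs by Thompson sampling.

    means[r][c] and stds[r][c] describe an assumed independent Gaussian
    posterior for candidate c in trust region r (a simplification of a
    joint GP posterior). Lower sampled values are better (minimization).
    """
    picks = []
    for _ in range(batch_size):
        best = None  # (sampled value, region index, candidate index)
        for r, (mu_r, sd_r) in enumerate(zip(means, stds)):
            for c, (mu, sd) in enumerate(zip(mu_r, sd_r)):
                draw = rng.gauss(mu, sd)  # one posterior draw for this candidate
                if best is None or draw < best[0]:
                    best = (draw, r, c)
        # The winning draw decides which region receives this sample.
        picks.append((best[1], best[2]))
    return picks


rng = random.Random(0)
means = [[1.0, 0.8], [0.2, 0.3]]   # region 1 currently looks more promising
stds = [[0.1, 0.1], [0.1, 0.1]]
picks = thompson_select(means, stds, batch_size=4, rng=rng)
```

Counting how often each region index appears in `picks` shows the implicit allocation: a region with low posterior means, or with high uncertainty, wins more draws.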


This review was generated by AI and reviewed by human editors.