QUICK REVIEW

[Paper Review] Stagewise Safe Bayesian Optimization with Gaussian Processes

Yanan Sui, Vincent Zhuang|arXiv (Cornell University)|Jun 20, 2018

Gaussian Processes and Bayesian Inference|Cited by 67

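One-sentence summary

StageOpt separates safe region expansion from utility optimization in safe Bayesian optimization, provides theoretical guarantees, and outperforms existing approaches on synthetic tests and in spinal cord stimulation therapy.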

ABSTRACT

Enforcing safety is a key aspect of many problems pertaining to sequential decision making under uncertainty, which require the decisions made at every step to be both informative of the optimal decision and also safe. For example, we value both efficacy and comfort in medical therapy, and efficiency and safety in robotic control. We consider this problem of optimizing an unknown utility function with absolute feedback or preference feedback subject to unknown safety constraints. We develop an efficient safe Bayesian optimization algorithm, StageOpt, that separates safe region expansion and utility function maximization into two distinct stages. Compared to existing approaches which interleave between expansion and optimization, we show that StageOpt is more efficient and naturally applicable to a broader class of problems. We provide theoretical guarantees for both the satisfaction of safety constraints as well as convergence to the optimal utility value. We evaluate StageOpt on both a variety of synthetic experiments, as well as in clinical practice. We demonstrate that StageOpt is more effective than existing safe optimization approaches, and is able to safely and effectively optimize spinal cord stimulation therapy in our clinical experiments.

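Motivation and objectives

Motivate safe sequential optimization under uncertainty, where the decision made at every step must be safe.
Model the unknown utility and safety functions as Gaussian processes with bounded RKHS norm.
Propose StageOpt, which first expands the safe region and then maximizes utility subject to the safety constraints.
Provide theoretical guarantees of safety-constraint satisfaction and of convergence to the optimal utility value in finite time.
Demonstrate effectiveness through synthetic experiments and a clinical spinal cord stimulation application.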

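Proposed method

Model the utility and safety functions as Gaussian processes with bounded RKHS norm; the safety functions are additionally assumed to be Lipschitz continuous.
Define StageOpt in two stages: first expand the safe region using confidence bounds and reachability, then maximize utility within the expanded safe region using GP-UCB.
Use conservative confidence intervals C_t^i, obtained by intersecting each new interval with the prior bound and the intervals from earlier observations, to keep the expansion safe.
Compute the expansion from the safe-set update S_t and the set of expanders G_t, selecting the expander with the largest predictive uncertainty.
In the optimization stage, choose x_t by GP-UCB within the expanded safe region; this stage allows adaptation to adversarial feedback (see Appendix B).
State the theoretical results: Theorem 1 guarantees expansion to the epsilon-reachable safe region; Theorem 2 guarantees zeta-optimal utility within the safe region.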
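
The two stages above can be sketched on a toy problem. This is a minimal illustration, not the authors' implementation, and it rests on several assumptions that do not come from this summary: a 1-D grid domain, a single safety function, a fixed beta, a unit-variance RBF kernel, noiseless observations, and SafeOpt-style rules for S_t (a point is added when some safe point's lower bound minus L times the distance stays above the threshold h) and G_t (safe points whose upper bound could certify a currently unsafe point). It also assumes that utility values observed during expansion are reused in the GP-UCB stage. The names rbf, gp_posterior, and stageopt are hypothetical.

```python
import numpy as np

def rbf(a, b, ls=0.5):
    """Squared-exponential kernel between two 1-D arrays of points."""
    return np.exp(-0.5 * ((a[:, None] - b[None, :]) / ls) ** 2)

def gp_posterior(x_obs, y_obs, grid, jitter=1e-4):
    """Posterior mean and std of a zero-mean, unit-variance GP on the grid."""
    if len(x_obs) == 0:
        return np.zeros(len(grid)), np.ones(len(grid))
    x_obs = np.asarray(x_obs, dtype=float)
    y_obs = np.asarray(y_obs, dtype=float)
    K = rbf(x_obs, x_obs) + jitter * np.eye(len(x_obs))
    Ks = rbf(grid, x_obs)
    mean = Ks @ np.linalg.solve(K, y_obs)
    var = 1.0 - np.einsum("ij,ji->i", Ks, np.linalg.solve(K, Ks.T))
    return mean, np.sqrt(np.clip(var, 1e-12, None))

def stageopt(f, g, grid, seed_idx, h=0.0, lip=4.0, beta=2.0,
             t_expand=20, t_total=35):
    """Stage 1 expands the safe set; stage 2 runs GP-UCB inside it."""
    n = len(grid)
    dist = np.abs(grid[:, None] - grid[None, :])
    safe = np.zeros(n, dtype=bool)
    safe[seed_idx] = True
    lower = np.where(safe, h, -np.inf)  # assumed C_0 = [h, inf) on the seed set
    upper = np.full(n, np.inf)
    xs, fs, gs = [], [], []
    for t in range(t_total):
        # Intersect the running interval C_t with the current GP band for g.
        mu_g, sd_g = gp_posterior(xs, gs, grid)
        lower = np.maximum(lower, mu_g - beta * sd_g)
        upper = np.minimum(upper, mu_g + beta * sd_g)
        # S_t: add points certified by a safe point's lower bound and Lipschitz.
        certified = lower[safe][:, None] - lip * dist[safe] >= h
        safe = safe | certified.any(axis=0)
        if t < t_expand:
            # G_t: safe points that could optimistically certify an unsafe point.
            could = (upper[:, None] - lip * dist >= h) & ~safe[None, :]
            expanders = safe & (could.sum(axis=1) > 0)
            cand = expanders if expanders.any() else safe
            score = upper - lower  # interval width as the uncertainty measure
        else:
            # GP-UCB on the utility, restricted to the expanded safe set.
            mu_f, sd_f = gp_posterior(xs, fs, grid)
            cand, score = safe, mu_f + beta * sd_f
        idx = int(np.argmax(np.where(cand, score, -np.inf)))
        xs.append(grid[idx])
        fs.append(f(grid[idx]))
        gs.append(g(grid[idx]))
    return np.array(xs), np.array(fs), np.array(gs), safe

# Toy usage: safe wherever 1 - x^2 >= 0, utility peaks at x = 0.5.
grid = np.linspace(-2.0, 2.0, 81)
xs, fs, gs, safe = stageopt(lambda x: np.cos(2 * (x - 0.5)),
                            lambda x: 1.0 - x ** 2, grid, seed_idx=40)
print("safe points:", int(safe.sum()), "best x:", xs[np.argmax(fs)])
```

The Lipschitz constant passed as lip is deliberately conservative for this toy safety function, which is what keeps every sampled point inside the true safe region here; a constant that is too small would void that property.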

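Experimental results

Research questions

RQ1: Can StageOpt safely expand the initial safe region to the epsilon-reachable set in finite time?
RQ2: Can StageOpt reach a zeta-optimal utility value within the expanded safe region in finite time?
RQ3: Does separating safe expansion from optimization improve efficiency and applicability when safety and utility are on different scales?
RQ4: How does StageOpt perform relative to SafeOpt and CEI in synthetic and clinical settings?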

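Key findings

With high probability, StageOpt expands the safe region to the epsilon-reachable set within a finite time bound.
With high probability, StageOpt reaches a zeta-optimal utility value within the safe region within a finite time bound.
StageOpt expands the safe region at least as fast as SafeOpt and often identifies higher-utility points during optimization.
Compared with SafeOpt and CEI, StageOpt shows stronger empirical performance in synthetic experiments spanning different safety constraints.
In clinical experiments on optimizing spinal cord stimulation, StageOpt safely explored a larger safe region and found stimulation strategies better than those suggested by physicians.
The framework provides tailored theoretical guarantees for both safety and optimization in GP-based safe Bayesian optimization.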


This review was generated by AI and checked by a human editor.