QUICK REVIEW

[Paper Summary] PAC-Bayes Control: Synthesizing Controllers that Provably Generalize to Novel Environments

Anirudha Majumdar, Maxwell Goldstein|arXiv (Cornell University)|Jun 11, 2018

Machine Learning and Algorithms | Cited by 15

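One-Sentence Summary

This paper proposes PAC-Bayes Control, a method that uses the PAC-Bayes framework to bound the expected cost in unseen environments and thereby synthesize robot controllers that provably generalize to novel environments. Controller synthesis is posed as an optimization problem that minimizes this generalization bound via convex optimization or stochastic gradient descent, and the method shows strong generalization performance on simulated obstacle avoidance tasks with depth sensing.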

ABSTRACT

Our goal is to synthesize controllers for robots that provably generalize well to novel environments given a dataset of example environments. The key technical idea behind our approach is to leverage tools from generalization theory in machine learning by exploiting a precise analogy (which we present in the form of a reduction) between robustness of controllers to novel environments and generalization of hypotheses in supervised learning. In particular, we utilize the Probably Approximately Correct (PAC)-Bayes framework, which allows us to obtain upper bounds (that hold with high probability) on the expected cost of (stochastic) controllers across novel environments. We propose control synthesis algorithms that explicitly seek to minimize this upper bound. The corresponding optimization problem can be solved using convex optimization (Relative Entropy Programming in particular) in the setting where we are optimizing over a finite control policy space. In the more general setting of continuously parameterized controllers, we minimize this upper bound using stochastic gradient descent. We present examples of our approach in the context of obstacle avoidance control with depth measurements. Our simulated examples demonstrate the potential of our approach to provide strong generalization guarantees on controllers for robotic systems with continuous state and action spaces, complicated (e.g., nonlinear) dynamics, and rich sensory inputs (e.g., depth measurements).

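Research Motivation and Goals

Develop a method that uses data from example environments to synthesize controllers that provably generalize to unseen environments.
Connect robustness in control to generalization in supervised learning through a formal reduction.
Use the PAC-Bayes framework to derive high-probability upper bounds on the expected cost of controllers in novel environments.
Design optimization-based controller synthesis algorithms that minimize these generalization bounds.
Evaluate the method on robot control tasks with continuous states and actions, nonlinear dynamics, and rich sensory inputs (e.g., depth measurements).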

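Proposed Method

The method establishes a reduction between the robustness of controllers to novel environments and the generalization of hypotheses in supervised learning.
It applies the PAC-Bayes framework to derive a high-probability upper bound on the expected cost of stochastic controllers in unseen environments.
For finite policy spaces, the optimization problem is solved by convex optimization, specifically Relative Entropy Programming.
For continuously parameterized controllers, the generalization bound is minimized with stochastic gradient descent.
The controller is treated as a stochastic hypothesis (a distribution over policies) and is trained on data from the example environments so as to minimize the bound.
The framework is applied to simulated obstacle avoidance tasks that rely on depth measurements.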
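
To make the bound-minimization idea concrete for the finite policy space case, here is a minimal sketch. It assumes the commonly used Maurer-style PAC-Bayes bound, C_S(P) + sqrt((KL(P || P0) + log(2 sqrt(N) / delta)) / (2N)), with costs scaled to [0, 1] and N training environments. The function names (`kl_divergence`, `pac_bayes_bound`) and the cost-matrix layout are illustrative assumptions, not taken from the paper's code.

```python
import numpy as np

def kl_divergence(p, p0):
    """KL(p || p0) for discrete distributions over the same finite policy set.

    Assumes p0[i] > 0 wherever p[i] > 0.
    """
    p = np.asarray(p, dtype=float)
    p0 = np.asarray(p0, dtype=float)
    mask = p > 0  # terms with p[i] == 0 contribute nothing
    return float(np.sum(p[mask] * np.log(p[mask] / p0[mask])))

def pac_bayes_bound(cost_matrix, posterior, prior, delta=0.01):
    """Upper bound (assumed Maurer-style form) on expected cost in novel environments.

    cost_matrix[i, j] is the cost in [0, 1] of policy j on training environment i.
    """
    cost_matrix = np.asarray(cost_matrix, dtype=float)
    posterior = np.asarray(posterior, dtype=float)
    n_envs = cost_matrix.shape[0]
    # Training cost of the stochastic controller, averaged over environments.
    empirical_cost = float(np.mean(cost_matrix @ posterior))
    kl = kl_divergence(posterior, prior)
    regularizer = np.sqrt((kl + np.log(2.0 * np.sqrt(n_envs) / delta)) / (2.0 * n_envs))
    return empirical_cost + float(regularizer)

# Toy usage with made-up costs: 100 environments, 3 candidate policies.
rng = np.random.default_rng(0)
toy_costs = rng.uniform(0.0, 1.0, size=(100, 3))
uniform = np.ones(3) / 3
print(pac_bayes_bound(toy_costs, posterior=uniform, prior=uniform))
```

When the posterior equals the prior, the KL term vanishes and only the sample-size term remains. An optimizer (the paper names Relative Entropy Programming) would then search for a posterior that lowers the sum of training cost and regularizer.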

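Experimental Results

Research Questions

RQ1: Can generalization theory be used to formally upper-bound the expected cost of a controller in novel environments?
RQ2: In practice, how can controllers that minimize this generalization bound be synthesized?
RQ3: Does the proposed method produce controllers that generalize well to unseen environments on complex robotic tasks?
RQ4: How does the method perform in settings with nonlinear dynamics and rich sensory inputs?
RQ5: Can the optimization problem be solved efficiently for both discrete and continuous controller parameterizations?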

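Main Findings

The PAC-Bayes framework provides high-probability upper bounds on the expected cost of controllers in novel environments.
The proposed synthesis algorithms minimize these upper bounds, yielding controllers with provable generalization guarantees.
For finite policy spaces, the optimization problem is solved efficiently by convex optimization (Relative Entropy Programming).
For continuous parameterizations, stochastic gradient descent effectively minimizes the generalization bound.
Simulation results show strong generalization performance on obstacle avoidance tasks with depth measurements.
The method applies to systems with continuous state and action spaces and nonlinear dynamics.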
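
For the continuously parameterized case, the quantity handed to stochastic gradient descent can be sketched as follows. This assumes (it is not stated in this summary) that prior and posterior are Gaussians with diagonal covariance over the controller parameters, so that the KL term has a closed form, and that the same Maurer-style regularizer is used with costs in [0, 1]. The names `kl_diag_gaussians` and `pac_bayes_objective` are hypothetical.

```python
import numpy as np

def kl_diag_gaussians(mu, sigma, mu0, sigma0):
    """Closed-form KL( N(mu, diag(sigma^2)) || N(mu0, diag(sigma0^2)) )."""
    mu, sigma, mu0, sigma0 = (
        np.asarray(a, dtype=float) for a in (mu, sigma, mu0, sigma0)
    )
    var_ratio = (sigma / sigma0) ** 2
    mean_term = ((mu - mu0) / sigma0) ** 2
    return float(0.5 * np.sum(var_ratio + mean_term - 1.0 - np.log(var_ratio)))

def pac_bayes_objective(empirical_cost, kl, n_envs, delta=0.01):
    """Training cost plus the assumed PAC-Bayes regularizer."""
    regularizer = np.sqrt((kl + np.log(2.0 * np.sqrt(n_envs) / delta)) / (2.0 * n_envs))
    return float(empirical_cost + regularizer)

# Toy usage: posterior mean shifted by one unit in the first coordinate.
kl = kl_diag_gaussians(mu=[1.0, 0.0], sigma=[1.0, 1.0],
                       mu0=[0.0, 0.0], sigma0=[1.0, 1.0])
print(kl)  # 0.5
print(pac_bayes_objective(empirical_cost=0.2, kl=kl, n_envs=100))
```

In practice the empirical cost would itself be estimated by sampling controllers from the posterior, and that sampling error needs separate treatment before the resulting number can be reported as a certified bound.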

This summary was generated by AI and reviewed by human editors.