QUICK REVIEW

[Paper Review] On the Sample Complexity of the Linear Quadratic Regulator

Sarah Dean, Horia Mania|arXiv (Cornell University)|Oct 4, 2017

Machine Learning and Algorithms | References: 50 | Cited by: 103

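One-Sentence Summary

This paper proposes Coarse-ID control for LQR with unknown dynamics. It combines coarse system identification via least squares, uncertainty quantification, and robust control via System Level Synthesis to obtain stabilizing controllers with data-efficient guarantees.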

ABSTRACT

This paper addresses the optimal control problem known as the Linear Quadratic Regulator in the case when the dynamics are unknown. We propose a multi-stage procedure, called Coarse-ID control, that estimates a model from a few experimental trials, estimates the error in that model with respect to the truth, and then designs a controller using both the model and uncertainty estimate. Our technique uses contemporary tools from random matrix theory to bound the error in the estimation procedure. We also employ a recently developed approach to control synthesis called System Level Synthesis that enables robust control design by solving a convex optimization problem. We provide end-to-end bounds on the relative error in control cost that are nearly optimal in the number of parameters and that highlight salient properties of the system to be controlled such as closed-loop sensitivity and optimal control magnitude. We show experimentally that the Coarse-ID approach enables efficient computation of a stabilizing controller in regimes where simple control schemes that do not take the model uncertainty into account fail to stabilize the true system.

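Motivation and Objectives

Motivate safe, data-efficient learning of the Linear Quadratic Regulator with unknown dynamics.
Propose the Coarse-ID control framework, which couples system identification with robust controller synthesis.
Provide non-asymptotic, finite-sample guarantees on estimation error and closed-loop performance.
Show experimentally that Coarse-ID yields stabilizing controllers where simple methods fail.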

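Proposed Method

Estimate the unknown A and B by least squares, using independent rollouts with Gaussian excitation.
Bound the error of the least-squares estimate (Â, B̂) in terms of N, the system dimensions, and the noise level (Proposition 1.1).
Use the bootstrap to obtain data-dependent error bounds on (Â, B̂) (Section 2.3).
Formulate a robust LQR problem over perturbations ΔA, ΔB, based on high-probability bounds on the estimation error.
Solve the robust synthesis problem via System Level Synthesis (SLS) to guarantee robust stability and bound the relative cost gap (Proposition 1.2).
Give finite-dimensional bounds for the SLS optimization problem and demonstrate stabilization in simulation (Sections 4–6).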
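
The least-squares identification step can be illustrated with a minimal NumPy sketch. It is not the authors' code: the function names, the zero initial state, the example matrices, and the choice to keep only the final transition of each rollout (so that the N samples are independent) are assumptions made here for illustration.

```python
import numpy as np

def rollout_last_transitions(A, B, N, T, sigma_u, sigma_w, rng):
    """Simulate N independent rollouts x_{t+1} = A x_t + B u_t + w_t from x_0 = 0.

    Inputs and noise are i.i.d. Gaussian. Only the last transition
    (x_T, u_T, x_{T+1}) of each rollout is returned (assumption of this sketch).
    Rows of the returned arrays index rollouts.
    """
    n, p = B.shape
    X = np.zeros((N, n))
    for _ in range(T):
        U = sigma_u * rng.standard_normal((N, p))
        W = sigma_w * rng.standard_normal((N, n))
        X = X @ A.T + U @ B.T + W
    U = sigma_u * rng.standard_normal((N, p))
    W = sigma_w * rng.standard_normal((N, n))
    X_next = X @ A.T + U @ B.T + W
    return X, U, X_next

def least_squares_estimate(X, U, X_next):
    """Solve min over (A, B) of sum_i ||x_next_i - A x_i - B u_i||^2."""
    n = X.shape[1]
    Z = np.hstack([X, U])                          # N x (n + p) regressors
    Theta, *_ = np.linalg.lstsq(Z, X_next, rcond=None)
    Theta = Theta.T                                # n x (n + p), equals [A_hat, B_hat]
    return Theta[:, :n], Theta[:, n:]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Illustrative, slightly unstable system (values chosen for this example).
    A = np.array([[1.01, 0.01, 0.00],
                  [0.01, 1.01, 0.01],
                  [0.00, 0.01, 1.01]])
    B = np.eye(3)
    X, U, X_next = rollout_last_transitions(A, B, N=500, T=6,
                                            sigma_u=1.0, sigma_w=0.1, rng=rng)
    A_hat, B_hat = least_squares_estimate(X, U, X_next)
    print("spectral-norm error in A:", np.linalg.norm(A_hat - A, 2))
    print("spectral-norm error in B:", np.linalg.norm(B_hat - B, 2))
```

Stacking states and inputs into one regressor matrix lets a single call to `np.linalg.lstsq` recover both estimates; the spectral-norm errors printed at the end are the quantities that the error bounds in the list above aim to control.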

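Experimental Results

Research Questions

RQ1: What finite-sample guarantees can be established for learning the dynamics (A, B) of a linear system from rollouts?
RQ2: How can one synthesize a controller that remains stable and performs well when the true dynamics are uncertain but constrained by data-derived error bounds?
RQ3: How does system excitability (via the Gramian) relate to the sample complexity required for accurate LQR control?
RQ4: Does a robust controller built on coarse identification outperform the naive certainty-equivalent approach at stabilizing an unknown system?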

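Key Findings

The paper obtains a data-dependent, nearly optimal sample complexity bound for estimating (A, B) from N independent rollouts, with explicit dependence on (n+p) and on the minimum eigenvalue of a controllability Gramian term.
The bootstrap provides practical, data-driven error bounds εA and εB alongside the estimated dynamics.
The robust LQR formulation via System Level Synthesis yields a relative cost bound of O(C_LQR sqrt((n+p) log(1/δ)/N)) with high probability.
Given sufficient data and bounded model perturbations, the method guarantees asymptotic stability of the closed-loop system.
Numerical experiments show that naive nominal designs can be unstable even with ample data, whereas Coarse-ID control efficiently synthesizes stabilizing controllers.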
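
The data-driven error bounds εA and εB can be sketched with a parametric bootstrap: simulate synthetic rollouts from the estimated model, re-estimate, and take a high percentile of the resulting spectral-norm errors. The code below is a hedged illustration, not the paper's algorithm verbatim. It assumes the noise levels `sigma_u` and `sigma_w` are known, that rollouts start from a zero state, and that only the final transition of each rollout is used; all function and parameter names are hypothetical.

```python
import numpy as np

def rollout_last_transitions(A, B, N, T, sigma_u, sigma_w, rng):
    """Last transition (x_T, u_T, x_{T+1}) of N independent Gaussian-excited rollouts."""
    n, p = B.shape
    X = np.zeros((N, n))
    for _ in range(T):
        U = sigma_u * rng.standard_normal((N, p))
        X = X @ A.T + U @ B.T + sigma_w * rng.standard_normal((N, n))
    U = sigma_u * rng.standard_normal((N, p))
    X_next = X @ A.T + U @ B.T + sigma_w * rng.standard_normal((N, n))
    return X, U, X_next

def least_squares_estimate(X, U, X_next):
    """Least-squares fit of x_next ≈ A x + B u; returns (A_hat, B_hat)."""
    n = X.shape[1]
    Theta, *_ = np.linalg.lstsq(np.hstack([X, U]), X_next, rcond=None)
    return Theta.T[:, :n], Theta.T[:, n:]

def bootstrap_error_bounds(A_hat, B_hat, N, T, sigma_u, sigma_w,
                           M=200, delta=0.05, rng=None):
    """Parametric bootstrap estimates of eps_A and eps_B.

    Draws M synthetic datasets from the estimated model (A_hat, B_hat),
    refits each one, and returns the 100 * (1 - delta) percentile of the
    spectral-norm deviations from (A_hat, B_hat).
    """
    rng = np.random.default_rng() if rng is None else rng
    errs_A = np.empty(M)
    errs_B = np.empty(M)
    for m in range(M):
        X, U, X_next = rollout_last_transitions(A_hat, B_hat, N, T,
                                                sigma_u, sigma_w, rng)
        A_tilde, B_tilde = least_squares_estimate(X, U, X_next)
        errs_A[m] = np.linalg.norm(A_hat - A_tilde, 2)
        errs_B[m] = np.linalg.norm(B_hat - B_tilde, 2)
    q = 100.0 * (1.0 - delta)
    return np.percentile(errs_A, q), np.percentile(errs_B, q)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Stand-in for an estimated model; values are illustrative only.
    A_hat = np.array([[1.01, 0.01], [0.00, 1.01]])
    B_hat = np.eye(2)
    eps_A, eps_B = bootstrap_error_bounds(A_hat, B_hat, N=100, T=5,
                                          sigma_u=1.0, sigma_w=0.1, rng=rng)
    print("bootstrap eps_A:", eps_A, "eps_B:", eps_B)
```

In a Coarse-ID style pipeline, the returned pair would serve as the uncertainty radii handed to the robust synthesis step. Increasing N in this sketch should shrink both values, in line with the 1/sqrt(N) scaling of the cost bound stated above.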

This review was generated by AI and reviewed by a human editor.