QUICK REVIEW

[Paper Review] On the Sample Complexity of the Linear Quadratic Regulator

Sarah Dean, Horia Mania|arXiv (Cornell University)|Oct 4, 2017

Machine Learning and Algorithms|References: 50|Citations: 103

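One-sentence summary

This paper proposes Coarse-ID control for the LQR with unknown dynamics. It combines coarse system identification by least squares, uncertainty quantification, and robust control via System Level Synthesis to obtain a stabilizing controller with data-efficient guarantees.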

ABSTRACT

This paper addresses the optimal control problem known as the Linear Quadratic Regulator in the case when the dynamics are unknown. We propose a multi-stage procedure, called Coarse-ID control, that estimates a model from a few experimental trials, estimates the error in that model with respect to the truth, and then designs a controller using both the model and uncertainty estimate. Our technique uses contemporary tools from random matrix theory to bound the error in the estimation procedure. We also employ a recently developed approach to control synthesis called System Level Synthesis that enables robust control design by solving a convex optimization problem. We provide end-to-end bounds on the relative error in control cost that are nearly optimal in the number of parameters and that highlight salient properties of the system to be controlled such as closed-loop sensitivity and optimal control magnitude. We show experimentally that the Coarse-ID approach enables efficient computation of a stabilizing controller in regimes where simple control schemes that do not take the model uncertainty into account fail to stabilize the true system.

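Motivation and objectives

Motivate safe, data-efficient learning of the Linear Quadratic Regulator when the dynamics are unknown.
Propose the Coarse-ID control framework, which connects system identification with robust controller design.
Provide non-asymptotic guarantees on finite-sample estimation error and closed-loop performance.
Show experimentally that Coarse-ID yields a stabilizing controller in regimes where naive methods fail.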

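Proposed method

Estimate the unknown A and B by least squares from independent rollouts driven by Gaussian excitation.
Bound the estimation error of (Â, B̂) in terms of the number of rollouts N, the system dimensions, and the noise levels (Proposition 1.1).
Use the bootstrap to obtain data-dependent estimation error bounds for (Â, B̂) (Section 2.3).
Formulate a robust LQR problem over the perturbations ΔA, ΔB using the high-confidence bounds on the estimation error.
Solve the robust synthesis problem with System Level Synthesis (SLS), guaranteeing robust stability and a bound on the relative cost gap (Proposition 1.2).
Provide finite-dimensional bounds for the SLS optimization problem and demonstrate stabilization in simulation (Sections 4–6).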
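
The identification step above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' code: the function names, the example matrices, the noise levels, and the choice to keep only the final transition of each rollout (which keeps the N samples independent) are all assumptions made for this sketch.

```python
import numpy as np

def collect_last_transitions(A, B, N, T, sigma_u, sigma_w, rng):
    """Simulate N independent rollouts of length T >= 1 from x_0 = 0 with
    Gaussian inputs, and return only the final transition of each rollout."""
    n, p = B.shape
    X = np.zeros((N, n))                      # one row per rollout
    for _ in range(T):
        U = sigma_u * rng.standard_normal((N, p))
        W = sigma_w * rng.standard_normal((N, n))
        X_prev, U_prev = X, U
        X = X_prev @ A.T + U_prev @ B.T + W   # x_{t+1} = A x_t + B u_t + w_t
    Z = np.hstack([X_prev, U_prev])           # regressors [x_t, u_t]
    return Z, X

def least_squares_estimate(Z, X_next, n):
    """Solve min_Theta ||Z Theta - X_next||_F, where Theta = [A B]^T."""
    Theta, *_ = np.linalg.lstsq(Z, X_next, rcond=None)
    return Theta[:n].T, Theta[n:].T           # (A_hat, B_hat)

# Hypothetical, slightly unstable example system (illustrative values only).
A_true = np.array([[1.01, 0.01, 0.00],
                   [0.01, 1.01, 0.01],
                   [0.00, 0.01, 1.01]])
B_true = np.eye(3)

rng = np.random.default_rng(0)
Z, X_next = collect_last_transitions(A_true, B_true, N=2000, T=6,
                                     sigma_u=1.0, sigma_w=0.1, rng=rng)
A_hat, B_hat = least_squares_estimate(Z, X_next, n=3)

# Spectral-norm estimation errors, the quantities the error bounds control.
err_A = np.linalg.norm(A_hat - A_true, 2)
err_B = np.linalg.norm(B_hat - B_true, 2)
print(err_A, err_B)
```

Rerunning with a smaller N should show both errors growing, which is the dependence on N that the finite-sample bound describes.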

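Experimental results

Research questions

RQ1 What finite-sample guarantees can be established when learning the dynamics (A, B) of a linear system from rollouts?
RQ2 How can a controller be designed that remains stable and performs well when the true dynamics are uncertain but bounded by data-derived errors?
RQ3 How does the excitability of the system, as captured by its Gramian, relate to the sample complexity of accurate LQR control?
RQ4 Does the coarsely identified robust controller outperform naive certainty equivalence when stabilizing an unknown system?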
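
To make the baseline in RQ4 concrete, the sketch below shows what a naive certainty-equivalence design looks like: compute the LQR gain as if an estimate (Â, B̂) were exact, then check the spectral radius of the resulting closed loop on the true system. The Riccati iteration, the cost matrices, the example system, and the perturbed "estimate" are assumptions chosen for illustration; they are not taken from the paper.

```python
import numpy as np

def lqr_gain(A, B, Q, R, iters=5000, tol=1e-10):
    """Infinite-horizon discrete-time LQR gain K (with u = K x),
    computed by iterating the Riccati recursion until it stops changing."""
    P = Q.copy()
    for _ in range(iters):
        BtP = B.T @ P
        K = -np.linalg.solve(R + BtP @ B, BtP @ A)
        P_next = Q + A.T @ P @ (A + B @ K)    # Riccati update
        converged = np.max(np.abs(P_next - P)) < tol
        P = P_next
        if converged:
            break
    BtP = B.T @ P
    return -np.linalg.solve(R + BtP @ B, BtP @ A)

def spectral_radius(M):
    return np.max(np.abs(np.linalg.eigvals(M)))

# Hypothetical true system and cost (illustrative values only).
A_true = np.array([[1.01, 0.01, 0.00],
                   [0.01, 1.01, 0.01],
                   [0.00, 0.01, 1.01]])
B_true = np.eye(3)
Q, R = np.eye(3), np.eye(3)

# Stand-in for a least-squares estimate: the truth plus a small random error.
rng = np.random.default_rng(1)
A_hat = A_true + 0.05 * rng.standard_normal((3, 3))
B_hat = B_true + 0.05 * rng.standard_normal((3, 3))

K_opt = lqr_gain(A_true, B_true, Q, R)   # design with the true model
K_ce = lqr_gain(A_hat, B_hat, Q, R)      # certainty equivalence: trust the estimate

# Stability is judged on the TRUE closed loop; a radius below 1 means stable.
rho_opt = spectral_radius(A_true + B_true @ K_opt)
rho_ce = spectral_radius(A_true + B_true @ K_ce)
print(rho_opt, rho_ce)
```

The certainty-equivalence design never sees the size of the estimation error, so nothing in it guards against a true closed-loop radius above 1 when the estimate is poor. Accounting for that error is the role of the robust synthesis step.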

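Key findings

Data-dependent, nearly optimal sample complexity bounds are obtained for estimating (A, B) from N independent rollouts, with explicit dependence on (n+p) and on the minimum eigenvalue of controllability Gramian terms.
The bootstrap provides practical, data-driven error bounds εA and εB for the estimated dynamics.
The robust LQR formulation via System Level Synthesis yields a controller whose relative cost gap is, with high probability, O(C_LQR sqrt((n+p) log(1/δ)/N)).
Given sufficient data and bounded model perturbations, asymptotic stability of the closed-loop system is guaranteed.
Numerical experiments show that a naive nominal design can be unstable even with sufficient data, whereas Coarse-ID control efficiently synthesizes a stabilizing controller.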
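
The bootstrap bounds εA and εB mentioned above can be illustrated with a parametric bootstrap: treat the estimate as the truth, simulate fresh data from it, re-estimate, and take a high quantile of the resulting errors. This is a rough sketch rather than the paper's exact algorithm. It assumes the noise levels are known, and the function names, the stand-in estimate, and all numeric settings are hypothetical.

```python
import numpy as np

def collect_last_transitions(A, B, N, T, sigma_u, sigma_w, rng):
    """N independent rollouts of length T >= 1 from x_0 = 0; keep the last transition."""
    n, p = B.shape
    X = np.zeros((N, n))
    for _ in range(T):
        U = sigma_u * rng.standard_normal((N, p))
        W = sigma_w * rng.standard_normal((N, n))
        X_prev, U_prev = X, U
        X = X_prev @ A.T + U_prev @ B.T + W
    return np.hstack([X_prev, U_prev]), X

def least_squares_estimate(Z, X_next, n):
    Theta, *_ = np.linalg.lstsq(Z, X_next, rcond=None)
    return Theta[:n].T, Theta[n:].T

def bootstrap_error_bounds(A_hat, B_hat, N, T, sigma_u, sigma_w,
                           delta=0.05, M=100, seed=0):
    """Parametric bootstrap: resimulate from (A_hat, B_hat), re-estimate,
    and return the (1 - delta) quantiles of the spectral-norm errors."""
    rng = np.random.default_rng(seed)
    n = A_hat.shape[0]
    errs_A, errs_B = np.empty(M), np.empty(M)
    for m in range(M):
        Z, X_next = collect_last_transitions(A_hat, B_hat, N, T,
                                             sigma_u, sigma_w, rng)
        A_tilde, B_tilde = least_squares_estimate(Z, X_next, n)
        errs_A[m] = np.linalg.norm(A_hat - A_tilde, 2)
        errs_B[m] = np.linalg.norm(B_hat - B_tilde, 2)
    return np.quantile(errs_A, 1 - delta), np.quantile(errs_B, 1 - delta)

# Stand-in estimate (hypothetical values; in practice this comes from real data).
A_hat = np.array([[1.00, 0.02, 0.00],
                  [0.01, 1.02, 0.01],
                  [0.00, 0.01, 1.00]])
B_hat = np.eye(3)

eps_A, eps_B = bootstrap_error_bounds(A_hat, B_hat, N=500, T=6,
                                      sigma_u=1.0, sigma_w=0.1)
print(eps_A, eps_B)
```

The resulting eps_A and eps_B play the role of the perturbation radii for ΔA and ΔB in the robust LQR problem. A bootstrap of this kind is a practical heuristic, so the values should be read as estimates of the error rather than certified bounds.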


This review was generated by AI and checked by a human editor.