QUICK REVIEW

[Paper Review] LQR through the Lens of First Order Methods: Discrete-time Case

Jingjing Bu, Afshin Mesbahi | arXiv (Cornell University) | Jul 21, 2019

Adaptive Dynamic Programming Control | References: 18 | Cited by: 76

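One-Sentence Summary

The paper recasts discrete-time LQR as the minimization of a real-valued function over the set of stabilizing feedback gains. It analyzes the gradient, natural gradient, and quasi-Newton flows and their discretizations, including the structured (sparsity-constrained) case.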

ABSTRACT

We consider the Linear-Quadratic-Regulator (LQR) problem in terms of optimizing a real-valued matrix function over the set of feedback gains. Such a setup facilitates examining the implications of a natural initial-state independent formulation of LQR in designing first order algorithms. It is shown that this cost function is smooth and coercive, and we provide an alternate means of noting its gradient dominated property. In the process, we provide a number of analytic observations on the LQR cost when directly analyzed in terms of the feedback gain. We then examine three types of well-posed flows for LQR: gradient flow, natural gradient flow and the quasi-Newton flow. The coercive property suggests that these flows admit unique solutions, while the gradient dominated property indicates that the corresponding Lyapunov functionals decay at an exponential rate; we also prove that these flows are exponentially stable in the sense of Lyapunov. We then discuss the forward Euler discretization of these flows, realized as gradient descent, natural gradient descent and the quasi-Newton iteration. We present stepsize criteria for gradient descent and natural gradient descent, guaranteeing that both algorithms converge linearly to the global optimum. An optimal stepsize for the quasi-Newton iteration is also proposed, guaranteeing a $Q$-quadratic convergence rate and, in the process, recovering the Hewer algorithm.
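
For reference, the objects the abstract refers to can be written out explicitly. This is a sketch under assumed conventions, not copied from the paper. It takes u_t = -K x_t, Q ≻ 0 and R ≻ 0. It defines f as the sum of J over the standard basis vectors e_1 through e_n and uses the standard gradient expression for this cost. The paper's own notation, sign convention, and stepsize constants may differ. The quasi-Newton line shows the unit-step form, which coincides with Hewer's iteration.

```latex
\begin{aligned}
& x_{t+1} = A x_t + B u_t, \qquad u_t = -K x_t, \qquad A_K := A - BK, \\
& J_{x_0}(K) = \sum_{t=0}^{\infty} \left( x_t^\top Q x_t + u_t^\top R u_t \right) = x_0^\top X_K x_0,
  \qquad X_K = A_K^\top X_K A_K + Q + K^\top R K, \\
& f(K) = \sum_{i=1}^{n} J_{e_i}(K) = \operatorname{tr}(X_K), \\
& \nabla f(K) = 2 E_K Y_K, \qquad E_K = (R + B^\top X_K B) K - B^\top X_K A,
  \qquad Y_K = A_K Y_K A_K^\top + I, \\
& \text{GD:} \quad K_{j+1} = K_j - \eta \, \nabla f(K_j), \\
& \text{NGD:} \quad K_{j+1} = K_j - \eta \, \nabla f(K_j) \, Y_{K_j}^{-1} = K_j - 2 \eta \, E_{K_j}, \\
& \text{QN (unit step):} \quad K_{j+1} = K_j - (R + B^\top X_{K_j} B)^{-1} E_{K_j}
  = (R + B^\top X_{K_j} B)^{-1} B^\top X_{K_j} A .
\end{aligned}
```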

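Motivation and Objectives

Motivate solving LQR directly over stabilizing feedback gains, using a cost formulation that is independent of the initial state.
Establish smoothness, coercivity, and gradient dominance of the LQR cost as a function of the feedback gain.
Formulate and analyze three flows (gradient, natural gradient, and quasi-Newton) and their forward Euler discretizations.
Provide stepsize conditions and convergence guarantees for unstructured LQR synthesis (linear and quadratic rates) and for structured (sparse) synthesis.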

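Proposed Method

Define the cost J_x0(K) for a fixed initial state x0, then sum it over a set of linearly independent initial states. The result is an initial-state-independent objective f(K), differentiable on the open set of stabilizing gains; together with coercivity, this lets descent methods run without an explicit stability constraint.
Show that f(K) is smooth, coercive, real-analytic on the set of stabilizing gains, and gradient dominated, which enables global convergence results.
Derive and analyze the continuous-time gradient, natural gradient, and quasi-Newton flows and their forward Euler discretizations (gradient descent, natural gradient descent, and the quasi-Newton iteration).
Provide Lyapunov-based stepsize rules and establish linear convergence to the global optimum for unstructured LQR. Show quadratic convergence for the quasi-Newton iteration.
Extend the framework to structured LQR synthesis via projected gradient descent and discuss sublinear convergence to first-order stationary points.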
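
The method steps above can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' code:

- It assumes the convention u_t = -K x_t and takes f(K) = tr(X_K), the sum of J_x0 over the standard basis vectors.
- It uses the standard gradient formula 2 E_K Y_K.
- It solves the discrete Lyapunov equations by Kronecker vectorization, which is only sensible for small state dimension.

The helper names (dlyap, lqr_parts, gd_step, ngd_step, hewer_step) and the example matrices are hypothetical. The Armijo backtracking used for gradient descent is a swapped-in stand-in for the paper's stepsize criteria. The natural gradient step assumes eta = 1/(2||R + B^T X_K B||). The quasi-Newton step uses unit stepsize, in Hewer's form.

```python
import numpy as np

def dlyap(F, M):
    """Solve X = F^T X F + M by Kronecker vectorization (F must be Schur stable)."""
    n = F.shape[0]
    lhs = np.eye(n * n) - np.kron(F.T, F.T)
    return np.linalg.solve(lhs, M.reshape(-1)).reshape(n, n)

def is_stabilizing(A, B, K):
    """True if A - BK has spectral radius < 1."""
    return np.max(np.abs(np.linalg.eigvals(A - B @ K))) < 1.0

def lqr_parts(A, B, Q, R, K):
    """Return f(K) = tr(X_K), X_K, Y_K and E_K (assumed convention u = -Kx)."""
    Ak = A - B @ K
    X = dlyap(Ak, Q + K.T @ R @ K)        # X_K = Ak^T X_K Ak + Q + K^T R K
    Y = dlyap(Ak.T, np.eye(A.shape[0]))   # Y_K = Ak Y_K Ak^T + I
    E = (R + B.T @ X @ B) @ K - B.T @ X @ A
    return np.trace(X), X, Y, E

def gradient(A, B, Q, R, K):
    _, _, Y, E = lqr_parts(A, B, Q, R, K)
    return 2.0 * E @ Y                    # grad f(K) = 2 E_K Y_K

def gd_step(A, B, Q, R, K, c=0.5, max_halvings=60):
    """Gradient step with Armijo backtracking (stand-in for the paper's stepsize rule)."""
    f = lqr_parts(A, B, Q, R, K)[0]
    g = gradient(A, B, Q, R, K)
    eta = 1.0
    for _ in range(max_halvings):
        K_new = K - eta * g
        # Check stability first: the Lyapunov solve is meaningless for unstable A - BK.
        if is_stabilizing(A, B, K_new) and lqr_parts(A, B, Q, R, K_new)[0] <= f - c * eta * np.sum(g * g):
            return K_new
        eta *= 0.5
    return K                              # no acceptable step found (e.g. already converged)

def ngd_step(A, B, Q, R, K):
    """Natural gradient step: K - eta * grad f(K) Y_K^{-1} = K - 2 eta E_K, eta = 1/(2||R + B^T X_K B||)."""
    _, X, _, E = lqr_parts(A, B, Q, R, K)
    return K - E / np.linalg.norm(R + B.T @ X @ B, 2)

def hewer_step(A, B, Q, R, K):
    """Quasi-Newton step with unit stepsize, written in Hewer's form."""
    _, X, _, _ = lqr_parts(A, B, Q, R, K)
    return np.linalg.solve(R + B.T @ X @ B, B.T @ X @ A)

# Illustrative, assumed data: A is Schur stable, so K = 0 is a stabilizing start.
A = np.array([[0.6, 0.3], [-0.2, 0.5]])
B = np.eye(2)
Q = np.eye(2)
R = np.eye(2)
K_gd = np.zeros((2, 2))
K_ngd = np.zeros((2, 2))
K_hw = np.zeros((2, 2))
for _ in range(300):
    K_gd = gd_step(A, B, Q, R, K_gd)
    K_ngd = ngd_step(A, B, Q, R, K_ngd)
for _ in range(20):
    K_hw = hewer_step(A, B, Q, R, K_hw)
print(np.round(K_hw, 6))
```

On this well-conditioned example, all three iterations are expected to reach the same gain. Hewer's iteration needs only a few steps, consistent with its quadratic rate.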

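Results

Research Questions

RQ1: Can LQR synthesis be effectively formulated as optimization over stabilizing feedback gains, with a cost that is independent of the initial state?
RQ2: Under this formulation, what analytic properties (smoothness, coercivity, gradient dominance) does the LQR cost have?
RQ3: Do the gradient, natural gradient, and quasi-Newton flows converge to the global LQR optimum, and at what rate?
RQ4: How do the discretizations (gradient descent, natural gradient descent, and the quasi-Newton iteration) behave under suitable stepsizes?
RQ5: How can the approach be extended to structured (sparsity-constrained) LQR synthesis, and what convergence guarantees hold under projection?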

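Key Findings

The cost function is smooth, coercive, and gradient dominated on its domain (the set of stabilizing gains).
The flows are exponentially stable in the sense of Lyapunov and converge to the global optimum.
With suitable stepsizes, the discrete-time updates converge linearly (gradient descent, natural gradient descent) or quadratically (quasi-Newton iteration).
Natural gradient descent generates value matrices that are monotonically non-increasing with respect to the positive semidefinite cone (i.e., in the Loewner order).
A formal framework for structured (sparsity-pattern) LQR is developed via projected gradient descent, with sublinear convergence guarantees to first-order stationary points.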
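
For the structured case, one simple way to realize projected gradient descent is to multiply the gradient elementwise by a 0/1 mask encoding the sparsity pattern. This is the Euclidean projection onto that subspace of gains. The sketch below is hypothetical. The mask (a decentralized, diagonal pattern), the example system, the helper names, and the Armijo backtracking rule are assumptions chosen for illustration, not the paper's setup. It only targets a first-order stationary point of the restricted problem, not a certified global structured optimum. It again assumes u = -Kx and f(K) = tr(X_K).

```python
import numpy as np

def dlyap(F, M):
    """Solve X = F^T X F + M by Kronecker vectorization (F must be Schur stable)."""
    n = F.shape[0]
    lhs = np.eye(n * n) - np.kron(F.T, F.T)
    return np.linalg.solve(lhs, M.reshape(-1)).reshape(n, n)

def is_stabilizing(A, B, K):
    return np.max(np.abs(np.linalg.eigvals(A - B @ K))) < 1.0

def cost_and_grad(A, B, Q, R, K):
    """f(K) = tr(X_K) and grad f(K) = 2 E_K Y_K, assuming u = -Kx."""
    Ak = A - B @ K
    X = dlyap(Ak, Q + K.T @ R @ K)
    Y = dlyap(Ak.T, np.eye(A.shape[0]))
    E = (R + B.T @ X @ B) @ K - B.T @ X @ A
    return np.trace(X), 2.0 * E @ Y

def projected_gd(A, B, Q, R, K0, mask, iters=1000, c=0.5, tol=1e-10):
    """Projected gradient descent over gains with sparsity pattern `mask` (hypothetical helper)."""
    K = mask * K0                                  # masked start; must be stabilizing
    f, g = cost_and_grad(A, B, Q, R, K)
    history = [f]
    for _ in range(iters):
        pg = mask * g                              # projection of the gradient onto the pattern
        if np.linalg.norm(pg) < tol:
            break                                  # (approximately) first-order stationary
        eta = 1.0
        for _ in range(60):                        # Armijo backtracking that keeps A - BK stable
            K_new = K - eta * pg
            if is_stabilizing(A, B, K_new):
                f_new, g_new = cost_and_grad(A, B, Q, R, K_new)
                if f_new <= f - c * eta * np.sum(pg * pg):
                    K, f, g = K_new, f_new, g_new
                    break
            eta *= 0.5
        else:
            break                                  # no acceptable step found: stop
        history.append(f)
    return K, history

# Illustrative, assumed data: a decentralized (diagonal) gain for a coupled system.
A = np.array([[0.6, 0.3], [-0.2, 0.5]])
B = np.eye(2)
Q = np.eye(2)
R = np.eye(2)
mask = np.eye(2)
K_s, hist = projected_gd(A, B, Q, R, np.zeros((2, 2)), mask)
print(np.round(K_s, 6), round(hist[-1], 6))
```

Every accepted step satisfies the sufficient-decrease test, so the recorded costs are non-increasing. The off-pattern entries of the gain stay exactly zero throughout.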

This review was generated by AI and reviewed by human editors.