QUICK REVIEW

[Paper Review] Optimal Control Via Neural Networks: A Convex Approach

Yize Chen, Yuanyuan Shi|arXiv (Cornell University)|May 30, 2018

Reinforcement Learning in Robotics|References: 33|Cited by: 27

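One-Sentence Summary

This paper proposes input convex recurrent neural networks (ICRNNs) so that optimal control of complex dynamical systems can be posed as a convex optimization problem. By guaranteeing convexity from input to output, the approach enables tractable, globally optimal model predictive control (MPC) while retaining high modeling accuracy. It reports a 23.25% reduction in energy consumption in heating, ventilation, and air conditioning (HVAC) control and over 10% higher performance on MuJoCo locomotion tasks, using about one fifth of the time required by a state-of-the-art model-based reinforcement learning method.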

ABSTRACT

Control of complex systems involves both system identification and controller design. Deep neural networks have proven to be successful in many identification tasks, however, from model-based control perspective, these networks are difficult to work with because they are typically nonlinear and nonconvex. Therefore many systems are still identified and controlled based on simple linear models despite their poor representation capability. In this paper we bridge the gap between model accuracy and control tractability faced by neural networks, by explicitly constructing networks that are convex with respect to their inputs. We show that these input convex networks can be trained to obtain accurate models of complex physical systems. In particular, we design input convex recurrent neural networks to capture temporal behavior of dynamical systems. Then optimal controllers can be achieved via solving a convex model predictive control problem. Experiment results demonstrate the good potential of the proposed input convex neural network based approach in a variety of control applications. In particular we show that in the MuJoCo locomotion tasks, we could achieve over 10% higher performance using 5× less time compared with state-of-the-art model-based reinforcement learning method; and in the building HVAC control example, our method achieved up to 20% energy reduction compared with classic linear models.

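Research Motivation and Objectives

Address the trade-off between model accuracy and computational tractability in data-driven control of complex systems.
Overcome the problems caused by the nonconvexity of standard neural networks, so that optimization in model-based control becomes reliable.
Develop a neural network architecture that is convex in its inputs, enabling globally optimal control through convex MPC.
Extend input convex neural networks to recurrent structures so that they can model the temporal dynamics of dynamical systems.
Outperform linear models and conventional RNNs on realistic control tasks such as building HVAC management and robotic locomotion.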

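Proposed Method

Proposes input convex recurrent neural networks (ICRNNs), which are convex in their inputs, to support convex optimization for control.
Trains ICRNNs with stochastic gradient descent to minimize the mean squared error between predicted and actual system outputs.
Uses the trained ICRNNs to represent system dynamics within a convex model predictive control (MPC) framework, ensuring global optimality.
Formulates the optimal control problem as a convex optimization problem subject to the system dynamics and physical constraints.
Solves the MPC problem over a finite horizon with gradient-based optimization, relying on the convexity of the ICRNN for reliable convergence.
Extends the input convex network framework to recurrent architectures to model temporal dependencies in dynamical systems.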
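
To make the input-convexity idea concrete, here is a minimal NumPy sketch of a toy recurrent cell that is convex in its input sequence. It is an illustration under stated assumptions, not the paper's exact architecture: the names `make_icrnn`, `icrnn_output`, and `midpoint_convexity_holds` are hypothetical, the weights are random rather than trained, and the recurrence is simplified. It relies on a standard sufficient condition: non-negative recurrent and readout weights combined with a convex, nondecreasing activation (ReLU here). The input is expanded to [u; -u] so the model can still decrease in u.

```python
import numpy as np

def make_icrnn(input_dim, hidden_dim, seed=0):
    """Random weights for a toy input-convex recurrent cell (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    return {
        "U": rng.uniform(0.0, 1.0, (hidden_dim, 2 * input_dim)),  # acts on [u; -u]
        "W": rng.uniform(0.0, 0.5, (hidden_dim, hidden_dim)),     # recurrent weights, >= 0
        "V": rng.uniform(0.0, 1.0, (hidden_dim,)),                # readout weights, >= 0
        "b": rng.normal(0.0, 0.1, (hidden_dim,)),                 # biases are unconstrained
    }

def icrnn_output(params, u_seq):
    """Scalar output after feeding a (T, input_dim) input sequence."""
    h = np.zeros(params["W"].shape[0])
    for u in u_seq:
        u_hat = np.concatenate([u, -u])  # expanded input
        # ReLU is convex and nondecreasing, and W >= 0, so h stays convex in the inputs
        h = np.maximum(0.0, params["U"] @ u_hat + params["W"] @ h + params["b"])
    return float(params["V"] @ h)

def midpoint_convexity_holds(params, T=5, input_dim=2, trials=200, seed=1):
    """Empirically check f((a + b) / 2) <= (f(a) + f(b)) / 2 on random sequence pairs."""
    rng = np.random.default_rng(seed)
    for _ in range(trials):
        a = rng.normal(size=(T, input_dim))
        b = rng.normal(size=(T, input_dim))
        mid = icrnn_output(params, 0.5 * (a + b))
        avg = 0.5 * (icrnn_output(params, a) + icrnn_output(params, b))
        if mid > avg + 1e-9:
            return False
    return True
```

Calling `midpoint_convexity_holds(make_icrnn(2, 8))` runs the random midpoint check. In a trained model the same non-negativity would have to be maintained during stochastic gradient descent, for example by clipping the constrained weights after each update; that choice is an assumption here, not something this summary specifies.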

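Experimental Results

Research Questions

RQ1: Can a deep neural network architecture be designed to be convex in its inputs while retaining high modeling accuracy on complex dynamical systems?
RQ2: Can convex neural networks enable globally optimal and computationally tractable model predictive control in real-time control applications?
RQ3: How does ICRNN-based control compare with linear models and conventional RNNs in control accuracy and energy efficiency?
RQ4: Can ICRNNs effectively model nonlinear building HVAC dynamics and produce stable, optimal control actions in constrained environments?
RQ5: To what extent does convexity of the network architecture improve the reliability and convergence of optimization in control tasks?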

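Key Findings

When modeling building HVAC dynamics, the ICRNN reaches a test root mean squared error (RMSE) of 0.054, comparable to a conventional RNN (0.051) and far better than the linear RC model (0.240).
ICRNN-based MPC reduces building energy consumption by 23.25% under temperature constraints, ahead of the conventional RNN (11.73% savings) and the linear RC model (4.07% savings).
The ICRNN produces stable, smooth control actions, whereas the conventional RNN produces highly fluctuating, unstable control signals.
On MuJoCo locomotion tasks, the ICRNN approach outperforms a state-of-the-art model-based reinforcement learning method by more than 10% while using about one fifth of the training time.
Theoretical analysis shows that ICRNNs can represent all convex functions and are exponentially more efficient at doing so than piecewise linear approximation.
The convexity of the ICRNN makes the resulting MPC problem convex, which gives global optimality and reliable convergence in real-time control applications.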
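
The practical meaning of a convex MPC problem can be sketched with a toy example: when the objective is convex and the constraints form a box, projected gradient descent reaches the same minimizer from any starting point. The code below is a sketch under assumptions, not the paper's controller. The smooth convex cost stands in for a trained ICRNN's predicted cost over the horizon, projected gradient descent is one possible gradient-based solver, and the names `make_convex_cost` and `projected_gradient_mpc` are hypothetical.

```python
import numpy as np

def make_convex_cost(horizon, units=6, rho=0.1, seed=0):
    """Build a smooth convex stand-in for a learned cost over a control sequence."""
    rng = np.random.default_rng(seed)
    A = rng.normal(size=(units, horizon))  # any sign is fine: A @ u is affine in u
    b = rng.normal(size=units)
    v = rng.uniform(0.1, 1.0, size=units)  # non-negative mixing keeps the sum convex

    def cost(u):
        # softplus is convex and nondecreasing; the quadratic term adds strong convexity
        return float(v @ np.logaddexp(0.0, A @ u + b) + 0.5 * rho * np.sum(u ** 2))

    def grad(u):
        sig = 1.0 / (1.0 + np.exp(-(A @ u + b)))  # derivative of softplus
        return A.T @ (v * sig) + rho * u

    # Upper bound on the gradient's Lipschitz constant (softplus'' <= 1/4)
    lipschitz = rho + 0.25 * v.max() * np.linalg.norm(A, 2) ** 2
    return cost, grad, lipschitz

def projected_gradient_mpc(grad, lipschitz, u0, u_min, u_max, iters=3000):
    """Minimize over the box [u_min, u_max] by projected gradient descent."""
    u = np.clip(u0, u_min, u_max)
    step = 1.0 / lipschitz
    for _ in range(iters):
        u = np.clip(u - step * grad(u), u_min, u_max)  # gradient step, then box projection
    return u
```

For example, with `cost, grad, L = make_convex_cost(horizon=4)`, running `projected_gradient_mpc(grad, L, np.full(4, -1.0), -1.0, 1.0)` and the same call started from `np.full(4, 1.0)` should return matching control sequences. With a nonconvex model such as a conventional RNN, the two runs could settle in different local minima.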


This review was generated by AI and checked by human editors.