QUICK REVIEW

[Paper Review] Model-Based Deep Reinforcement Learning for High-Dimensional Problems, a Survey.

Aske Plaat, Walter A. Kosters|arXiv (Cornell University)|Aug 11, 2020

Reinforcement Learning in Robotics|References: 133|Cited by: 12

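One-Sentence Summary

This survey proposes a taxonomy of model-based deep reinforcement learning (MBRL) for high-dimensional problems, dividing methods into three approaches: explicit planning on given transitions, explicit planning on learned transitions, and end-to-end learning of both planning and transitions. It identifies the central challenge of achieving high predictive power while keeping sample complexity low, reviews recent developments such as latent models, and suggests future directions including uncertainty modeling and transfer learning through latent spaces.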

ABSTRACT

Deep reinforcement learning has shown remarkable success in the past few years. Highly complex sequential decision making problems have been solved in tasks such as game playing and robotics. Unfortunately, the sample complexity of most deep reinforcement learning methods is high, precluding their use in some important applications. Model-based reinforcement learning creates an explicit model of the environment dynamics to reduce the need for environment samples. Current deep learning methods use high-capacity networks to solve high-dimensional problems. Unfortunately, high-capacity models typically require many samples, negating the potential benefit of lower sample complexity in model-based methods. A challenge for deep model-based methods is therefore to achieve high predictive power while maintaining low sample complexity. In recent years, many model-based methods have been introduced to address this challenge. In this paper, we survey the contemporary model-based landscape. First we discuss definitions and relations to other fields. We propose a taxonomy based on three approaches: using explicit planning on given transitions, using explicit planning on learned transitions, and end-to-end learning of both planning and transitions. We use these approaches to organize a comprehensive overview of important recent developments such as latent models. We describe methods and benchmarks, and we suggest directions for future work for each of the approaches. Among promising research directions are curriculum learning, uncertainty modeling, and use of latent models for transfer learning.

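Research Motivation and Objectives

Address the high sample complexity of deep reinforcement learning in high-dimensional control tasks.
Overcome the trade-off between high-capacity models and low sample complexity in model-based deep reinforcement learning.
Provide a structured taxonomy that organizes recent progress in MBRL, particularly in latent dynamics modeling.
Identify and analyze the key methodological approaches: planning on given transitions, planning on learned dynamics, and end-to-end learning.
Suggest future research directions, including curriculum learning, uncertainty modeling, and transfer learning with latent models.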

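Proposed Method

Proposes a three-way taxonomy of MBRL methods: (1) planning on given transitions, (2) planning on learned dynamics, and (3) end-to-end joint learning of the dynamics and planning components.
Classifies recent methods by whether they use an explicit dynamics model, with particular attention to methods that apply deep neural networks in high-dimensional state and action spaces.
Emphasizes the role of latent-space representations in reducing model complexity and improving sample efficiency.
Reviews the benchmark environments and evaluation protocols used in recent MBRL literature to assess performance and generalization.
Analyzes techniques for estimating uncertainty in model predictions as a way to improve robustness and sample efficiency.
Highlights the integration of planning algorithms (such as Monte Carlo tree search and MDP solvers) with learned dynamics models.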
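
The second category of the taxonomy above, planning on learned dynamics, can be sketched in a few lines. The following is a minimal illustration and not a method taken from the survey: the 1-D environment `true_step`, the linear least-squares model, and the random-shooting planner are all assumptions chosen to keep the example small. The agent first fits a transition model from sampled transitions, then plans with that model only, executing the first action of the best imagined action sequence at each step.

```python
import random


def true_step(state, action):
    # Hypothetical 1-D environment (an assumption for this sketch).
    # The planner never calls this; it is used only to generate data and to act.
    return 0.9 * state + 0.5 * action


def collect_transitions(n, rng):
    # Gather (state, action, next_state) samples using random states and actions.
    data = []
    for _ in range(n):
        s = rng.uniform(-1.0, 1.0)
        a = rng.uniform(-1.0, 1.0)
        data.append((s, a, true_step(s, a)))
    return data


def fit_linear_model(data):
    # Least-squares fit of next_state ~ w_s * state + w_a * action,
    # solved in closed form from the 2x2 normal equations.
    ss = sum(s * s for s, a, _ in data)
    sa = sum(s * a for s, a, _ in data)
    aa = sum(a * a for s, a, _ in data)
    sy = sum(s * y for s, _, y in data)
    ay = sum(a * y for _, a, y in data)
    det = ss * aa - sa * sa
    w_s = (sy * aa - ay * sa) / det
    w_a = (ay * ss - sy * sa) / det
    return w_s, w_a


def plan_first_action(model, state, goal, horizon, n_candidates, rng):
    # Random-shooting planner: roll out random action sequences in the
    # learned model and return the first action of the cheapest sequence.
    w_s, w_a = model
    best_cost, best_action = float("inf"), 0.0
    for _ in range(n_candidates):
        actions = [rng.uniform(-1.0, 1.0) for _ in range(horizon)]
        s, cost = state, 0.0
        for a in actions:
            s = w_s * s + w_a * a          # imagined step, no environment call
            cost += (s - goal) ** 2
        if cost < best_cost:
            best_cost, best_action = cost, actions[0]
    return best_action


def run_mpc(model, state, goal, steps, rng):
    # Replan at every step, then act once in the real environment.
    for _ in range(steps):
        action = plan_first_action(model, state, goal,
                                   horizon=3, n_candidates=1000, rng=rng)
        state = true_step(state, action)
    return state


rng = random.Random(0)
model = fit_linear_model(collect_transitions(200, rng))
final_state = run_mpc(model, state=0.0, goal=0.8, steps=12, rng=rng)
print("learned weights:", model, "final state:", final_state)
```

In this sketch the only environment samples spent on learning are the 200 transitions used to fit the model; the thousands of rollouts inside the planner are imagined. A deep MBRL method would replace the linear fit with a neural network and typically use a stronger planner, but the division of labor is the same.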

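Experimental Results

Research Questions

RQ1: How can model-based deep reinforcement learning achieve high predictive accuracy in high-dimensional environments while keeping sample complexity low?
RQ2: What are the key differences and trade-offs in MBRL between planning on given transitions, planning on learned dynamics, and end-to-end training?
RQ3: To what extent do latent dynamics models improve the sample efficiency and generalization of MBRL?
RQ4: How does introducing uncertainty modeling into learned dynamics models make MBRL agents more robust?
RQ5: What role might curriculum learning and transfer learning play in accelerating MBRL training?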

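Key Findings

Latent dynamics models substantially improve sample efficiency by reducing the dimensionality of the state space while preserving predictive power.
End-to-end methods that jointly optimize the dynamics and planning components generally outperform modular pipelines in sample efficiency and final performance.
Methods that incorporate uncertainty estimates into model predictions are more robust at deployment and require fewer samples.
Curriculum learning strategies that gradually increase task complexity can yield faster convergence and better generalization.
Transfer learning through a shared latent space enables faster adaptation to new tasks, especially after pre-training on diverse environments.
Benchmarks show that current state-of-the-art MBRL methods still perform poorly on long-horizon tasks and high-dimensional continuous control tasks, leaving room for improvement.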
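
The finding on uncertainty estimation can be made concrete with a small example. The summary does not name a specific technique, so the sketch below swaps in one common option, a bootstrap ensemble, purely as an illustration: several models are fit on resampled copies of the same data, and the spread of their predictions serves as an uncertainty signal. The toy data (`make_data`, a noisy line observed only on the interval [0, 1]) and all function names are assumptions made for this example.

```python
import random
import statistics


def make_data(n, rng):
    # Hypothetical noisy observations of y = 2x, seen only for x in [0, 1].
    return [(x, 2.0 * x + rng.gauss(0.0, 0.1))
            for x in (rng.uniform(0.0, 1.0) for _ in range(n))]


def fit_line(points):
    # Ordinary least squares for y ~ slope * x + intercept.
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    var_x = sum((x - mean_x) ** 2 for x, _ in points)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = cov_xy / var_x
    return slope, mean_y - slope * mean_x


def bootstrap_ensemble(points, n_models, rng):
    # Each ensemble member is fit on a resample (with replacement) of the data.
    models = []
    for _ in range(n_models):
        resample = [rng.choice(points) for _ in points]
        models.append(fit_line(resample))
    return models


def predict_with_uncertainty(models, x):
    # Mean prediction plus ensemble disagreement (population std deviation).
    preds = [slope * x + intercept for slope, intercept in models]
    return statistics.mean(preds), statistics.pstdev(preds)


rng = random.Random(0)
ensemble = bootstrap_ensemble(make_data(50, rng), n_models=20, rng=rng)
for query in (0.5, 5.0):
    mean, spread = predict_with_uncertainty(ensemble, query)
    print(f"x={query}: prediction={mean:.3f}, disagreement={spread:.3f}")
```

Ensemble disagreement is small inside the region covered by the training data and grows for queries far outside it. A planner can use that signal to distrust imagined rollouts that wander into poorly modeled states, which is one way uncertainty estimates can support robustness.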


This review was generated by AI and checked by a human editor.