QUICK REVIEW

[Paper Review] Transfer Learning in Deep Reinforcement Learning: A Survey

Zhuangdi Zhu, Kaixiang Lin|arXiv (Cornell University)|Sep 16, 2020

Reinforcement Learning in Robotics|References: 178|Cited by: 150

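One-Sentence Summary

A comprehensive survey of transfer learning in deep reinforcement learning, covering the forms of transferred knowledge, a taxonomy of approaches, evaluation metrics, and future directions.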

ABSTRACT

Reinforcement learning is a learning paradigm for solving sequential decision-making problems. Recent years have witnessed remarkable progress in reinforcement learning upon the fast development of deep neural networks. Along with the promising prospects of reinforcement learning in numerous domains such as robotics and game-playing, transfer learning has arisen to tackle various challenges faced by reinforcement learning, by transferring knowledge from external expertise to facilitate the efficiency and effectiveness of the learning process. In this survey, we systematically investigate the recent progress of transfer learning approaches in the context of deep reinforcement learning. Specifically, we provide a framework for categorizing the state-of-the-art transfer learning approaches, under which we analyze their goals, methodologies, compatible reinforcement learning backbones, and practical applications. We also draw connections between transfer learning and other relevant topics from the reinforcement learning perspective and explore their potential challenges that await future research progress.

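Research Motivation and Objectives

Define transfer learning (TL) in the context of reinforcement learning (RL) and deep reinforcement learning (DRL).
Systematically categorize TL methods by the form of knowledge transferred into the DRL backbone and by how that knowledge is used.
Analyze the goals, methodologies, and applications of TL methods in DRL.
Discuss evaluation metrics and future research directions for TL in DRL.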

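Proposed Approach

Proposes a framework that categorizes TL methods in DRL by the form of transferred knowledge and by the transfer process.
Organizes TL methods by the form of transferred knowledge (e.g., reward shaping, learning from demonstrations, teacher policies, representations).
Analyzes compatibility with RL backbones, as well as the differences between source and target domains.
Summarizes evaluation metrics for TL in DRL and discusses new metrics related to the quality and quantity of knowledge.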
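
To make the idea of TL evaluation metrics concrete, here is a minimal Python sketch that compares a learning curve obtained with transfer against one obtained without it. The three metrics used (jumpstart, asymptotic gain, time to threshold) are commonly used in the TL-for-RL literature, but this summary does not name specific metrics, so their selection is an assumption. The function names and the toy learning curves are hypothetical and are not results from the survey.

```python
from typing import Optional, Sequence


def jumpstart(with_transfer: Sequence[float], without_transfer: Sequence[float]) -> float:
    """Difference in initial performance between the two learners."""
    return with_transfer[0] - without_transfer[0]


def asymptotic_gain(with_transfer: Sequence[float],
                    without_transfer: Sequence[float],
                    window: int = 3) -> float:
    """Difference in final performance, averaged over the last `window` points."""
    def tail_mean(curve: Sequence[float]) -> float:
        tail = curve[-window:]
        return sum(tail) / len(tail)
    return tail_mean(with_transfer) - tail_mean(without_transfer)


def time_to_threshold(curve: Sequence[float], threshold: float) -> Optional[int]:
    """Index of the first evaluation point reaching `threshold`, or None if never reached."""
    for step, value in enumerate(curve):
        if value >= threshold:
            return step
    return None


# Toy average-return curves (illustrative numbers only, one value per evaluation point).
baseline_curve = [0.0, 1.0, 2.0, 4.0, 6.0, 7.0]
transfer_curve = [2.0, 4.0, 6.0, 7.0, 8.0, 8.0]

print("jumpstart:", jumpstart(transfer_curve, baseline_curve))
print("asymptotic gain:", asymptotic_gain(transfer_curve, baseline_curve, window=2))
print("steps to reach 6.0 (transfer):", time_to_threshold(transfer_curve, 6.0))
print("steps to reach 6.0 (baseline):", time_to_threshold(baseline_curve, 6.0))
```

Jumpstart and asymptotic gain relate to initial and final performance, while time to threshold relates to learning speed. In practice each curve would be averaged over several random seeds before any comparison.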

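Experimental Results

Research Questions

RQ1: What forms of knowledge can be transferred in DRL to facilitate learning?
RQ2: How do different TL methods align with various DRL backbones and task differences?
RQ3: Which metrics best evaluate the effectiveness of TL and the quality of the knowledge transferred in DRL?
RQ4: What are the future directions and open challenges for TL in DRL?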

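Key Findings

Reward shaping, learning from demonstrations, and policy transfer are core TL approaches in DRL, and they differ in their compatibility with RL backbones.
PBRS, PBA, DPB, and DPBA provide a family of potential-based reward shaping methods for TL in DRL.
Learning from demonstrations and from teacher policies enables policy-agnostic knowledge transfer and forms of policy distillation across DRL tasks.
TL evaluation considers both mastery (final performance) and generalization (learning speed and robustness), along with knowledge-oriented metrics such as the quantity and quality of knowledge required.
The survey points to future directions such as reasoning over diverse forms of knowledge and the efficient, principled use of knowledge for TL in DRL.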
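
As an illustration of the potential-based reward shaping (PBRS) idea mentioned above, the sketch below adds the standard shaping term F(s, s') = γΦ(s') − Φ(s) to an environment reward. The grid world, the goal location, the potential function (negative Manhattan distance to the goal), and all function names are assumptions chosen for illustration; they are not taken from the survey.

```python
from typing import Callable, Tuple

State = Tuple[int, int]  # hypothetical grid-world state: (row, column)


def make_shaping(potential: Callable[[State], float],
                 gamma: float) -> Callable[[State, State], float]:
    """Build the potential-based shaping term F(s, s') = gamma * phi(s') - phi(s)."""
    def shaping(state: State, next_state: State) -> float:
        return gamma * potential(next_state) - potential(state)
    return shaping


GOAL: State = (3, 3)  # assumed goal cell for this toy example


def negative_manhattan(state: State) -> float:
    """Assumed potential: states closer to the goal have higher potential."""
    return -float(abs(GOAL[0] - state[0]) + abs(GOAL[1] - state[1]))


shaping = make_shaping(negative_manhattan, gamma=0.9)

# The shaped reward is the environment reward plus the shaping term.
env_reward = 0.0  # sparse reward: nothing until the goal is reached
toward_goal = env_reward + shaping((0, 0), (0, 1))
away_from_goal = env_reward + shaping((0, 1), (0, 0))

print("step toward goal:", toward_goal)
print("step away from goal:", away_from_goal)
```

Because the shaping term is a difference of potentials, the bonuses collected along any loop cancel out in the undiscounted case, which is the intuition behind why this form of shaping does not change which policies are optimal. How the potential is obtained (for example, from knowledge learned on a source task) is left open in this sketch.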

This review was generated by AI and reviewed by a human editor.