QUICK REVIEW

[Paper Review] Behaviour Suite for Reinforcement Learning

Ian Osband, Yotam Doron|arXiv (Cornell University)|Aug 9, 2019

Reinforcement Learning in Robotics | References: 48 | Cited by: 36

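One-sentence summary

bsuite is a collection of targeted, scalable RL experiments, together with an open-source toolkit, for evaluating and analysing agent behaviour in order to understand core RL capabilities. It provides principled diagnostics and analysis that is reproducible across codebases.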

ABSTRACT

This paper introduces the Behaviour Suite for Reinforcement Learning, or bsuite for short. bsuite is a collection of carefully-designed experiments that investigate core capabilities of reinforcement learning (RL) agents with two objectives. First, to collect clear, informative and scalable problems that capture key issues in the design of general and efficient learning algorithms. Second, to study agent behaviour through their performance on these shared benchmarks. To complement this effort, we open source github.com/deepmind/bsuite, which automates evaluation and analysis of any agent on bsuite. This library facilitates reproducible and accessible research on the core issues in RL, and ultimately the design of superior learning algorithms. Our code is Python, and easy to use within existing projects. We include examples with OpenAI Baselines, Dopamine as well as new reference implementations. Going forward, we hope to incorporate more excellent experiments from the research community, and commit to a periodic review of bsuite from a committee of prominent researchers.

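Motivation and goals

Provide clear, informative, and scalable experiments that diagnose key RL capabilities.
Make evaluation and comparison reproducible across RL agents and codebases.
Isolate and study fundamental RL problems such as exploration, memory, and credit assignment.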

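Proposed method

Define a set of diagnostic RL experiments, each with a fixed environment, interaction protocol, and analysis procedure.
Score agent performance on each task on a [0,1] scale to allow quick comparison.
Provide open-source implementations, reference baselines, and automated analysis notebooks to support reproducible research.
Explain how bsuite experiments are designed to be targeted, simple, challenging, scalable, and fast.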
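
The idea of a [0,1] score per task can be sketched as follows. This is a minimal illustration, not bsuite's own scoring code: the helper names `regret_to_score` and `summarize_by_tag`, the normalization by a worst-case regret, the tag assignments, and all numbers are assumptions made for this example.

```python
from collections import defaultdict
from statistics import mean


def regret_to_score(total_regret, worst_case_regret):
    """Map a regret value to a score in [0, 1], where 1 is best.

    Hypothetical normalization: assumes each experiment has a known
    worst-case regret to compare against.
    """
    if worst_case_regret <= 0:
        raise ValueError("worst_case_regret must be positive")
    fraction = total_regret / worst_case_regret
    # Clip so that the score always stays inside [0, 1].
    return 1.0 - min(max(fraction, 0.0), 1.0)


def summarize_by_tag(scores, tags):
    """Average per-experiment scores for each capability tag."""
    grouped = defaultdict(list)
    for experiment, score in scores.items():
        for tag in tags[experiment]:
            grouped[tag].append(score)
    return {tag: mean(values) for tag, values in grouped.items()}


# Illustrative numbers only, not results from the paper.
example_scores = {
    "memory_len": regret_to_score(total_regret=250.0, worst_case_regret=1000.0),
    "deep_sea": regret_to_score(total_regret=1000.0, worst_case_regret=1000.0),
    "catch": regret_to_score(total_regret=0.0, worst_case_regret=1000.0),
}
# Assumed tag assignments, chosen for illustration.
example_tags = {
    "memory_len": ["memory"],
    "deep_sea": ["exploration"],
    "catch": ["basic", "credit_assignment"],
}
summary = summarize_by_tag(example_scores, example_tags)
```

Because every experiment reports on the same scale, the per-tag averages in `summary` could be plotted directly on one chart, which is the kind of at-a-glance comparison a shared score is meant to support.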

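Experimental results

Research questions

RQ1: Which core RL capabilities can be isolated and measured through targeted experiments?
RQ2: How do different RL algorithms perform on diagnostic tasks that probe memory and exploration?
RQ3: Can a shared benchmark suite enable reproducible evaluation across different RL codebases?
RQ4: How does algorithm performance on the diagnostic tasks scale as problem size increases?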

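Main findings

The memory-length experiment shows that recurrent policies outperform feed-forward policies on multi-step memory tasks, with a clear trend as the required memory length grows.
DQN and Bootstrapped DQN struggle once the memory length exceeds one step, while A2C performs strongly up to a cutoff length, beyond which its behaviour becomes increasingly random.
The Deep Sea experiment highlights the need for deep exploration; Bootstrapped DQN scales better to larger problem sizes.
bsuite provides fast, interpretable summaries through radar charts and uses a unified scoring scheme across its experiments.
The open-source tooling integrates easily with existing RL codebases and supports reproducible analysis.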
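
To make the Deep Sea finding concrete, here is a simplified sketch of a Deep Sea-style chain. It is an assumption-laden toy, not the bsuite environment: the class `DeepSeaSketch`, its `(observation, reward, done)` step interface, and the reward rule (1.0 only for moving right from the bottom-right cell) are chosen for illustration, and other details of the real environment are omitted.

```python
from itertools import product


class DeepSeaSketch:
    """Toy Deep Sea-style chain on a size x size grid (illustrative only).

    Assumptions: the agent starts in the top-left cell, descends one row
    per step, action 1 moves right and action 0 moves left, and the only
    reward is 1.0 for taking action 1 in the bottom-right cell.
    """

    def __init__(self, size):
        self.size = size
        self.row = 0
        self.col = 0

    def reset(self):
        self.row = 0
        self.col = 0
        return (self.row, self.col)

    def step(self, action):
        """Apply one action and return (observation, reward, done)."""
        at_last_row = self.row == self.size - 1
        reward = 0.0
        if action == 1:
            if at_last_row and self.col == self.size - 1:
                reward = 1.0
            self.col = min(self.col + 1, self.size - 1)
        else:
            self.col = max(self.col - 1, 0)
        self.row += 1
        done = self.row == self.size
        return (self.row, self.col), reward, done


def run_episode(env, actions):
    """Play a fixed action sequence and return the total reward."""
    env.reset()
    total = 0.0
    for action in actions:
        _, reward, _ = env.step(action)
        total += reward
    return total


def count_rewarding_sequences(size):
    """Count action sequences of length `size` that earn any reward."""
    env = DeepSeaSketch(size)
    return sum(
        run_episode(env, actions) > 0
        for actions in product((0, 1), repeat=size)
    )


def random_success_probability(size):
    """Chance that a uniformly random policy finds the reward in one episode."""
    return count_rewarding_sequences(size) / 2 ** size
```

In this sketch exactly one of the 2^N action sequences is rewarded, so a uniformly random policy succeeds with probability 2^-N per episode. That exponential decay with problem size is why undirected exploration is expected to fail as the grid grows.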

This review was generated by AI and checked by a human editor.