QUICK REVIEW

[Paper Review] Averaged-DQN: Variance Reduction and Stabilization for Deep Reinforcement Learning

Oron Anschel, Nir Baram|arXiv (Cornell University)|Nov 7, 2016

Reinforcement Learning in Robotics|References: 22|Cited by: 165

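One-Sentence Summary

Averaged-DQN reduces the variance of the target values by averaging past Q-value estimates, which improves stability and performance on Atari games.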

ABSTRACT

Instability and variability of Deep Reinforcement Learning (DRL) algorithms tend to adversely affect their performance. Averaged-DQN is a simple extension to the DQN algorithm, based on averaging previously learned Q-values estimates, which leads to a more stable training procedure and improved performance by reducing approximation error variance in the target values. To understand the effect of the algorithm, we examine the source of value function estimation errors and provide an analytical comparison within a simplified model. We further present experiments on the Arcade Learning Environment benchmark that demonstrate significantly improved stability and performance due to the proposed extension.

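Motivation and Objectives

Address the instability and high variance of deep reinforcement learning under function approximation.
Propose Averaged-DQN, a simple extension of DQN that averages previously learned Q-value estimates.
Analyze how the variance of the target approximation error (TAE) affects learning dynamics.
Demonstrate empirical gains in stability and performance on the Arcade Learning Environment (ALE) benchmark.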
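
The formulas below make the analyzed quantity concrete. The target approximation error (TAE) is the gap between what a network learned and the target it was fit to. This is a sketch in assumed notation, written for this summary rather than copied from the paper: θ_i denotes the parameters learned at iteration i, and the expectation is taken over the sampled reward and next state.

```latex
% DQN target at iteration i and the target approximation error (TAE)
y^{i}_{s,a} = \mathbb{E}\left[ r + \gamma \max_{a'} Q(s', a'; \theta_{i-1}) \,\middle|\, s, a \right],
\qquad
Z^{i}_{s,a} = Q(s, a; \theta_i) - y^{i}_{s,a}

% Averaged-DQN: the target uses the mean of the K previously learned networks
Q^{A}_{i-1}(s, a) = \frac{1}{K} \sum_{k=1}^{K} Q(s, a; \theta_{i-k}),
\qquad
y^{A,i}_{s,a} = \mathbb{E}\left[ r + \gamma \max_{a'} Q^{A}_{i-1}(s', a') \,\middle|\, s, a \right]
```

Suppose the Z^i are zero-mean and not strongly correlated across iterations. Then averaging K past networks in the target cancels out part of their noise, and this is the variance-reduction mechanism the summary refers to.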

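Proposed Method

Extends DQN by computing the target from the average of the Q-values of the K most recently learned networks.
Updates the current network parameters to minimize the squared loss against this averaged target.
Keeps the standard experience replay buffer and epsilon-greedy exploration.
Compares Averaged-DQN with DQN and Double-DQN on ALE games and analyzes the variance reduction in the target estimates.
Provides a theoretical analysis of TAE variance in a simplified MDP model, including a comparison with Ensemble-DQN.
Reports results on Breakout, Seaquest, and Asterix that show gains in stability and performance.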
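
The method steps above can be sketched in Python. This is a minimal sketch under our own assumptions, not the authors' implementation:

- A linear Q-function `Q(s, a; W) = s @ W[:, a]` stands in for the convolutional network.
- Training uses plain SGD on the squared TD error.
- A new parameter snapshot is taken every `target_update` steps.
- The class name `AveragedDQNSketch` and all parameter names are hypothetical.
- Early in training the target averages over however many snapshots exist, which can be fewer than K.

```python
from collections import deque

import numpy as np


class AveragedDQNSketch:
    """Hypothetical Averaged-DQN update with a linear Q-function Q(s, a; W) = s @ W[:, a].

    Assumptions (ours, not the paper's): linear features instead of a CNN,
    plain SGD on the squared TD error, and a new snapshot every
    `target_update` gradient steps.
    """

    def __init__(self, n_features, n_actions, K=5, gamma=0.99, lr=0.05,
                 target_update=100, buffer_size=10_000, seed=0):
        self.rng = np.random.default_rng(seed)
        self.W = self.rng.normal(0.0, 0.01, (n_features, n_actions))
        self.gamma, self.lr = gamma, lr
        self.target_update, self.steps = target_update, 0
        # Past parameter sets, newest first; at most K are kept.
        self.snapshots = deque([self.W.copy()], maxlen=K)
        self.replay = deque(maxlen=buffer_size)  # standard experience replay

    def averaged_q(self, states):
        """Q^A(s, .) = mean of Q(s, .; theta) over the stored snapshots."""
        return np.mean([states @ W for W in self.snapshots], axis=0)

    def averaged_target(self, r, s_next, done):
        """y = r + gamma * (1 - done) * max_a' Q^A(s', a')."""
        return r + self.gamma * (1.0 - done) * self.averaged_q(s_next).max(axis=1)

    def act(self, state, epsilon):
        """Epsilon-greedy action selection w.r.t. the current network."""
        if self.rng.random() < epsilon:
            return int(self.rng.integers(self.W.shape[1]))
        return int(np.argmax(state @ self.W))

    def store(self, s, a, r, s_next, done):
        self.replay.append((s, a, r, s_next, float(done)))

    def train_step(self, batch_size=32):
        n = min(batch_size, len(self.replay))
        idx = self.rng.choice(len(self.replay), size=n, replace=False).tolist()
        s, a, r, s_next, done = (np.array(x) for x in zip(*(self.replay[i] for i in idx)))
        y = self.averaged_target(r, s_next, done)        # treated as a constant
        td = (s @ self.W)[np.arange(n), a] - y           # Q(s, a; theta) - y
        onehot = np.eye(self.W.shape[1])[a]
        grad = s.T @ (td[:, None] * onehot) / n          # d/dW of 0.5 * mean(td**2)
        self.W -= self.lr * grad
        self.steps += 1
        if self.steps % self.target_update == 0:
            self.snapshots.appendleft(self.W.copy())     # oldest snapshot drops out
        return float(np.mean(td ** 2))
```

A training loop would do three things at each environment step:

- call `act` with a decaying epsilon,
- `store` the resulting transition,
- run `train_step` once.

Keeping K parameter copies costs K forward passes per target computation but no extra training, whereas an ensemble needs K networks trained in parallel.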

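Experimental Results

Research Questions

RQ1: How does increasing the number K of averaged target networks affect value-estimation error and overestimation bias?
RQ2: Do averaged targets produce more stable learning curves and better policy performance on ALE games?
RQ3: How does Averaged-DQN compare with Ensemble-DQN in terms of variance reduction and mitigation of overestimation?
RQ4: In settings where DQN becomes unstable due to function approximation, does Averaged-DQN prevent divergence?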

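Key Findings

Increasing K lowers the variance of target-value errors and reduces overestimation, leading to more stable training.
Compared with DQN, Averaged-DQN achieves higher average scores across multiple runs, with less variability.
On Breakout, Seaquest, and Asterix, Averaged-DQN with larger K improves performance and reduces fluctuations relative to standard DQN.
In theory, Averaged-DQN reduces TAE variance more efficiently than Ensemble-DQN, and by at least a factor of K relative to DQN.
On some games (e.g., Asterix), Averaged-DQN mitigates the divergence seen with DQN.
Empirically, Averaged-DQN can match or exceed the performance of Double-DQN in the tested environments.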
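
The theoretical variance finding can be checked numerically in a toy setting. The Monte Carlo sketch below is our own construction, not the paper's experiment; the function names and all defaults are hypothetical. It makes these assumptions:

- a deterministic chain s_0 -> ... -> s_{M-1} -> terminal,
- a single action,
- i.i.d. zero-mean Gaussian TAEs.

The sketch propagates only the errors through the chain and reports the variance of each method's output estimate at s_0. Averaged-DQN's output is taken to be the mean of its last K networks.

```python
from collections import deque

import numpy as np


def next_state_error(err):
    """Error at the successor s_{m+1}; the last state leads to a terminal (zero error)."""
    nxt = np.zeros_like(err)
    nxt[:, :-1] = err[:, 1:]
    return nxt


def tae_variance_at_s0(method, M=5, K=5, gamma=0.9, sigma=1.0,
                       iters=60, runs=20_000, seed=0):
    """Monte Carlo estimate of Var[error of the reported Q estimate at s_0].

    Assumed toy model: deterministic chain s_0 -> ... -> s_{M-1} -> terminal,
    a single action, and i.i.d. zero-mean Gaussian TAEs with std `sigma`.
    Rewards cancel out of the error recursion, so only errors are simulated.
    """
    rng = np.random.default_rng(seed)
    # Error arrays (runs x M) of the last K outputs, newest first; zero at start.
    history = deque([np.zeros((runs, M)) for _ in range(K)], maxlen=K)
    for _ in range(iters):
        if method == "dqn":
            # Target from the single previous network, plus one TAE.
            target_err = gamma * next_state_error(history[0])
            tae = rng.normal(0.0, sigma, (runs, M))
        elif method == "ensemble":
            # K networks fit the same target (built from the previous ensemble
            # average) with independent TAEs; the output is their mean.
            target_err = gamma * next_state_error(history[0])
            tae = rng.normal(0.0, sigma, (K, runs, M)).mean(axis=0)
        elif method == "averaged":
            # Target from the mean of the K previously learned networks.
            target_err = gamma * next_state_error(np.stack(history).mean(axis=0))
            tae = rng.normal(0.0, sigma, (runs, M))
        else:
            raise ValueError(f"unknown method: {method}")
        history.appendleft(target_err + tae)
    # Averaged-DQN outputs the mean of its last K networks; the others output
    # their latest estimate.
    out = np.stack(history).mean(axis=0) if method == "averaged" else history[0]
    return float(out[:, 0].var())


for method in ("dqn", "ensemble", "averaged"):
    print(method, round(tae_variance_at_s0(method), 3))
```

With the defaults (M = 5, γ = 0.9, σ = 1, K = 5), the printed values should behave as follows:

- DQN should come out near σ² Σ_{m=0}^{4} γ^{2m} ≈ 3.43.
- Ensemble-DQN should come out near a K-th of that, about 0.69.
- Averaged-DQN should be lower still.

This ordering is consistent with the theoretical finding listed above.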

This review was generated by AI and checked by human editors.