QUICK REVIEW

[Paper Review] Multi-focus Attention Network for Efficient Deep Reinforcement Learning

Jin‐Young Choi, Beom‐Jin Lee|arXiv (Cornell University)|Dec 13, 2017

Reinforcement Learning in Robotics | References: 16 | Cited by: 29

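One-Sentence Summary

This paper proposes the Multi-focus Attention Network (MANet), a deep reinforcement learning model that improves sample efficiency by dividing visual input into partial states and applying parallel attention layers to focus on task-relevant entities. MANet attains the highest scores with significantly fewer experience samples than DQN and a single-attention model, and its extension to multi-agent cooperative tasks learns 20% faster than the existing state-of-the-art model.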

ABSTRACT

Deep reinforcement learning (DRL) has shown incredible performance in learning various tasks to the human level. However, unlike human perception, current DRL models connect the entire low-level sensory input to the state-action values rather than exploiting the relationship between and among entities that constitute the sensory input. Because of this difference, DRL needs vast amount of experience samples to learn. In this paper, we propose a Multi-focus Attention Network (MANet) which mimics human ability to spatially abstract the low-level sensory input into multiple entities and attend to them simultaneously. The proposed method first divides the low-level input into several segments which we refer to as partial states. After this segmentation, parallel attention layers attend to the partial states relevant to solving the task. Our model estimates state-action values using these attended partial states. In our experiments, MANet attains highest scores with significantly less experience samples. Additionally, the model shows higher performance compared to the Deep Q-network and the single attention model as benchmarks. Furthermore, we extend our model to attentive communication model for performing multi-agent cooperative tasks. In multi-agent cooperative task experiments, our model shows 20% faster learning than existing state-of-the-art model.

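Motivation and Objectives

Address the low learning efficiency of deep reinforcement learning (DRL), which struggles with sparse rewards and needs vast numbers of experience samples.
Mimic the human ability of spatial abstraction by focusing on multiple relevant entities in the sensory input rather than processing raw pixels uniformly.
Improve sample efficiency and learning speed in both single-agent and multi-agent reinforcement learning tasks.
Develop a scalable attention mechanism that dynamically attends to multiple partial states for state-action value estimation.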

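Proposed Method

The model divides the low-level sensory input (e.g., an image) into multiple non-overlapping partial states, mimicking how humans attend to distinct entities.
Parallel attention layers operate over the partial states to extract task-relevant features and focus processing on salient regions.
The attended features from the multiple partial states are fused to estimate state-action values, improving representation quality.
The architecture incorporates a two-stream attention mechanism that supports parallel processing of multiple spatially separated input segments.
In multi-agent settings, the method is extended into a communication mechanism that lets agents share attended features to complete cooperative tasks.
The model is trained end-to-end with deep Q-learning, using experience replay and a target network as in DQN, but with enhanced feature extraction.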
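
The data flow described above (split the input into partial states, attend with several parallel layers, fuse, estimate state-action values) can be sketched roughly as follows. This is a minimal NumPy illustration, not the authors' implementation: the grid split, the key/value projections, the learned query vectors, and all names (`split_into_partial_states`, `multi_focus_q_values`, `W_key`, `W_val`, `queries`, `W_q`) are assumptions made for the example, and the weights are random rather than trained.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def split_into_partial_states(feature_map, grid):
    # Cut an (H, W, C) array into grid*grid non-overlapping patches
    # and flatten each patch into one "partial state" vector.
    # Assumes H and W are divisible by grid.
    h, w, _ = feature_map.shape
    ph, pw = h // grid, w // grid
    patches = [
        feature_map[i * ph:(i + 1) * ph, j * pw:(j + 1) * pw, :].reshape(-1)
        for i in range(grid)
        for j in range(grid)
    ]
    return np.stack(patches)  # shape: (grid*grid, ph*pw*C)


def multi_focus_q_values(partial_states, W_key, W_val, queries, W_q):
    # Hypothetical two-stream projection: one stream scores relevance
    # (keys), the other carries content (values).
    keys = partial_states @ W_key        # (P, d)
    values = partial_states @ W_val      # (P, d)
    # Each row of `queries` stands for one parallel attention layer.
    scores = queries @ keys.T            # (N, P)
    weights = softmax(scores, axis=1)    # each row sums to 1 over partial states
    attended = weights @ values          # (N, d): one attended vector per layer
    # Fuse by concatenation, then map linearly to one value per action.
    q_values = attended.reshape(-1) @ W_q
    return q_values, weights


rng = np.random.default_rng(0)
feature_map = rng.normal(size=(8, 8, 4))          # stand-in for an image or conv features
partial_states = split_into_partial_states(feature_map, grid=4)
d, n_layers, n_actions = 8, 3, 5
W_key = rng.normal(size=(partial_states.shape[1], d))
W_val = rng.normal(size=(partial_states.shape[1], d))
queries = rng.normal(size=(n_layers, d))
W_q = rng.normal(size=(n_layers * d, n_actions))
q_values, attention = multi_focus_q_values(partial_states, W_key, W_val, queries, W_q)
greedy_action = int(np.argmax(q_values))
```

In a trained model the projection matrices and queries would be learned through the usual DQN loss with experience replay and a target network; here they only serve to show the shapes involved.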

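Experimental Results

Research Questions

RQ1: Does dividing the visual input into partial states and applying multi-focus attention improve sample efficiency in deep reinforcement learning?
RQ2: How does multi-focus attention compare with single attention or raw pixel input in learning speed and final performance?
RQ3: Can the proposed attention mechanism be extended effectively to multi-agent cooperative tasks with inter-agent communication?
RQ4: By how much can the model reduce the number of experience samples needed to reach human-level performance?
RQ5: Does the attention mechanism improve the model's generalization and robustness in complex visual environments?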

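Key Findings

MANet attains higher scores than the baselines (DQN and a single-attention network) while requiring significantly fewer experience samples.
By focusing on relevant visual entities, the model reduces sample complexity and converges faster on single-agent control tasks.
In multi-agent cooperative tasks, the MANet-based communication model learns 20% faster than the existing state-of-the-art model, indicating higher sample efficiency.
The attention mechanism improves performance by letting the agent selectively attend to task-relevant visual components instead of processing the entire input uniformly.
Extending the model into an attentive communication model improves coordination among agents, yielding better performance in cooperative environments.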

This review was generated by AI and reviewed by a human editor.