
[Paper Review] Relational Deep Reinforcement Learning

Vinícius Zambaldi, David Raposo, et al. · arXiv (Cornell University) · Jun 5, 2018

Reinforcement Learning in Robotics · 20 references · 159 citations

TL;DR

The paper introduces relational inductive biases into deep reinforcement learning via self-attention. This enables non-local relational reasoning among entities, which improves sample efficiency, generalization, and performance on Box-World and StarCraft II mini-games.

ABSTRACT

We introduce an approach for deep reinforcement learning (RL) that improves upon the efficiency, generalization capacity, and interpretability of conventional approaches through structured perception and relational reasoning. It uses self-attention to iteratively reason about the relations between entities in a scene and to guide a model-free policy. Our results show that in a novel navigation and planning task called Box-World, our agent finds interpretable solutions that improve upon baselines in terms of sample complexity, ability to generalize to more complex scenes than experienced during training, and overall performance. In the StarCraft II Learning Environment, our agent achieves state-of-the-art performance on six mini-games -- surpassing human grandmaster performance on four. By considering architectural inductive biases, our work opens new directions for overcoming important, but stubborn, challenges in deep RL.

Motivation & Objective

Improve deep RL by incorporating relational representations that enhance generalization and sample efficiency.
Propose an architectural inductive bias that enables non-local, iterative relational reasoning among scene entities.
Demonstrate that relational reasoning yields interpretable, transferable representations.
Show state-of-the-art performance on StarCraft II mini-games and strong performance on a relationally challenging Box-World task.

Proposed method

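Build on relational RL, which represents states, actions, and policies in a relational language, but learn entity representations directly from pixels instead of hand-specifying them.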
Use multi-head dot-product attention (MHDPA) blocks, which apply one shared function non-locally across all entities, to compute pairwise interactions; applying the blocks iteratively captures higher-order interactions.
Extract entities from pixel inputs: a CNN produces a feature map, each spatial cell's feature vector is concatenated with its x and y coordinates, and each resulting vector is treated as one entity for attention.
Stack attention blocks with residual connections, then aggregate across entities with feature-wise max-pooling before the policy and value heads.
Train with a distributed actor-critic setup (100 actors, 1 learner) for Box-World; for StarCraft II, adapt the architecture with a ConvLSTM to handle temporal dependencies.
Provide baseline comparisons with a non-relational control network (residual conv blocks) to isolate relational benefits.
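The entity-extraction step can be sketched in plain Python. This is an illustration under assumptions, not the authors' code: the nested list stands in for a CNN feature map, the scaling of coordinates to [-1, 1] is an assumed normalization, and `extract_entities` is a hypothetical name.

```python
def extract_entities(feature_map):
    """Turn an H x W x C feature map (nested lists) into H*W entity vectors.

    Each spatial cell becomes one entity: its C features followed by its
    (row, col) coordinates, rescaled to [-1, 1] (an assumed normalization).
    """
    height = len(feature_map)
    width = len(feature_map[0])

    def scale(index, size):
        # Map 0..size-1 linearly onto [-1, 1]; a single cell maps to 0.0.
        return 0.0 if size == 1 else -1.0 + 2.0 * index / (size - 1)

    entities = []
    for r in range(height):
        for c in range(width):
            cell = list(feature_map[r][c])
            entities.append(cell + [scale(r, height), scale(c, width)])
    return entities


# Toy 2 x 3 feature map with 2 channels per cell (hypothetical values).
fmap = [
    [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]],
    [[0.2, 0.8], [0.9, 0.1], [0.3, 0.3]],
]
ents = extract_entities(fmap)
print(len(ents), len(ents[0]))  # 6 4
print(ents[0])  # [1.0, 0.0, -1.0, -1.0]
```

Appending coordinates matters because attention by itself ignores where an entity came from; the coordinate features let the relational module reason about spatial layout.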
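The attention, stacking, and max-pooling steps can be sketched as a minimal pure-Python relational module. Assumptions: weights are random rather than learned, a single linear projection with ReLU stands in for the block's MLP, the same weights are reused on every iteration, and all function names (`make_params`, `attention_block`, `relational_module`) are hypothetical. Each head computes softmax(QK^T / sqrt(d)) V over all entity pairs, so any entity can influence any other in one step regardless of spatial distance.

```python
import math
import random


def matmul(a, b):
    """Multiply an n x k matrix by a k x m matrix (both lists of lists)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def layer_norm(v, eps=1e-5):
    """Normalize one entity vector to zero mean, unit variance (no learned gain/bias)."""
    mean = sum(v) / len(v)
    var = sum((x - mean) ** 2 for x in v) / len(v)
    return [(x - mean) / math.sqrt(var + eps) for x in v]


def random_matrix(rows, cols, rng):
    return [[rng.gauss(0.0, 0.3) for _ in range(cols)] for _ in range(rows)]


def make_params(d_model, n_heads, d_head, seed=0):
    """Hypothetical random weights; a trained agent would learn these."""
    rng = random.Random(seed)
    heads = [
        {name: random_matrix(d_model, d_head, rng) for name in ("wq", "wk", "wv")}
        for _ in range(n_heads)
    ]
    # Projects the concatenated head outputs back to d_model for the residual.
    w_out = random_matrix(n_heads * d_head, d_model, rng)
    return {"heads": heads, "w_out": w_out}


def attention_block(entities, params):
    """One multi-head dot-product attention block over all entity pairs."""
    head_outputs = []
    for head in params["heads"]:
        q = matmul(entities, head["wq"])
        k = matmul(entities, head["wk"])
        v = matmul(entities, head["wv"])
        scale = math.sqrt(len(q[0]))
        # weights[i][j] is how strongly entity i attends to entity j.
        weights = [softmax([sum(a * b for a, b in zip(qi, kj)) / scale for kj in k]) for qi in q]
        head_outputs.append(matmul(weights, v))
    # Concatenate the heads' outputs for each entity.
    concat = [sum((h[i] for h in head_outputs), []) for i in range(len(entities))]
    # Linear projection + ReLU, a simplified stand-in for the block's MLP.
    update = [[max(0.0, x) for x in row] for row in matmul(concat, params["w_out"])]
    # Residual connection followed by layer normalization, per entity.
    return [layer_norm([e + u for e, u in zip(ent, upd)]) for ent, upd in zip(entities, update)]


def relational_module(entities, params, n_blocks=2):
    """Apply the block n_blocks times (shared weights, an assumption), then max-pool."""
    x = entities
    for _ in range(n_blocks):
        x = attention_block(x, params)
    # Feature-wise max over entities yields one fixed-size vector for the heads.
    return [max(col) for col in zip(*x)]


# Three toy entities: 2 features + 2 coordinates each (hypothetical values).
entities = [[1.0, 0.0, -1.0, -1.0], [0.5, 0.5, -1.0, 0.0], [0.0, 1.0, 1.0, 1.0]]
params = make_params(d_model=4, n_heads=2, d_head=3)
pooled = relational_module(entities, params)
print(len(pooled))  # 4: one value per feature, regardless of how many entities
```

Because the attention block is permutation-equivariant and max-pooling is permutation-invariant, the pooled vector does not depend on entity order, and its size does not depend on entity count. This is what lets a fixed-size policy and value head sit on top of a variable number of entities.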

Experimental results

Research questions

RQ1: Can relational representations learned via self-attention improve generalization to unseen relational configurations in RL tasks?
RQ2: Do iterative, non-local relational computations enable higher-order relational reasoning beyond local convolutions?
RQ3: How do relational inductive biases affect sample efficiency and performance in complex environments like StarCraft II mini-games?
RQ4: To what extent are learned relational representations interpretable and transferable across tasks?

Key findings

Relational modules enabled near-optimal performance on Box-World variants, outperforming convolutional baselines especially as distractor complexity increased.
In Box-World, agents with relational reasoning generalized to longer solution paths and unseen key-lock configurations with high success rates (e.g., >88% in zero-shot transfer for longer paths).
On StarCraft II mini-games, the relational agent achieved state-of-the-art scores on six mini-games and surpassed human grandmasters on four, outperforming the control agent.
Attention visualizations revealed interpretable relational semantics, such as keys attending to the locks they can open and the agent attending to keys and gems.
These zero-shot results on longer solution paths and novel key-lock combinations suggest that the relational agent learns more abstract relational knowledge than the baseline.
Relational biases contributed to improved generalization in some SC2 settings, though results showed variability and dependence on model size.


This review was created by AI and reviewed by human editors.