[Paper Review] Control of Memory, Active Perception, and Action in Minecraft
The paper introduces memory-based deep RL architectures evaluated on Minecraft tasks that test partial observability, delayed rewards, and active perception, and shows improved generalization to unseen maps over standard DRL baselines.
In this paper, we introduce a new set of reinforcement learning (RL) tasks in Minecraft (a flexible 3D world). We then use these tasks to systematically compare and contrast existing deep reinforcement learning (DRL) architectures with our new memory-based DRL architectures. These tasks are designed to emphasize, in a controllable manner, issues that pose challenges for RL methods including partial observability (due to first-person visual observations), delayed rewards, high-dimensional visual observations, and the need to use active perception in a correct manner so as to perform well in the tasks. While these tasks are conceptually simple to describe, by virtue of having all of these challenges simultaneously they are difficult for current DRL architectures. Additionally, we evaluate the generalization performance of the architectures on environments not used during training. The experimental results show that our new architectures generalize to unseen environments better than existing DRL architectures.
Motivation & Objective
- Use a controllable 3D world (Minecraft) to stress RL agents on partial observability, delayed rewards, high-dimensional perception, and active perception.
- Systematically compare existing DRL architectures with new memory-based DRL architectures on designed cognitive tasks.
- Evaluate generalization performance of architectures to unseen or larger map topologies.
- Demonstrate that memory-based architectures generalize better to unseen maps by leveraging context-dependent memory retrieval.
Proposed method
- Encode observations with a CNN to fixed-length feature vectors.
- Store the encoded features of recent observations in an external memory as key/value blocks.
- Retrieve memory with soft attention conditioned on a context vector.
- Construct context vectors with three variants: MQN (feedforward), RMQN (LSTM-based), and FRMQN (LSTM with memory feedback).
- Estimate action-values using an MLP that combines the context and retrieved memory.
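The steps above can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' implementation: the CNN encoder is replaced by random feature vectors, all weights are random rather than trained, and every name here (`write_memory`, `read_memory`, `q_values`, the dimensions) is hypothetical. Only the feedforward (MQN-style) context is shown, and the way the context and the retrieved memory are combined in the MLP head (a sum followed by a ReLU) is an assumption, since the summary only says that an MLP combines them.

```python
import math
import random


def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def random_matrix(rows, cols, rng, scale=0.1):
    """Untrained stand-in for a learned weight matrix."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def write_memory(features, W_key, W_val):
    """Project each stored feature vector into a key block and a value block."""
    keys = [matvec(W_key, e) for e in features]
    vals = [matvec(W_val, e) for e in features]
    return keys, vals


def read_memory(context, keys, vals):
    """Soft attention: weight each value by softmax(context . key)."""
    scores = [sum(c * k for c, k in zip(context, key)) for key in keys]
    weights = softmax(scores)
    dim = len(vals[0])
    retrieved = [sum(w * v[j] for w, v in zip(weights, vals)) for j in range(dim)]
    return retrieved, weights


def q_values(context, retrieved, W_h, W_q):
    """Assumed MLP head: ReLU(W_h context + retrieved), then linear to actions."""
    hidden = [max(0.0, a + b) for a, b in zip(matvec(W_h, context), retrieved)]
    return matvec(W_q, hidden)


rng = random.Random(0)
feat_dim, mem_dim, n_actions, mem_size = 8, 4, 3, 5

W_key = random_matrix(mem_dim, feat_dim, rng)
W_val = random_matrix(mem_dim, feat_dim, rng)
W_ctx = random_matrix(mem_dim, feat_dim, rng)
W_h = random_matrix(mem_dim, mem_dim, rng)
W_q = random_matrix(n_actions, mem_dim, rng)

# Stand-ins for the CNN features of the last mem_size frames (random here).
features = [[rng.uniform(-1, 1) for _ in range(feat_dim)] for _ in range(mem_size)]

keys, vals = write_memory(features, W_key, W_val)
context = matvec(W_ctx, features[-1])  # feedforward context from the current frame
retrieved, weights = read_memory(context, keys, vals)
q = q_values(context, retrieved, W_h, W_q)
greedy_action = max(range(n_actions), key=lambda a: q[a])
```

In this sketch, `weights` is the attention distribution over the stored frames, which is the quantity the qualitative analyses inspect. The recurrent variants would replace the `context` line with an LSTM state (RMQN), optionally fed the previously retrieved memory as an extra input (FRMQN); neither is shown here.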
Experimental results
Research questions
- RQ1: Can memory-augmented DRL architectures handle partial observability, active perception, and memory-based reasoning better than traditional DQN/DRQN on Minecraft tasks?
- RQ2: Do context-dependent memory retrieval and memory feedback improve generalization to unseen or larger maps?
- RQ3: How do the proposed architectures perform across tasks requiring memory of indicators, patterns, and sequential goals?
- RQ4: Do memory-based models exhibit better extrapolation to larger or different map topologies than standard baselines?
Key findings
- Memory-based architectures (MQN, RMQN, FRMQN) generally outperform DQN and DRQN on cognitive Minecraft tasks.
- FRMQN achieves the strongest generalization to unseen maps across tasks, particularly in pattern matching and sequential goals with indicators.
- Memory retrieval is used selectively and contextually, e.g., FRMQN retrieves indicator information only when relevant to decision making.
- RMQN and FRMQN show better generalization than DRQN on unseen maps, while DRQN struggles with long-term dependencies under partial observability.
- Across tasks, the gap between memory-augmented models and baselines widens as partial observability increases (e.g., greater distance between indicator and goal).
- Qualitative analyses show memory attention focusing on relevant observations at decision points, supporting learned strategies for active perception.
This review was created by AI and reviewed by human editors.