[Paper Review] Scalable Coordinated Exploration in Concurrent Reinforcement Learning
This paper proposes a scalable, coordinated exploration method for teams of reinforcement learning agents operating concurrently in a shared environment. By combining seed sampling with randomized value function learning, the approach enables efficient exploration with fewer agents and faster convergence, especially in high-dimensional settings using neural networks.
We consider a team of reinforcement learning agents that concurrently operate in a common environment, and we develop an approach to efficient coordinated exploration that is suitable for problems of practical scale. Our approach builds on the seed sampling concept introduced in Dimakopoulou and Van Roy (2018) and on a randomized value function learning algorithm from Osband et al. (2016). We demonstrate that, for simple tabular contexts, the approach is competitive with the approaches previously proposed in Dimakopoulou and Van Roy (2018), and that, for a higher-dimensional problem with a neural network value function representation, the approach learns quickly with far fewer agents than alternative exploration schemes.
Motivation & Objective
- To address the challenge of efficient exploration in concurrent multi-agent reinforcement learning at scale.
- To reduce the number of agents required for effective exploration compared to prior methods.
- To enable fast learning in high-dimensional environments using neural network value function approximations.
Proposed method
- Adapts seed sampling from Dimakopoulou and Van Roy (2018) to coordinate exploration across multiple agents.
- Integrates randomized value function learning from Osband et al. (2016) to encourage exploration through stochastic value estimates.
- Uses a shared environment where agents act simultaneously, leveraging exploration diversity through randomized value function sampling.
- Employs neural networks to represent value functions, enabling scalability to high-dimensional state-action spaces.
- Coordinates exploration by having all agents learn from shared observations while each agent's own random seed diversifies the randomized value function it fits, and therefore the behavior it follows.
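The combination described in the list above can be illustrated with a small tabular sketch. It is not the paper's implementation (which targets neural network value functions); it only assumes that each agent keeps a fixed private seed, that the seed determines both a random prior and the noise added to every reward in a shared buffer, and that each agent fits its own value estimates to that perturbed shared data. The class `SeedAgent`, its methods, the `reg` weight, and the toy `shared_buffer` are hypothetical names chosen for illustration.

```python
import random


class SeedAgent:
    """Hypothetical tabular agent: a fixed seed drives its prior and its reward noise."""

    def __init__(self, seed, n_states, n_actions, noise_std=1.0, prior_std=1.0):
        self.seed = seed
        self.n_states = n_states
        self.n_actions = n_actions
        self.noise_std = noise_std
        # Random prior Q-values, drawn once from the agent's seed (assumption).
        rng = random.Random(f"{seed}-prior")
        self.prior = [[rng.gauss(0.0, prior_std) for _ in range(n_actions)]
                      for _ in range(n_states)]
        self.q = [row[:] for row in self.prior]

    def reward_noise(self, index):
        # Deterministic in (seed, buffer index): an agent always perturbs
        # a given shared transition in the same way.
        return random.Random(f"{self.seed}-noise-{index}").gauss(0.0, self.noise_std)

    def fit(self, buffer, gamma=0.9, reg=1.0, sweeps=100):
        # Regularized averaging of perturbed one-step targets, pulled toward
        # the random prior with weight `reg`; repeated as synchronous sweeps.
        for _ in range(sweeps):
            total = [[reg * p for p in row] for row in self.prior]
            count = [[reg] * self.n_actions for _ in range(self.n_states)]
            for i, (s, a, r, s_next, done) in enumerate(buffer):
                bootstrap = 0.0 if done else gamma * max(self.q[s_next])
                total[s][a] += r + self.reward_noise(i) + bootstrap
                count[s][a] += 1.0
            self.q = [[t / c for t, c in zip(t_row, c_row)]
                      for t_row, c_row in zip(total, count)]
        return self.q

    def act(self, state):
        # Greedy action under this agent's own randomized value estimate.
        row = self.q[state]
        return max(range(self.n_actions), key=row.__getitem__)


# Toy shared buffer of (state, action, reward, next_state, done) tuples.
shared_buffer = [
    (0, 0, 0.0, 1, False),
    (0, 1, 0.0, 0, False),
    (1, 0, 1.0, 1, True),
    (1, 1, 0.0, 0, False),
]

# Every agent sees the same data, but fits it under its own seed.
agents = [SeedAgent(seed=k, n_states=2, n_actions=2) for k in range(4)]
for agent in agents:
    agent.fit(shared_buffer)
greedy_actions = [agent.act(0) for agent in agents]
```

Because the seed is fixed, each agent's estimates change only when the shared buffer grows, while different seeds give different agents different estimates from identical data. In this sketch that is the sole source of exploration diversity; there is no epsilon-greedy or other per-step randomness.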
Experimental results
Research questions
- RQ1: Can coordinated exploration be scaled effectively to large, high-dimensional environments with multiple agents?
- RQ2: How does the proposed method compare to prior approaches in terms of sample efficiency and number of agents required?
- RQ3: To what extent does the integration of seed sampling and randomized value functions improve learning speed and performance?
Key findings
- The method achieves competitive performance compared to prior approaches in simple tabular environments.
- In high-dimensional settings with neural network value function representations, the approach learns faster than alternative exploration schemes.
- The method requires significantly fewer agents to achieve effective exploration and learning compared to baseline methods.
- The integration of seed sampling and randomized value functions enables stable and scalable coordination in concurrent multi-agent RL.
This review was created by AI and reviewed by human editors.