QUICK REVIEW

[Paper Review] POMCPOW: An online algorithm for POMDPs with continuous state, action, and observation spaces.

Zachary N. Sunberg, Mykel J. Kochenderfer | arXiv (Cornell University) | Sep 18, 2017

Reinforcement Learning in Robotics | 26 references | 13 citations

TL;DR

This paper proposes POMCPOW, an online algorithm for POMDPs with continuous state, action, and observation spaces. It combines double progressive widening (DPW) with weighted particle filtering to prevent the beliefs in the search tree from collapsing to a single particle. In simulation, the method succeeds on continuous problems where previous approaches fail.

ABSTRACT

Online solvers for partially observable Markov decision processes have been applied to problems with large discrete state spaces, but continuous state, action, and observation spaces remain a challenge. This paper begins by investigating double progressive widening (DPW) as a solution to this challenge. However, we prove that this modification alone is not sufficient because the belief representations in the search tree collapse to a single particle causing the algorithm to converge to a policy that is suboptimal regardless of the computation time. The main contribution of the paper is to propose a new algorithm, POMCPOW, that incorporates DPW and weighted particle filtering to overcome this deficiency and attack continuous problems. Simulation results show that these modifications allow the algorithm to be successful where previous approaches fail.

Motivation & Objective

To address the challenge of solving POMDPs with continuous state, action, and observation spaces, which remain difficult for existing online solvers.
To investigate whether double progressive widening (DPW) alone suffices for continuous POMDPs, identifying its limitations in belief representation.
To develop a new algorithm that overcomes belief collapse in particle-based belief representations by integrating weighted particle filtering with DPW.
To enable effective online planning in continuous POMDPs through robust and scalable belief representation and action selection.

Proposed method

The algorithm extends online tree-search planning to continuous POMDPs by applying double progressive widening (DPW), which caps the number of action and observation branches at each node and raises that cap gradually as the node is visited more often.
It incorporates weighted particle filtering so that belief nodes hold multiple representative particles, preventing the collapse to a single particle that the paper proves occurs when DPW is used alone.
Belief states are represented using a set of weighted particles, with weights updated based on observation likelihoods to reflect posterior probabilities.
Actions are selected in the search tree with a UCB1-based rule that balances exploration and exploitation among the actions already added to the tree.
The algorithm performs online planning by simulating trajectories from the current belief state, using particle filtering to propagate beliefs through actions and observations.
Combining DPW with weighted particle filtering keeps the beliefs in the tree diverse, which lets the planner avoid the suboptimal policy that DPW alone converges to.
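
The three ingredients listed above can be sketched in a few lines of Python. This is a minimal illustration, not the paper's pseudocode: the function and class names, the parameters `k`, `alpha`, and `c`, and the 1-D Gaussian observation model are all assumptions made for the example.

```python
import math
import random

def can_widen(num_children, num_visits, k, alpha):
    """Progressive widening test (assumed form): a node may add a new
    child while its child count is at most k * visits**alpha."""
    return num_children <= k * num_visits ** alpha

def ucb1_action(stats, c):
    """Pick the action maximizing q + c * sqrt(log(N) / n).
    stats maps action -> (visit_count, q_estimate)."""
    total = sum(n for n, _ in stats.values())

    def score(action):
        n, q = stats[action]
        if n == 0:
            return math.inf  # untried actions are explored first
        return q + c * math.sqrt(math.log(total) / n)

    return max(stats, key=score)

def gaussian_likelihood(obs, state, sigma=1.0):
    """Hypothetical observation model: obs ~ Normal(state, sigma)."""
    z = (obs - state) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

class WeightedBelief:
    """Belief stored as particles with observation-likelihood weights."""

    def __init__(self):
        self.states = []
        self.weights = []

    def add(self, state, weight):
        self.states.append(state)
        self.weights.append(weight)

    def sample(self, rng):
        # Draw one state with probability proportional to its weight.
        return rng.choices(self.states, weights=self.weights, k=1)[0]

# Usage: weight two candidate states by how well they explain obs = 0.1.
belief = WeightedBelief()
for s in (0.0, 5.0):
    belief.add(s, gaussian_likelihood(0.1, s))
next_state = belief.sample(random.Random(0))
```

Under these assumptions the state near the observation receives almost all of the weight, so sampling from the belief node returns it far more often than the distant state.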

Experimental results

Research questions

RQ1: Does double progressive widening (DPW) alone suffice to solve continuous POMDPs without belief collapse?
RQ2: Can weighted particle filtering effectively maintain belief diversity in continuous POMDPs when combined with DPW?
RQ3: How does the proposed POMCPOW algorithm compare to existing methods in terms of policy quality and convergence on continuous problems?
RQ4: What is the impact of belief representation quality on the performance of online POMDP solvers in continuous domains?

Key findings

Double progressive widening (DPW) alone leads to belief collapse, causing the algorithm to converge to suboptimal policies regardless of computation time.
The integration of weighted particle filtering with DPW successfully prevents belief collapse and enables stable, diverse belief representations.
POMCPOW plans successfully on continuous POMDPs where previous approaches fail, including the DPW-only variant whose beliefs collapse to a single particle.
Simulation results show POMCPOW outperforming the compared methods on continuous POMDP test problems.
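
A toy simulation can make the first two findings concrete. The sketch below is an illustration under assumed conditions (a 1-D Gaussian state and observation model, hypothetical function names, and existing observation nodes chosen uniformly for simplicity); it is a simplified reading of the mechanism, not the paper's algorithm. With continuous observations, every freshly sampled observation is distinct, so a scheme that stores a state only when its observation node is created leaves one particle per node. A scheme that always simulates a new state and attaches it, with a likelihood weight, to an existing node once the widening cap is reached accumulates many particles per node.

```python
import math
import random

NOISE = 0.5  # assumed observation noise standard deviation

def simulate(rng):
    """Draw a hypothetical next state and a noisy observation of it."""
    state = rng.gauss(0.0, 1.0)
    return state, state + rng.gauss(0.0, NOISE)

def dpw_unweighted(n_sims, k, alpha, rng):
    """DPW only: a particle is stored when its observation node is created."""
    beliefs = {}  # observation -> list of states
    for visits in range(n_sims):
        if len(beliefs) <= k * visits ** alpha:
            state, obs = simulate(rng)
            beliefs.setdefault(obs, []).append(state)
        else:
            # Revisit an existing node; nothing new is added to its belief.
            obs = rng.choice(list(beliefs))
            rng.choice(beliefs[obs])
    return beliefs

def dpw_weighted(n_sims, k, alpha, rng):
    """DPW plus weighting: every simulated state joins some node's belief."""
    beliefs = {}  # observation -> list of (state, weight)
    for visits in range(n_sims):
        state, obs = simulate(rng)
        if len(beliefs) > k * visits ** alpha:
            obs = rng.choice(list(beliefs))  # reuse an existing node
        weight = math.exp(-0.5 * ((obs - state) / NOISE) ** 2)
        beliefs.setdefault(obs, []).append((state, weight))
    return beliefs

collapsed = dpw_unweighted(200, 2.0, 0.5, random.Random(0))
diverse = dpw_weighted(200, 2.0, 0.5, random.Random(0))
print(max(len(p) for p in collapsed.values()))
print(max(len(p) for p in diverse.values()))
```

In the unweighted variant no node ever holds more than one particle, however many simulations are run, which mirrors the collapse described in the first finding; in the weighted variant the same budget spreads 200 weighted particles over a few dozen nodes.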


This review was created by AI and reviewed by human editors.