QUICK REVIEW

[Paper Review] PoseRBPF: A Rao-Blackwellized Particle Filter for 6D Object Pose Tracking

Xinke Deng, Arsalan Mousavian | arXiv (Cornell University) | May 22, 2019

Advanced Neural Network Applications | 40 references | 52 citations

TL;DR

PoseRBPF factorizes 6D object pose tracking into translation and rotation distributions using a Rao-Blackwellized particle filter, with rotation handled via a discretized codebook learned by an auto-encoder; it tracks full pose posteriors and achieves state-of-the-art results on YCB-Video and T-LESS, including robust handling of object symmetries.

ABSTRACT

Tracking 6D poses of objects from videos provides rich information to a robot in performing different tasks such as manipulation and navigation. In this work, we formulate the 6D object pose tracking problem in the Rao-Blackwellized particle filtering framework, where the 3D rotation and the 3D translation of an object are decoupled. This factorization allows our approach, called PoseRBPF, to efficiently estimate the 3D translation of an object along with the full distribution over the 3D rotation. This is achieved by discretizing the rotation space in a fine-grained manner, and training an auto-encoder network to construct a codebook of feature embeddings for the discretized rotations. As a result, PoseRBPF can track objects with arbitrary symmetries while still maintaining adequate posterior distributions. Our approach achieves state-of-the-art results on two 6D pose estimation benchmarks. A video showing the experiments can be found at https://youtu.be/lE5gjzRKWuA

Motivation & Objective

Address 6D object pose tracking in video while accounting for pose uncertainty over time.
Develop a probabilistic framework that represents full posterior distributions over 3D rotation and 3D translation.
Enable robust tracking for objects with arbitrary symmetries without manual symmetry labeling.
Leverage learned representations to efficiently evaluate multiple orientation hypotheses per frame.

Proposed method

Factorize the 6D pose posterior into translation P(T_k|Z_1:k) and rotation P(R_k|T_k, Z_1:k).
Use a Rao-Blackwellized particle filter to sample translations and maintain discrete rotation distributions per particle (rotations discretized at 5-degree resolution into 72x37x72 bins).
Train an auto-encoder to build a codebook of feature embeddings for the discretized rotations by rendering object views at a canonical translation, so that rotation likelihoods can be computed quickly via cosine similarity between RoI embeddings and codebook entries.
Compute observation likelihoods by transforming real RGB images into a synthetic-domain embedding via the auto-encoder and matching RoI embeddings against the codebook.
Propagate translations with a constant-velocity motion prior and rotations with a 3D Gaussian convolution over the previous rotation distribution.
Extend to RGB-D by incorporating depth-discrepancy scores and visibility masking to refine the likelihood of each particle.
Initialize from a 2D detector, then iteratively update particle translations and rotation distributions per frame, with resampling and the possibility of tracking-failure detection via codebook match quality.
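
The filter steps listed above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the `velocity_gain`, `noise_std` and `sigma` parameters, and the Gaussian-shaped mapping from cosine similarity to likelihood (centred on the best match over all particles) are hypothetical choices made for the example. The auto-encoder, the RoI cropping, and the Gaussian convolution used to propagate rotations are omitted.

```python
import numpy as np

def propagate_translations(t_prev, t_prev2, rng, velocity_gain=0.5, noise_std=0.01):
    """Constant-velocity motion prior with additive Gaussian noise (values assumed)."""
    velocity = t_prev - t_prev2
    return t_prev + velocity_gain * velocity + rng.normal(0.0, noise_std, t_prev.shape)

def normalize_rows(x):
    """Scale each row to unit L2 norm so dot products become cosine similarities."""
    return x / np.linalg.norm(x, axis=1, keepdims=True)

def rotation_likelihoods(roi_codes, codebook, sigma=0.1):
    """Unnormalized likelihood of every discretized rotation for every particle.

    roi_codes: (N, D) embeddings of the RoIs implied by N translation particles.
    codebook:  (M, D) embeddings of the M discretized rotations.
    Returns an (N, M) array.
    """
    sim = normalize_rows(roi_codes) @ normalize_rows(codebook).T  # cosine similarity
    # Assumption: Gaussian-shaped score centred on the best match over all particles.
    best = sim.max()
    return np.exp(-((sim - best) ** 2) / (2.0 * sigma ** 2))

def rbpf_update(weights, rot_prior, likelihoods):
    """Rao-Blackwellized update: rotation is marginalized exactly per particle.

    weights:   (N,) weights of the sampled translations.
    rot_prior: (N, M) predicted rotation distribution of each particle.
    Returns the normalized weights and the (N, M) rotation posteriors.
    """
    joint = rot_prior * likelihoods          # unnormalized rotation posterior
    evidence = joint.sum(axis=1)             # marginal likelihood of each translation
    rot_post = joint / evidence[:, None]
    new_weights = weights * evidence
    return new_weights / new_weights.sum(), rot_post

def systematic_resample(weights, rng):
    """Return particle indices drawn with systematic (low-variance) resampling."""
    n = len(weights)
    positions = (rng.random() + np.arange(n)) / n
    cumulative = np.cumsum(weights)
    cumulative[-1] = 1.0                     # guard against round-off
    return np.searchsorted(cumulative, positions)
```

In this sketch the particle weight is the sum of the rotation likelihoods under the particle's rotation prior, which is what lets each particle carry a full rotation distribution instead of a single sampled rotation.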

Experimental results

Research questions

RQ1: Can 6D pose tracking be represented as a posterior over translation and rotation that can be efficiently sampled in real time?
RQ2: How can learned rotation representations be integrated into a probabilistic filter to handle symmetries without explicit symmetry labeling?
RQ3: Does decoupling translation and rotation allow accurate tracking of full pose posteriors and robust performance on symmetric/non-textured objects?
RQ4: Can RGB-D data improve pose tracking when combined with the rotation codebook matching approach?
RQ5: What is the impact of particle count on real-time performance and accuracy across challenging datasets?

Key findings

PoseRBPF represents full posteriors over 6D poses by decoupling translation and rotation and using a discretized rotation codebook per particle.
The rotation distribution per particle is maintained over 191,808 bins (72x37x72) at 5-degree rotation resolution, enabling multi-modal orientation tracking.
An auto-encoder-derived codebook enables efficient rotation likelihoods via cosine similarity between RoI embeddings and discretized rotation embeddings.
RGB-D extension using depth discrepancy and visibility improves pose accuracy over RGB alone, achieving state-of-the-art results on YCB-Video and T-LESS datasets.
The method runs at approximately 20 frames per second in its RGB version and at up to 20 fps in RGB-D configurations, using GPU-accelerated codebook matching. Larger particle counts improve accuracy, and a hybrid PoseRBPF++ variant built around PoseCNN predictions yields further gains.
PoseRBPF effectively handles object symmetries without explicit symmetry labeling and provides interpretable rotation posteriors, as demonstrated on challenging symmetric and non-textured objects.
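
The bin count quoted above can be checked with a short sketch of the rotation discretization. The angle ranges and the azimuth, elevation, in-plane ordering are assumptions made for illustration (this review states only the 5-degree resolution and the 72x37x72 layout), and the helper names are hypothetical.

```python
STEP_DEG = 5

# Assumed ranges: azimuth and in-plane rotation cover a full circle without
# repeating the endpoint; elevation covers -90..90 degrees, both ends included.
N_AZIMUTH = 360 // STEP_DEG          # 72
N_ELEVATION = 180 // STEP_DEG + 1    # 37
N_INPLANE = 360 // STEP_DEG          # 72
N_BINS = N_AZIMUTH * N_ELEVATION * N_INPLANE

def bin_to_angles(index):
    """Map a flat bin index to (azimuth, elevation, in-plane) in degrees."""
    if not 0 <= index < N_BINS:
        raise IndexError(index)
    az, rest = divmod(index, N_ELEVATION * N_INPLANE)
    el, ip = divmod(rest, N_INPLANE)
    return az * STEP_DEG, el * STEP_DEG - 90, ip * STEP_DEG

def angles_to_bin(azimuth, elevation, inplane):
    """Map angles in degrees to the flat index of the nearest bin."""
    az = round(azimuth / STEP_DEG) % N_AZIMUTH                 # wraps around
    el = min(max(round((elevation + 90) / STEP_DEG), 0), N_ELEVATION - 1)  # clamps
    ip = round(inplane / STEP_DEG) % N_INPLANE                 # wraps around
    return (az * N_ELEVATION + el) * N_INPLANE + ip

print(N_BINS)  # 191808
```

Stored as 32-bit floats, one such distribution occupies 191,808 x 4 = 767,232 bytes, so the per-particle rotation posterior is a substantial part of the memory budget when many particles are used.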


This review was created by AI and reviewed by human editors.