QUICK REVIEW

[Paper Review] EfficientPose: An efficient, accurate and scalable end-to-end 6D multi object pose estimation approach

Yannick Bukschat, Marcus Vetter | arXiv (Cornell University) | Nov 9, 2020
Advanced Neural Network Applications | 35 references | 68 citations
TL;DR

EfficientPose extends EfficientDet to predict multi-object 2D detections and full 6D poses in a single shot, achieving state-of-the-art RGB-based 6D pose accuracy on Linemod with high efficiency and scalability.

ABSTRACT

In this paper we introduce EfficientPose, a new approach for 6D object pose estimation. Our method is highly accurate, efficient and scalable over a wide range of computational resources. Moreover, it can detect the 2D bounding box of multiple objects and instances as well as estimate their full 6D poses in a single shot. This eliminates the significant increase in runtime when dealing with multiple objects other approaches suffer from. These approaches aim to first detect 2D targets, e.g. keypoints, and solve a Perspective-n-Point problem for their 6D pose for each object afterwards. We also propose a novel augmentation method for direct 6D pose estimation approaches to improve performance and generalization, called 6D augmentation. Our approach achieves a new state-of-the-art accuracy of 97.35% in terms of the ADD(-S) metric on the widely-used 6D pose estimation benchmark dataset Linemod using RGB input, while still running end-to-end at over 27 FPS. Through the inherent handling of multiple objects and instances and the fused single shot 2D object detection as well as 6D pose estimation, our approach runs even with multiple objects (eight) end-to-end at over 26 FPS, making it highly attractive to many real world scenarios. Code will be made publicly available at https://github.com/ybkscht/EfficientPose.

Motivation & Objective

  • Extend EfficientDet to predict both 2D detections and full 6D poses (rotation and translation) in a single shot.
  • Introduce lightweight, shared subnetworks for rotation and translation to maintain efficiency across object counts.
  • Propose 6D augmentation to improve generalization when training data is limited.
  • Develop a transformation loss modeled on the ADD(-S) metric, so that training directly optimizes pose accuracy for both asymmetric and symmetric objects.
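
For readers unfamiliar with the ADD(-S) metric named above, the following is a minimal NumPy sketch of the two distances as they are commonly defined: ADD averages the distance between corresponding model points under the ground-truth and predicted poses, while ADD-S (used for symmetric objects) averages the distance from each ground-truth point to its closest predicted point. The function names and the `threshold=0.1` default (the usual 10% of the object diameter on Linemod) are illustrative assumptions, not code from the EfficientPose repository.

```python
import numpy as np

def transform(points, R, t):
    """Apply rotation R (3x3) and translation t (3,) to an (N, 3) point array."""
    return points @ R.T + t

def add_distance(points, R_gt, t_gt, R_pred, t_pred):
    """ADD: mean distance between corresponding transformed model points."""
    gt = transform(points, R_gt, t_gt)
    pred = transform(points, R_pred, t_pred)
    return np.linalg.norm(gt - pred, axis=1).mean()

def adds_distance(points, R_gt, t_gt, R_pred, t_pred):
    """ADD-S: mean distance from each ground-truth point to its closest predicted point."""
    gt = transform(points, R_gt, t_gt)
    pred = transform(points, R_pred, t_pred)
    # Pairwise distances, shape (N, N). Fine for small N; large models
    # would normally use a nearest-neighbor structure instead.
    pairwise = np.linalg.norm(gt[:, None, :] - pred[None, :, :], axis=2)
    return pairwise.min(axis=1).mean()

def is_pose_correct(distance, diameter, threshold=0.1):
    """A pose counts as correct if the distance is below a fraction of the object diameter."""
    return distance < threshold * diameter
```

A symmetric object flipped by 180 degrees illustrates the difference: ADD penalizes the flip because corresponding points end up far apart, whereas ADD-S does not because the point set as a whole is unchanged.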

Proposed method

  • Extend EfficientDet with two additional subnetworks for rotation (R) and translation (t) prediction.
  • Use axis-angle rotation representation with an iterative refinement module to predict final rotation.
  • Estimate translation by predicting each object's 2D center point and depth, then recover the full 3D translation using the camera intrinsics.
  • Apply a transformation loss based on ADD(-S) to directly optimize pose accuracy for asymmetric and symmetric objects.
  • Introduce 6D augmentation to rotate and scale images along with corresponding 6D pose adjustments, improving generalization on small datasets.
  • Inherit EfficientDet’s phi-scalable backbone to enable end-to-end pose estimation across a range of compute budgets.
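
The two geometric steps in the list above, turning an axis-angle vector into a rotation matrix and back-projecting a predicted 2D center plus depth into a 3D translation, can be sketched as follows. This is a minimal illustration assuming a standard pinhole camera; the function names and the intrinsics parameters `fx`, `fy`, `px`, `py` (focal lengths and principal point) are hypothetical and not taken from the authors' code, and the network heads and refinement module are not modeled.

```python
import numpy as np

def axis_angle_to_matrix(rvec):
    """Rodrigues' formula: rotation vector (unit axis * angle in radians) -> 3x3 matrix."""
    rvec = np.asarray(rvec, dtype=float)
    angle = np.linalg.norm(rvec)
    if angle < 1e-12:
        # Zero rotation: the axis is undefined, so return the identity.
        return np.eye(3)
    k = rvec / angle
    # Skew-symmetric cross-product matrix of the unit axis.
    K = np.array([[0.0, -k[2], k[1]],
                  [k[2], 0.0, -k[0]],
                  [-k[1], k[0], 0.0]])
    return np.eye(3) + np.sin(angle) * K + (1.0 - np.cos(angle)) * (K @ K)

def recover_translation(cx, cy, tz, fx, fy, px, py):
    """Back-project the 2D object center (cx, cy) at depth tz through a pinhole camera."""
    tx = (cx - px) * tz / fx
    ty = (cy - py) * tz / fy
    return np.array([tx, ty, tz])
```

Predicting the center in pixel space and only the depth in metric space keeps the regression targets closely tied to what is visible in the image; the intrinsics then supply the remaining two translation components.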

Experimental results

Research questions

  • RQ1: Can direct end-to-end 6D pose estimation achieve state-of-the-art accuracy on RGB input without post-processing like PnP or RANSAC?
  • RQ2: Does integrating 6D pose estimation into EfficientDet enable scalable, single-shot multi-object pose estimation across multiple instances?
  • RQ3: How does 6D augmentation affect generalization on small datasets for RGB-based 6D pose estimation?
  • RQ4: What is the impact of network scaling (phi) on accuracy and throughput for multi-object 6D pose estimation?

Key findings

  • Achieves 97.35% ADD(-S) on Linemod for RGB input without post-processing refinements.
  • Runs end-to-end at over 27 FPS for a single object, and still at over 26 FPS with eight objects in the image.
  • Outperforms prior RGB-only methods on Linemod, including those that use an additional refinement step.
  • Demonstrates effective multi-object and multi-instance detection within a single shot, due to shared feature maps and anchor-based predictions.
  • Shows significant gains from the proposed 6D augmentation in improving pose estimation performance on small datasets.
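
To make the idea behind 6D augmentation concrete, here is a hedged sketch of the pose-label side only: when an image is rotated about the principal point and scaled, the pose can be kept consistent by rotating it about the camera's z-axis and dividing the depth by the scale factor. This is one plausible formulation under a pinhole model with equal focal lengths, not necessarily the authors' exact procedure; all names are hypothetical, the sign of the angle depends on the image coordinate convention, and the depth adjustment is exact only for the object center.

```python
import numpy as np

def rotation_z(angle_rad):
    """Rotation matrix about the camera z-axis (the optical axis)."""
    c, s = np.cos(angle_rad), np.sin(angle_rad)
    return np.array([[c, -s, 0.0],
                     [s, c, 0.0],
                     [0.0, 0.0, 1.0]])

def augment_pose(R, t, angle_rad, scale):
    """Adjust (R, t) to match an image rotated by angle_rad and scaled by scale."""
    Rz = rotation_z(angle_rad)
    R_aug = Rz @ np.asarray(R, dtype=float)
    t_aug = Rz @ np.asarray(t, dtype=float)  # new array, input is not mutated
    # Zooming in by `scale` looks like moving the object closer by the same factor.
    t_aug[2] /= scale
    return R_aug, t_aug

def project(point, f, px, py):
    """Pinhole projection of a 3D camera-frame point with focal length f."""
    x, y, z = point
    return np.array([f * x / z + px, f * y / z + py])
```

Under these assumptions, projecting the augmented translation gives the same pixel as rotating and scaling the original projection about the principal point, which is what keeps the image and its pose label in agreement.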

This review was created by AI and reviewed by human editors.