QUICK REVIEW

[Paper Review] PVNet: Pixel-wise Voting Network for 6DoF Pose Estimation

Sida Peng, Yuan Liu | arXiv (Cornell University) | Dec 31, 2018

Robotics and Sensor-Based Localization | References: 41 | Citations: 44

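One-Sentence Summary

PVNet introduces a pixel-wise voting framework that regresses, from each pixel, a unit vector pointing to each 2D keypoint; combined with RANSAC voting and uncertainty-driven PnP, it estimates 6DoF pose from a single RGB image and is robust to occlusion and truncation.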

ABSTRACT

This paper addresses the challenge of 6DoF pose estimation from a single RGB image under severe occlusion or truncation. Many recent works have shown that a two-stage approach, which first detects keypoints and then solves a Perspective-n-Point (PnP) problem for pose estimation, achieves remarkable performance. However, most of these methods only localize a set of sparse keypoints by regressing their image coordinates or heatmaps, which are sensitive to occlusion and truncation. Instead, we introduce a Pixel-wise Voting Network (PVNet) to regress pixel-wise unit vectors pointing to the keypoints and use these vectors to vote for keypoint locations using RANSAC. This creates a flexible representation for localizing occluded or truncated keypoints. Another important feature of this representation is that it provides uncertainties of keypoint locations that can be further leveraged by the PnP solver. Experiments show that the proposed approach outperforms the state of the art on the LINEMOD, Occlusion LINEMOD and YCB-Video datasets by a large margin, while being efficient for real-time pose estimation. We further create a Truncation LINEMOD dataset to validate the robustness of our approach against truncation. The code will be available at https://zju-3dv.github.io/pvnet/.

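Motivation and Objectives

1. Address 6DoF pose estimation from a single RGB image under severe occlusion or truncation.
2. Replace sparse keypoint regression with dense pixel-wise vector-field prediction to improve robustness.
3. Use RANSAC-based voting to localize keypoints and to quantify localization uncertainty for pose estimation.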

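Proposed Method

Predict, for each pixel, a semantic label and a unit vector pointing toward each keypoint.
Generate candidate keypoint locations in a RANSAC-like manner from the intersections of pixel-wise vectors.
Estimate the mean and covariance of each keypoint from the votes, forming a spatial probability distribution.
Solve for the 6D pose by minimizing the Mahalanobis distance defined by the keypoint means and covariances (uncertainty-driven PnP).
Train PVNet on data that includes synthetic images and Cut and Paste augmentation, using a fully convolutional backbone based on ResNet-18 with dilated convolutions.
Select surface keypoints (K=8) and the object center using farthest point sampling (FPS), and apply uncertainty-driven PnP for pose estimation.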
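
The voting, distribution, and Mahalanobis steps above can be sketched in NumPy as follows. This is a minimal illustration and not the authors' implementation. The function names, the hypothesis count `n_hyp`, the cosine threshold `cos_thresh`, the score-weighted mean and covariance, and the regularizer `eps` are all assumptions of this sketch. It handles a single keypoint and only evaluates the pose cost; it does not optimize the pose.

```python
import numpy as np

def vote_keypoint(pixels, directions, n_hyp=128, cos_thresh=0.99, seed=None):
    """RANSAC-style voting for one keypoint (illustrative sketch).

    pixels:     (N, 2) float array of object pixel coordinates (x, y).
    directions: (N, 2) array of predicted unit vectors, one per pixel.
    Returns hypotheses (M, 2) and their voting scores (M,).
    """
    rng = np.random.default_rng(seed)
    hyps = []
    for _ in range(n_hyp):
        i, j = rng.choice(len(pixels), size=2, replace=False)
        # Intersect the rays p_i + s*d_i and p_j + t*d_j:
        # s*d_i - t*d_j = p_j - p_i, solved as a 2x2 linear system.
        A = np.stack([directions[i], -directions[j]], axis=1)
        if abs(np.linalg.det(A)) < 1e-6:
            continue  # nearly parallel rays give no stable intersection
        s, _t = np.linalg.solve(A, pixels[j] - pixels[i])
        hyps.append(pixels[i] + s * directions[i])
    hyps = np.asarray(hyps, dtype=float).reshape(-1, 2)
    # Score each hypothesis by the number of pixels whose predicted
    # vector agrees with the direction from the pixel to the hypothesis.
    diff = hyps[:, None, :] - pixels[None, :, :]            # (M, N, 2)
    diff /= np.linalg.norm(diff, axis=2, keepdims=True) + 1e-12
    cos = np.einsum("mnk,nk->mn", diff, directions)         # (M, N)
    scores = (cos >= cos_thresh).sum(axis=1).astype(float)
    return hyps, scores

def keypoint_distribution(hyps, scores):
    """Score-weighted mean (2,) and covariance (2, 2) of the hypotheses."""
    w = scores / scores.sum()
    mean = (w[:, None] * hyps).sum(axis=0)
    d = hyps - mean
    cov = (w[:, None, None] * d[:, :, None] * d[:, None, :]).sum(axis=0)
    return mean, cov

def mahalanobis_cost(projected, means, covs, eps=1e-6):
    """Sum of squared Mahalanobis distances between projected keypoints
    and the voted means; eps keeps near-singular covariances invertible."""
    cost = 0.0
    for x, mu, cov in zip(projected, means, covs):
        r = np.asarray(x, dtype=float) - mu
        cost += r @ np.linalg.solve(cov + eps * np.eye(2), r)
    return float(cost)
```

In a full pipeline, `projected` would come from projecting the 3D keypoints with a candidate rotation and translation, and a nonlinear least-squares solver would minimize `mahalanobis_cost` over the pose. Keypoints with tight vote distributions then pull harder on the solution than keypoints with scattered votes.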

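Experimental Results

Research Questions

RQ1: Can a dense pixel-wise vector-field representation improve 2D keypoint localization under occlusion and truncation compared with coordinate-regression and heatmap approaches?
RQ2: How does incorporating keypoint localization uncertainty into the PnP solver affect 6DoF pose accuracy?
RQ3: How much do the keypoint selection strategy and the number of keypoints affect pose estimation performance?
RQ4: How effective is PVNet on standard benchmarks (LINEMOD, Occlusion LINEMOD, YCB-Video) and in truncation scenarios?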

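Key Findings

PVNet achieves state-of-the-art performance on the LINEMOD, Occlusion LINEMOD, and YCB-Video datasets.
Uncertainty-driven PnP improves pose accuracy over EPnP by exploiting the covariance of each keypoint.
Using eight surface keypoints selected by farthest point sampling (plus the object center) gives a good balance of accuracy and efficiency.
Pixel-wise voting on vector fields is robust to occlusion and truncation and outperforms regression-based keypoint methods.
The method runs at about 25 fps on a GTX 1080 Ti, making it suitable for real-time applications.
The paper introduces the Truncation LINEMOD dataset to validate robustness to truncation.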
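
The farthest point sampling used for keypoint selection can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' code. The function name and the choice to seed the selection with the object center (passed as `start`, and not itself returned) are assumptions of this sketch, and `points` stands for any (N, 3) array of model surface points.

```python
import numpy as np

def farthest_point_sampling(points, k, start):
    """Greedily pick k indices into `points` (N, 3).

    Each pick is the point farthest from everything selected so far,
    where the selection is seeded with `start` (3,), e.g. the object center.
    """
    # Distance from every point to the nearest already-selected location.
    min_dist = np.linalg.norm(points - start, axis=1)
    chosen = []
    for _ in range(k):
        idx = int(np.argmax(min_dist))
        chosen.append(idx)
        dist_to_new = np.linalg.norm(points - points[idx], axis=1)
        min_dist = np.minimum(min_dist, dist_to_new)
    return np.asarray(chosen)
```

For a box-like object, this procedure tends to pick well-spread points such as the corners, which is the intuition behind using FPS to spread keypoints over the object surface.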

This review was generated by AI and checked by a human editor.