QUICK REVIEW

[Paper Review] PoseCNN: A Convolutional Neural Network for 6D Object Pose Estimation in Cluttered Scenes

Xiang Yu, Tanner Schmidt|arXiv (Cornell University)|Nov 1, 2017

Human Pose and Action Recognition | References: 34 | Citations: 130

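One-line summary

PoseCNN is a CNN that decouples 3D translation estimation into localizing the object's 2D center and predicting the depth of that center, and estimates 3D rotation by quaternion regression. It uses ShapeMatch-Loss to handle symmetric objects and is evaluated on the YCB-Video and OccludedLINEMOD datasets.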

ABSTRACT

Estimating the 6D pose of known objects is important for robots to interact with the real world. The problem is challenging due to the variety of objects as well as the complexity of a scene caused by clutter and occlusions between objects. In this work, we introduce PoseCNN, a new Convolutional Neural Network for 6D object pose estimation. PoseCNN estimates the 3D translation of an object by localizing its center in the image and predicting its distance from the camera. The 3D rotation of the object is estimated by regressing to a quaternion representation. We also introduce a novel loss function that enables PoseCNN to handle symmetric objects. In addition, we contribute a large scale video dataset for 6D object pose estimation named the YCB-Video dataset. Our dataset provides accurate 6D poses of 21 objects from the YCB dataset observed in 92 videos with 133,827 frames. We conduct extensive experiments on our YCB-Video dataset and the OccludedLINEMOD dataset to show that PoseCNN is highly robust to occlusions, can handle symmetric objects, and provide accurate pose estimation using only color images as input. When using depth data to further refine the poses, our approach achieves state-of-the-art results on the challenging OccludedLINEMOD dataset. Our code and dataset are available at https://rse-lab.cs.washington.edu/projects/posecnn/.

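Motivation and objectives

Motivate robust 6D pose estimation under clutter and occlusion without over-reliance on depth data.
Develop an end-to-end CNN that handles 3D translation estimation and 3D rotation estimation separately.
Handle symmetric objects with a dedicated loss function, ShapeMatch-Loss.
Provide a large-scale RGB-D video dataset (YCB-Video) with 6D pose annotations for 21 objects.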

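Proposed method

A two-stage CNN backbone whose features are shared across tasks.
Per-pixel semantic labeling identifies object classes and enables center voting.
For each pixel, the network regresses the unit direction toward the object center, and a Hough voting layer then finds the 2D center.
The 2D center location is combined with the estimated center distance (depth) to recover the 3D translation T.
3D rotation is regressed from the features inside each object's bounding box to a per-class quaternion, trained with PoseLoss for asymmetric objects and ShapeMatch-Loss for symmetric objects.
When depth data is available, the pose is refined with ICP.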
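
The center voting and translation recovery steps listed above can be sketched on synthetic data. This is a minimal illustration, not the paper's Hough voting layer: the brute-force scoring, the cosine threshold, the function names `vote_for_center` and `recover_translation`, and the intrinsics `fx`, `fy`, `px`, `py` are all assumptions introduced here, and the back-projection assumes a standard pinhole camera model.

```python
import numpy as np


def vote_for_center(pixels, directions, height, width, cos_threshold=0.99):
    """Return the (x, y) grid location that the most pixels point towards.

    pixels:     (N, 2) array of (x, y) coordinates labelled as the object.
    directions: (N, 2) array of unit vectors predicted at those pixels.
    Brute-force scoring (hypothetical simplification of a voting layer).
    """
    ys, xs = np.mgrid[0:height, 0:width]
    candidates = np.stack([xs.ravel(), ys.ravel()], axis=1).astype(float)
    # Offset from every pixel to every candidate centre: shape (H*W, N, 2).
    offsets = candidates[:, None, :] - pixels[None, :, :]
    norms = np.linalg.norm(offsets, axis=2)
    norms[norms == 0] = 1.0  # pixel sits on the candidate; offset stays zero
    # Cosine between "pixel -> candidate" and the predicted direction.
    cosines = np.einsum("cnk,nk->cn", offsets / norms[..., None], directions)
    votes = (cosines > cos_threshold).sum(axis=1)
    return candidates[int(np.argmax(votes))]


def recover_translation(center_xy, depth, fx, fy, px, py):
    """Back-project a 2D centre and its depth to T = (Tx, Ty, Tz)."""
    cx, cy = center_xy
    return np.array([(cx - px) * depth / fx, (cy - py) * depth / fy, depth])


# Synthetic check: project a known translation, build ideal per-pixel
# directions around the projected centre, then recover the translation.
fx, fy, px, py = 100.0, 100.0, 20.0, 15.0   # hypothetical intrinsics
T_true = np.array([0.05, -0.03, 1.0])       # hypothetical pose, in metres
center_true = np.array([fx * T_true[0] / T_true[2] + px,
                        fy * T_true[1] / T_true[2] + py])

gy, gx = np.mgrid[8:17, 21:30]              # 9x9 block of "object" pixels
pixels = np.stack([gx.ravel(), gy.ravel()], axis=1).astype(float)
to_center = center_true - pixels
lengths = np.linalg.norm(to_center, axis=1, keepdims=True)
directions = to_center / np.where(lengths == 0, 1.0, lengths)

center_est = vote_for_center(pixels, directions, height=30, width=40)
T_est = recover_translation(center_est, T_true[2], fx, fy, px, py)
```

Because every pixel votes for a direction rather than a position, pixels that remain visible can still agree on a center that is itself hidden, which is the intuition behind the occlusion robustness claimed for this design.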

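Experimental results

Research questions

RQ1: Can a CNN jointly perform semantic labeling, 2D center voting, and 3D pose regression to achieve accurate 6D pose estimation in cluttered scenes?
RQ2: How can symmetry be handled effectively within rotation regression without enumerating symmetric poses?
RQ3: Does center-voting-based translation estimation improve robustness to occlusion compared with direct 3D coordinate regression?
RQ4: How does PoseCNN perform with color-only input versus RGB-D input on challenging datasets such as OccludedLINEMOD and YCB-Video?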

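Key findings

PoseCNN achieves strong 6D pose estimation from color images alone and outperforms the 3D coordinate regression baseline on YCB-Video.
Incorporating depth through ICP refinement substantially improves accuracy, often surpassing RGB-D baselines.
ShapeMatch-Loss handles symmetric objects effectively, improving pose estimation for Eggbox and Glue in OccludedLINEMOD.
On OccludedLINEMOD, PoseCNN with ICP outperforms state-of-the-art methods that use RGB-D input on several objects.
The YCB-Video dataset (21 objects, 133,827 frames) supports robust training and evaluation under occlusion and symmetry.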
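
To illustrate why a shape-matching loss suits symmetric objects, the sketch below compares the two rotation losses on a toy model. It assumes the commonly cited formulation: PoseLoss averages squared distances between corresponding model points under the predicted and ground-truth rotations, while ShapeMatch-Loss matches each predicted point to its closest ground-truth point. The function names, the (w, x, y, z) quaternion order, and the four-point model are hypothetical choices made for this example.

```python
import numpy as np


def quat_to_rot(q):
    """Rotation matrix for a quaternion given as (w, x, y, z)."""
    w, x, y, z = np.asarray(q, dtype=float) / np.linalg.norm(q)
    return np.array([
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ])


def pose_loss(q_pred, q_true, points):
    """Mean squared distance between corresponding rotated model points."""
    a = points @ quat_to_rot(q_pred).T
    b = points @ quat_to_rot(q_true).T
    return np.sum((a - b) ** 2) / (2 * len(points))


def shape_match_loss(q_pred, q_true, points):
    """Like pose_loss, but each predicted point is matched to the closest
    ground-truth point instead of its fixed correspondence."""
    a = points @ quat_to_rot(q_pred).T
    b = points @ quat_to_rot(q_true).T
    # Pairwise squared distances between the two point sets: shape (m, m).
    d2 = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=2)
    return np.sum(d2.min(axis=1)) / (2 * len(points))


# Toy model: four points that map onto themselves under a 90-degree
# rotation about the z axis.
square = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0],
                   [-1.0, 0.0, 0.0], [0.0, -1.0, 0.0]])
q_identity = np.array([1.0, 0.0, 0.0, 0.0])
q_quarter_turn = np.array([np.cos(np.pi / 4), 0.0, 0.0, np.sin(np.pi / 4)])

p_loss = pose_loss(q_quarter_turn, q_identity, square)
s_loss = shape_match_loss(q_quarter_turn, q_identity, square)
```

On this toy model the quarter turn is indistinguishable from the identity, so the shape-matching loss is effectively zero while the correspondence-based loss still penalizes the prediction. The closest-point search is quadratic in the number of model points, so a practical implementation would likely subsample the model.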

This review was generated by AI and checked by a human editor.