QUICK REVIEW

[Paper Review] PoseCNN: A Convolutional Neural Network for 6D Object Pose Estimation in Cluttered Scenes

Xiang Yu, Tanner Schmidt | arXiv (Cornell University) | November 1, 2017

Human Pose and Action Recognition | References: 34 | Citations: 130

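One-Line Summary

PoseCNN is a CNN for 6D object pose estimation that decouples 3D translation into 2D center localization and center depth prediction, estimates 3D rotation by quaternion regression, and uses ShapeMatch-Loss to handle symmetric objects; it is evaluated on the YCB-Video and OccludedLINEMOD datasets.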
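The decoupling of translation into a 2D center and a center depth can be illustrated with the standard pinhole camera model. The sketch below is not taken from the PoseCNN code; the function names and the intrinsics (`fx`, `fy`, `px`, `py`) are assumptions chosen for illustration.

```python
import numpy as np

def recover_translation(center_xy, depth_z, fx, fy, px, py):
    """Back-project a 2D object center and its depth to a 3D translation.

    Assumes a pinhole camera with focal lengths (fx, fy) and principal
    point (px, py); these parameter names are illustrative.
    """
    cx, cy = center_xy
    tx = (cx - px) * depth_z / fx
    ty = (cy - py) * depth_z / fy
    return np.array([tx, ty, depth_z], dtype=float)

def project(translation, fx, fy, px, py):
    """Inverse of recover_translation: project a 3D point to the image."""
    tx, ty, tz = translation
    return np.array([fx * tx / tz + px, fy * ty / tz + py])

# Hypothetical intrinsics and prediction, for illustration only.
t = recover_translation((420.0, 190.0), 2.0, fx=1000.0, fy=1000.0, px=320.0, py=240.0)
```

With these made-up values the center sits 100 px right of and 50 px above the principal point at a depth of 2.0, so the recovered translation is approximately (0.2, -0.1, 2.0), and projecting it back returns the original pixel.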

ABSTRACT

Estimating the 6D pose of known objects is important for robots to interact with the real world. The problem is challenging due to the variety of objects as well as the complexity of a scene caused by clutter and occlusions between objects. In this work, we introduce PoseCNN, a new Convolutional Neural Network for 6D object pose estimation. PoseCNN estimates the 3D translation of an object by localizing its center in the image and predicting its distance from the camera. The 3D rotation of the object is estimated by regressing to a quaternion representation. We also introduce a novel loss function that enables PoseCNN to handle symmetric objects. In addition, we contribute a large scale video dataset for 6D object pose estimation named the YCB-Video dataset. Our dataset provides accurate 6D poses of 21 objects from the YCB dataset observed in 92 videos with 133,827 frames. We conduct extensive experiments on our YCB-Video dataset and the OccludedLINEMOD dataset to show that PoseCNN is highly robust to occlusions, can handle symmetric objects, and provide accurate pose estimation using only color images as input. When using depth data to further refine the poses, our approach achieves state-of-the-art results on the challenging OccludedLINEMOD dataset. Our code and dataset are available at https://rse-lab.cs.washington.edu/projects/posecnn/.

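Motivation and Objectives

Achieve 6D pose estimation that is robust to clutter and occlusion without relying heavily on depth data.
Develop an end-to-end CNN that handles rotation and translation estimation separately.
Handle symmetric objects through a dedicated loss function (ShapeMatch-Loss).
Provide a large-scale RGB-D video dataset (YCB-Video) of 21 objects with 6D pose annotations.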

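Proposed Method

Two-stage CNN backbone with shared features
Per-pixel semantic labeling to identify object classes and to enable center voting
2D object center localization: regress a unit direction toward the center at each pixel, then find the 2D center with a Hough voting layer
Recover the 3D translation (T) by combining the predicted center distance (depth) with the 2D center location
Regress 3D rotation as a class-specific quaternion from object bounding-box features; train with PoseLoss for asymmetric objects and ShapeMatch-Loss for symmetric objects
When depth data is available, refine the pose with ICP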
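The difference between the two rotation losses can be sketched in NumPy. PoseLoss compares each model point with the same point under the ground-truth rotation, while ShapeMatch-Loss compares each point with its closest point on the ground-truth model, so rotations that map a symmetric shape onto itself are not penalized. This is a minimal illustration, not the authors' training code; the function names and the (w, x, y, z) quaternion ordering are assumptions.

```python
import numpy as np

def quat_to_rotmat(q):
    """Convert a quaternion (w, x, y, z) to a 3x3 rotation matrix."""
    w, x, y, z = np.asarray(q, dtype=float) / np.linalg.norm(q)
    return np.array([
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ])

def pose_loss(q_pred, q_true, points):
    """Average squared distance between corresponding rotated model points."""
    a = points @ quat_to_rotmat(q_pred).T
    b = points @ quat_to_rotmat(q_true).T
    return np.sum((a - b) ** 2) / (2 * len(points))

def shape_match_loss(q_pred, q_true, points):
    """Average squared distance from each predicted point to its closest
    ground-truth point, so correspondences need not be known."""
    a = points @ quat_to_rotmat(q_pred).T
    b = points @ quat_to_rotmat(q_true).T
    d2 = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)  # (m, m) pairwise
    return np.sum(d2.min(axis=1)) / (2 * len(points))

# Toy model: four points with 90-degree symmetry about the z-axis.
square = np.array([[1.0, 0, 0], [0, 1.0, 0], [-1.0, 0, 0], [0, -1.0, 0]])
q_true = [1.0, 0.0, 0.0, 0.0]                     # identity rotation
q_pred = [np.sqrt(0.5), 0.0, 0.0, np.sqrt(0.5)]   # 90 degrees about z
```

On this toy shape the 90-degree prediction is indistinguishable from the ground truth, so `shape_match_loss` is essentially zero while `pose_loss` is 1.0. The pairwise distance matrix costs O(m²) memory, which is fine for a sketch but would need a smarter nearest-neighbor search for dense models.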

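Experimental Results

Research Questions

RQ1: Can a CNN jointly perform semantic labeling, 2D center voting, and 3D pose regression to achieve accurate 6D pose estimation in cluttered scenes?
RQ2: Can symmetries be handled effectively during rotation regression without enumerating them explicitly?
RQ3: Does center-voting-based translation estimation improve robustness to occlusion compared with direct 3D coordinate regression?
RQ4: How does PoseCNN's performance differ between color-only and RGB-D input on challenging datasets such as OccludedLINEMOD and YCB-Video?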
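The intuition behind RQ3 is that every visible pixel of an object can vote for the object's center, so the center can be located even when it is itself hidden. The sketch below is a simplified stand-in for a Hough voting layer, not the paper's implementation: each pixel casts a ray along its predicted unit direction and the most-voted cell wins. The function name and the array layout are assumptions.

```python
import numpy as np

def hough_vote_center(pixels, directions, height, width):
    """Find a 2D center from per-pixel unit directions by ray voting.

    pixels:     (N, 2) integer array of (x, y) locations inside the image.
    directions: (N, 2) unit vectors pointing from each pixel to the center.
    Returns ((cx, cy), accumulator).
    """
    acc = np.zeros((height, width), dtype=np.int32)
    steps = np.arange(int(np.hypot(height, width)) + 1)
    for (x, y), (dx, dy) in zip(pixels, directions):
        # Walk along the ray in unit steps and round to the nearest cell.
        xs = np.rint(x + steps * dx).astype(int)
        ys = np.rint(y + steps * dy).astype(int)
        inside = (xs >= 0) & (xs < width) & (ys >= 0) & (ys < height)
        # Each pixel votes at most once per cell.
        cells = np.unique(np.stack([ys[inside], xs[inside]], axis=1), axis=0)
        acc[cells[:, 0], cells[:, 1]] += 1
    cy, cx = np.unravel_index(np.argmax(acc), acc.shape)
    return (int(cx), int(cy)), acc

# Four visible pixels around a center at (30, 20); the center pixel itself
# is not among the voters, as it would not be if it were occluded.
true_center = np.array([30.0, 20.0])
visible = np.array([[10, 20], [30, 5], [50, 35], [18, 4]])
offsets = true_center - visible
unit_dirs = offsets / np.linalg.norm(offsets, axis=1, keepdims=True)
center, votes = hough_vote_center(visible, unit_dirs, height=40, width=60)
```

Here all four rays cross at (30, 20), which becomes the accumulator maximum. A real layer would also have to weight votes by label confidence and cope with noisy directions, which this sketch ignores.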

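Key Results

PoseCNN achieves strong 6D pose estimation from color images alone and outperforms a 3D coordinate regression baseline on YCB-Video.
Introducing depth through ICP refinement substantially improves accuracy and often surpasses RGB-D baselines.
ShapeMatch-Loss handles symmetric objects effectively, improving pose estimation for Eggbox and Glue on OccludedLINEMOD.
On OccludedLINEMOD, PoseCNN with ICP outperforms state-of-the-art methods that use RGB-D input in many cases.
The YCB-Video dataset (21 objects, 133,827 frames) supports robust training and evaluation under occlusion and symmetry.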

This review was generated by AI and reviewed by a human editor.