QUICK REVIEW

[Paper Review] Pose from Shape: Deep Pose Estimation for Arbitrary 3D Objects

Yang Xiao, Xuchong Qiu|arXiv (Cornell University)|Jun 12, 2019

Robot Manipulation and Learning · References: 59 · Cited by: 36

One-Sentence Summary

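A generic, category-agnostic pose estimation method that, given a 3D model of the object, predicts the object's pose relative to that model, so it can estimate poses for unseen object categories without additional training. The method improves performance on standard benchmarks and generalizes well to new objects and new datasets.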

ABSTRACT

Most deep pose estimation methods need to be trained for specific object instances or categories. In this work we propose a completely generic deep pose estimation approach, which does not require the network to have been trained on relevant categories, nor objects in a category to have a canonical pose. We believe this is a crucial step to design robotic systems that can interact with new objects in the wild not belonging to a predefined category. Our main insight is to dynamically condition pose estimation with a representation of the 3D shape of the target object. More precisely, we train a Convolutional Neural Network that takes as input both a test image and a 3D model, and outputs the relative 3D pose of the object in the input image with respect to the 3D model. We demonstrate that our method boosts performances for supervised category pose estimation on standard benchmarks, namely Pascal3D+, ObjectNet3D and Pix3D, on which we provide results superior to the state of the art. More importantly, we show that our network trained on everyday man-made objects from ShapeNet generalizes without any additional training to completely new types of 3D objects by providing results on the LINEMOD dataset as well as on natural entities such as animals from ImageNet.

Motivation and Goals

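Motivation: robust pose estimation for objects outside predefined categories or instances (objects in the wild).
Propose a deep network that conditions pose estimation on the 3D model of the target object.
Show that shape-conditioned pose estimation improves accuracy on known categories and generalizes to new objects.
Show that both point clouds and multi-view renderings of the 3D shape can encode the shape information used for pose prediction.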
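To make the conditioning idea concrete, here is a toy, dependency-free Python sketch of fusing an image embedding with a multi-view shape embedding. It is not the authors' implementation: the function names are hypothetical, the single dense layer stands in for the real CNN encoders (ResNet-18 for the image, a CNN for the rendered views), and concatenating the per-view embeddings with the image embedding is an assumed fusion scheme.

```python
import random


def linear_relu(weights, x):
    """Hypothetical stand-in for a feature extractor: one dense layer followed by ReLU."""
    return [max(0.0, sum(w * v for w, v in zip(row, x))) for row in weights]


def encode_shape_multiview(view_feats, shared_weights):
    """Encode every rendered view with the SAME weights, then concatenate the per-view embeddings."""
    embedding = []
    for view in view_feats:
        embedding.extend(linear_relu(shared_weights, view))
    return embedding


def fuse(image_feat, view_feats, shared_weights):
    """Condition on the shape by concatenating the image embedding with the shape embedding."""
    return list(image_feat) + encode_shape_multiview(view_feats, shared_weights)


if __name__ == "__main__":
    rng = random.Random(0)
    image_feat = [rng.random() for _ in range(8)]                          # stands in for image CNN features
    views = [[rng.random() for _ in range(4)] for _ in range(3)]           # 3 rendered views, 4 features each
    shared_w = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(5)]  # 5 hidden units shared by all views
    fused = fuse(image_feat, views, shared_w)
    print(len(fused))  # 8 + 3 * 5 = 23
```

Because the view encoder's weights are shared, the parameter count does not grow with the number of rendered views; only the length of the concatenated embedding does.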

Proposed Method

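Two-branch network: (1) the RGB image is processed by a CNN (ResNet-18); (2) the 3D shape is encoded either by PointNet or from multi-view rendered images.
A hybrid classification-and-regression loss predicts, for each Euler angle (azimuth, elevation and in-plane rotation), an angle bin and an offset within that bin.
Each angle is discretized into L_theta bins; the network outputs a classification score and a regression offset per bin, and the offsets are trained with a Huber loss.
Data augmentation includes random perturbations of the shape's rotation, which reduces overfitting to a canonical orientation.
Training uses Adam with a step learning-rate schedule and synthetic ShapeNet data composited on SUN397 backgrounds; evaluation covers Pascal3D+, ObjectNet3D, Pix3D and LINEMOD.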
Shape encoders: (a) PointNet for point clouds; (b) multi-view CNN using rendered views around the object; weights shared across viewpoints.
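The bin-plus-offset parameterization above can be sketched in plain Python. This is a hedged illustration, not the paper's code: 24 bins of 15 degrees each, offsets measured in bin widths from the bin center, regressing only the ground-truth bin's offset, and a Huber threshold of delta = 1 are all assumptions, and the function names are hypothetical. The same routine would be applied independently to azimuth, elevation and in-plane rotation.

```python
import math

NUM_BINS = 24                     # assumed value of L_theta (15-degree bins)
BIN_WIDTH = 360.0 / NUM_BINS


def encode_angle(angle_deg):
    """Return (bin index, offset); the offset is in bin widths from the bin center, in [-0.5, 0.5]."""
    a = angle_deg % 360.0
    idx = min(int(a // BIN_WIDTH), NUM_BINS - 1)  # guard against a float result of exactly 360.0
    center = (idx + 0.5) * BIN_WIDTH
    return idx, (a - center) / BIN_WIDTH


def decode_angle(idx, offset):
    """Inverse of encode_angle: bin center plus offset, wrapped to [0, 360)."""
    return ((idx + 0.5 + offset) * BIN_WIDTH) % 360.0


def huber(x, delta=1.0):
    """Huber loss: quadratic for |x| <= delta, linear beyond."""
    ax = abs(x)
    return 0.5 * x * x if ax <= delta else delta * (ax - 0.5 * delta)


def angle_loss(logits, pred_offsets, target_deg):
    """Cross-entropy over the bins plus a Huber loss on the ground-truth bin's offset only."""
    idx, target_offset = encode_angle(target_deg)
    m = max(logits)  # subtract the max for a numerically stable log-sum-exp
    log_sum_exp = m + math.log(sum(math.exp(l - m) for l in logits))
    cross_entropy = log_sum_exp - logits[idx]
    return cross_entropy + huber(pred_offsets[idx] - target_offset)


def predict_angle(logits, pred_offsets):
    """Inference: pick the highest-scoring bin and add that bin's predicted offset."""
    idx = max(range(len(logits)), key=lambda i: logits[i])
    return decode_angle(idx, pred_offsets[idx])
```

For example, encode_angle(100.0) gives bin 6 (covering 90 to 105 degrees) with an offset of about 0.167, and decode_angle maps it back to 100 degrees. Classifying the bin first makes the prediction robust to large errors, while the small regressed offset recovers precision inside the bin.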

Experimental Results

Research Questions

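RQ1: Can a deep pose estimator learn category-agnostic viewpoint estimation when conditioned on a 3D model of the object?
RQ2: Does incorporating exact or approximate 3D shape information improve pose estimation on known categories?
RQ3: How well does the method generalize to new categories and to entirely unseen types of objects?
RQ4: What is the effect of a multi-view shape representation compared with single-view or point-cloud encodings?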

Key Findings

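Across datasets, using 3D shape information (point clouds or multi-view renderings) significantly improves pose estimation over a baseline without shape information.
As a shape input, the multi-view representation generally outperforms the point-cloud encoding.
In supervised category-level pose estimation, the method achieves results superior to the state of the art on Pascal3D+, ObjectNet3D and Pix3D.
Without any object-specific training, the method provides meaningful coarse pose estimates on LINEMOD, which serve as a good starting point for subsequent refinement (e.g., DeepIM).
Randomizing the orientation of the object shape during training reduces overfitting to a canonical pose and improves robustness on unseen shapes.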
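The shape-rotation augmentation can be sketched in plain Python as well. If a model point p appears in the image as R · p and the input shape is rotated to p' = R_aug · p, then the pose of the imaged object relative to the rotated shape is R · R_aug^T, so the training target must be updated together with the shape. This sketch assumes an Euler convention R = Rz(inplane) · Rx(elevation) · Rz(azimuth), a rotation about the model's vertical z axis, and a range of ±45 degrees; none of these details come from the review, and the helper names are hypothetical. Under this convention, rotating the shape by delta simply shifts the target azimuth by -delta.

```python
import math
import random


def rot_z(deg):
    t = math.radians(deg)
    c, s = math.cos(t), math.sin(t)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def rot_x(deg):
    t = math.radians(deg)
    c, s = math.cos(t), math.sin(t)
    return [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def transpose(m):
    return [list(row) for row in zip(*m)]


def pose_matrix(azimuth, elevation, inplane):
    """Assumed Euler convention: R = Rz(inplane) @ Rx(elevation) @ Rz(azimuth)."""
    return matmul(rot_z(inplane), matmul(rot_x(elevation), rot_z(azimuth)))


def augment_shape_rotation(pose_deg, max_deg=45.0, rng=random):
    """Randomly rotate the input shape about its vertical axis and return the matching target.

    Returns (delta, r_aug, r_target): the shape should be fed to the network as r_aug @ points,
    and r_target = R @ r_aug^T is the new ground-truth pose relative to the rotated shape.
    """
    delta = rng.uniform(-max_deg, max_deg)
    r_aug = rot_z(delta)
    r_target = matmul(pose_matrix(*pose_deg), transpose(r_aug))
    return delta, r_aug, r_target
```

Updating the label is the essential part of this design: if the shape were rotated while the target stayed fixed, the network would be supervised with poses that no longer match its input.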

This review was generated by AI and reviewed by human editors.