QUICK REVIEW

[Paper Review] Deep-6DPose: Recovering 6D Object Pose from a Single RGB Image

Thanh-Toan Do, Ming Cai|arXiv (Cornell University)|Feb 28, 2018

Advanced Neural Network Applications | References: 22 | Cited by: 129

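One-Sentence Summary

Deep-6DPose jointly detects, segments, and regresses the 6D poses of object instances from a single RGB image, representing rotation in a Lie algebra so that the pose can be regressed directly, with no post-refinement.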

ABSTRACT

Detecting objects and their 6D poses from only RGB images is an important task for many robotic applications. While deep learning methods have made significant progress in visual object detection and segmentation, the object pose estimation task is still challenging. In this paper, we introduce an end-to-end deep learning framework, named Deep-6DPose, that jointly detects, segments, and most importantly recovers 6D poses of object instances from a single RGB image. In particular, we extend the recent state-of-the-art instance segmentation network Mask R-CNN with a novel pose estimation branch to directly regress 6D object poses without any post-refinements. Our key technical contribution is the decoupling of pose parameters into translation and rotation so that the rotation can be regressed via a Lie algebra representation. The resulting pose regression loss is differential and unconstrained, making the training tractable. The experiments on two standard pose benchmarking datasets show that our proposed approach compares favorably with the state-of-the-art RGB-based multi-stage pose estimation methods. Importantly, due to the end-to-end architecture, Deep-6DPose is considerably faster than competing multi-stage methods, offers an inference speed of 10 fps that is well suited for robotic applications.

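Motivation and Objectives

Push toward end-to-end 6D object pose estimation directly from RGB images, with no post-refinement.
Build on and extend Mask R-CNN by adding a dedicated pose regression head that operates on each RoI to recover the 6D pose.
Represent rotation in the Lie algebra so(3) so that it can be regressed without constraints.
Recover translation by combining projective geometry cues with the predicted 2D bounding box.
Demonstrate state-of-the-art or competitive performance on standard RGB-based pose benchmarks while keeping inference fast.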

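Proposed Method

Builds on the Mask R-CNN / Faster R-CNN backbone and uses a Region Proposal Network to generate RoIs.
Adds a novel 6D pose head that regresses a 4D vector for each RoI: the first three components are the rotation in so(3) (the Lie algebra), and the last component is the z translation.
Maps the so(3) rotation vector to a rotation matrix via the Rodrigues formula.
Recovers the full translation from the predicted z component and the bounding box via projective geometry (t_x and t_y are derived from t_z and the camera intrinsics).
Trains with a multi-task loss that combines classification, box regression, mask segmentation, and pose regression losses.
The pose branch is class-agnostic but can be extended to class-specific outputs.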
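
The decoding of the 4D pose vector described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' code: the function names (`so3_exp`, `recover_translation`, `decode_pose`) are hypothetical, the camera is assumed to be a plain pinhole model with intrinsics `fx, fy, cx, cy`, and the object center is assumed to project onto the center of the predicted 2D box.

```python
import numpy as np

def so3_exp(omega):
    """Rodrigues formula: so(3) vector (axis * angle) -> 3x3 rotation matrix."""
    omega = np.asarray(omega, dtype=float)
    theta = np.linalg.norm(omega)
    if theta < 1e-8:
        return np.eye(3)  # near-zero rotation: return identity
    k = omega / theta
    # Skew-symmetric cross-product matrix of the unit axis k
    K = np.array([[0.0, -k[2], k[1]],
                  [k[2], 0.0, -k[0]],
                  [-k[1], k[0], 0.0]])
    return np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * (K @ K)

def recover_translation(t_z, box_center, fx, fy, cx, cy):
    """Back-project the 2D box center (u, v) at depth t_z (pinhole assumption)."""
    u, v = box_center
    t_x = (u - cx) * t_z / fx
    t_y = (v - cy) * t_z / fy
    return np.array([t_x, t_y, t_z])

def decode_pose(pose_vec, box, fx, fy, cx, cy):
    """pose_vec = [r1, r2, r3, t_z]; box = (x1, y1, x2, y2) in pixels."""
    x1, y1, x2, y2 = box
    R = so3_exp(pose_vec[:3])
    center = ((x1 + x2) / 2.0, (y1 + y2) / 2.0)
    t = recover_translation(pose_vec[3], center, fx, fy, cx, cy)
    return R, t
```

For example, `decode_pose([0, 0, np.pi / 2, 1.0], (400, 220, 440, 260), 500, 500, 320, 240)` yields a 90° rotation about the camera z axis and a translation whose x component is offset from the optical axis in proportion to the depth.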

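Experimental Results

Research Questions

RQ1: Can an end-to-end, RGB-only network jointly detect, segment, and estimate 6D object poses without post-refinement?
RQ2: Does representing rotation in the Lie algebra so(3) help achieve stable, unconstrained regression in a convolutional neural network?
RQ3: What is the effect of regressing only the z translation component and recovering x/y via projection?
RQ4: How does end-to-end Deep-6DPose compare with state-of-the-art RGB-based pose methods on standard datasets in terms of accuracy and speed?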
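
To make RQ2 and RQ3 concrete, the sketch below shows one way a regression target and loss could be built for a 4D pose vector. It is an assumption-laden illustration rather than the paper's implementation: `so3_log` and `pose_loss` are hypothetical names, the L1 norm and the weight `beta` are illustrative choices, and the log map shown here is not numerically stable for rotation angles close to 180°.

```python
import numpy as np

def so3_log(R):
    """Rotation matrix -> so(3) vector (inverse of the Rodrigues map)."""
    cos_theta = np.clip((np.trace(R) - 1.0) / 2.0, -1.0, 1.0)
    theta = np.arccos(cos_theta)
    if theta < 1e-8:
        return np.zeros(3)
    # Off-diagonal differences give 2*sin(theta) times the rotation axis.
    # Caveat: unstable as theta approaches pi, where sin(theta) -> 0.
    w = np.array([R[2, 1] - R[1, 2],
                  R[0, 2] - R[2, 0],
                  R[1, 0] - R[0, 1]])
    return theta / (2.0 * np.sin(theta)) * w

def pose_loss(pred, target, beta=1.0):
    """L1 loss on the so(3) part plus a beta-weighted L1 loss on t_z.

    pred and target are 4-vectors [r1, r2, r3, t_z]. The norm and beta
    are assumptions made for this sketch.
    """
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)
    rot_term = np.abs(pred[:3] - target[:3]).sum()
    z_term = np.abs(pred[3] - target[3])
    return rot_term + beta * z_term
```

Because any 3-vector is a valid element of so(3), the network output needs no normalization or orthogonality constraint before the loss is applied, which is the sense in which the regression is unconstrained.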

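Key Findings

On the LINEMOD dataset and the dataset of Tejani et al., Deep-6DPose achieves competitive 2D detection/segmentation accuracy, with near-perfect detection and segmentation scores at IoU 0.5.
On the 5cm/5° pose accuracy metric, Deep-6DPose outperforms Brachmann et al. and is competitive with BB8; SSD-6D may be stronger, possibly because of its synthetic training data, but Deep-6DPose delivers pose output end to end with no refinement.
On LINEMOD, Deep-6DPose is comparable to SSD-6D and better than Brachmann et al. on the 2D pose metric, and about 2.5% higher than BB8 on the ADD metric.
In the multi-instance scenes of the Tejani et al. dataset, Deep-6DPose achieves near-perfect 2D detection/segmentation at IoU 0.5 and reasonable average 5cm/5° and ADD scores, with some degradation on nearly symmetric objects.
Inference takes about 0.1 s per image on a Titan X, much faster than multi-stage methods and faster than BB8, while remaining competitive with SSD-6D.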
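
For readers unfamiliar with the pose metrics named above, the following sketch shows how the 5cm/5° and ADD criteria are commonly computed. It is a generic illustration under stated assumptions, not the paper's evaluation code: the function names are hypothetical, translations and model points are assumed to be in metres, and the ADD acceptance threshold of 10% of the object diameter is the usual convention rather than something stated in this review.

```python
import numpy as np

def rotation_error_deg(R_pred, R_gt):
    """Angle (degrees) of the relative rotation between two rotation matrices."""
    cos_theta = np.clip((np.trace(R_pred.T @ R_gt) - 1.0) / 2.0, -1.0, 1.0)
    return float(np.degrees(np.arccos(cos_theta)))

def is_correct_5cm5deg(R_pred, t_pred, R_gt, t_gt):
    """True if translation error < 5 cm and rotation error < 5 degrees."""
    t_err = np.linalg.norm(np.asarray(t_pred) - np.asarray(t_gt))
    return bool(t_err < 0.05 and rotation_error_deg(R_pred, R_gt) < 5.0)

def add_distance(R_pred, t_pred, R_gt, t_gt, model_points):
    """Mean distance between model points under the predicted and true poses."""
    pred = model_points @ R_pred.T + t_pred
    gt = model_points @ R_gt.T + t_gt
    return float(np.linalg.norm(pred - gt, axis=1).mean())

def is_correct_add(R_pred, t_pred, R_gt, t_gt, model_points, diameter, ratio=0.1):
    """Conventional ADD test: mean distance below ratio * object diameter."""
    dist = add_distance(R_pred, t_pred, R_gt, t_gt, model_points)
    return bool(dist < ratio * diameter)
```

Note that plain ADD penalizes poses that differ only by an object symmetry, which is one plausible reason scores drop on nearly symmetric objects.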

This review was generated by AI and reviewed by a human editor.