QUICK REVIEW

[Paper Review] CPS++: Improving Class-level 6D Pose and Shape Estimation From Monocular Images With Self-Supervised Learning

Fabian Manhardt, Gu Wang | arXiv (Cornell University) | Mar 12, 2020

Robotics and Sensor-Based Localization | References: 70 | Cited by: 35

One-Sentence Summary

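CPS++ introduces an end-to-end differentiable pipeline for class-level monocular 6D pose and metric shape estimation, combined with a self-supervised extension that bridges the synthetic-to-real domain gap. It achieves state-of-the-art pose accuracy and provides a learnable 3D shape representation for each object category.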

ABSTRACT

Contemporary monocular 6D pose estimation methods can only cope with a handful of object instances. This naturally hampers possible applications as, for instance, robots seamlessly integrated in everyday processes necessarily require the ability to work with hundreds of different objects. To tackle this problem of immanent practical relevance, we propose a novel method for class-level monocular 6D pose estimation, coupled with metric shape retrieval. Unfortunately, acquiring adequate annotations is very time-consuming and labor intensive. This is especially true for class-level 6D pose estimation, as one is required to create a highly detailed reconstruction for all objects and then annotate each object and scene using these models. To overcome this shortcoming, we additionally propose the idea of synthetic-to-real domain transfer for class-level 6D poses by means of self-supervised learning, which removes the burden of collecting numerous manual annotations. In essence, after training our proposed method fully supervised with synthetic data, we leverage recent advances in differentiable rendering to self-supervise the model with unannotated real RGB-D data to improve later inference. We experimentally demonstrate that we can retrieve precise 6D poses and metric shapes from a single RGB image.
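The self-supervision described in the abstract compares renderings of the predicted object with unannotated real RGB-D frames. A minimal sketch of two such comparison terms follows. It assumes a differentiable renderer has already produced a soft mask and a depth map for the predicted pose and shape, and that a reference mask for the real frame is available (for example from a 2D segmenter). The soft-IoU mask term, the masked L1 depth term, the weights and all function names are illustrative assumptions, not necessarily the exact losses used in CPS++. Plain Python lists stand in for tensors here; a real pipeline would use autograd tensors so gradients flow back through the renderer.

```python
from typing import Sequence

Image = Sequence[Sequence[float]]  # row-major H x W values


def soft_iou_loss(rendered_mask: Image, ref_mask: Image, eps: float = 1e-6) -> float:
    """1 - soft IoU between a rendered soft mask (values in [0, 1]) and a
    reference mask for the real frame (hypothetical mask term)."""
    inter = union = 0.0
    for r_row, m_row in zip(rendered_mask, ref_mask):
        for r, m in zip(r_row, m_row):
            inter += r * m
            union += r + m - r * m
    return 1.0 - inter / (union + eps)


def masked_depth_l1(rendered_depth: Image, real_depth: Image, ref_mask: Image) -> float:
    """Mean |rendered - real| depth over pixels inside the reference mask where
    both depths are valid (> 0). Returns 0.0 if no pixel qualifies."""
    total, count = 0.0, 0
    for rd_row, d_row, m_row in zip(rendered_depth, real_depth, ref_mask):
        for rd, d, m in zip(rd_row, d_row, m_row):
            if m > 0.5 and rd > 0 and d > 0:
                total += abs(rd - d)
                count += 1
    return total / count if count else 0.0


def self_supervised_loss(rendered_mask: Image, rendered_depth: Image,
                         real_depth: Image, ref_mask: Image,
                         w_mask: float = 1.0, w_depth: float = 1.0) -> float:
    """Weighted sum of the mask and geometry terms (weights are hypothetical)."""
    return (w_mask * soft_iou_loss(rendered_mask, ref_mask)
            + w_depth * masked_depth_l1(rendered_depth, real_depth, ref_mask))
```

The depth term only compares pixels where both maps are valid, so missing sensor depth or background pixels do not dominate the geometry signal.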

Motivation and Goals

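Motivates class-level monocular 6D pose estimation that scales beyond instance-level models, which can handle only a handful of objects.
Proposes CPS, which jointly estimates 6D pose and metric shape from a single RGB image.
Introduces a per-category shape latent space based on AtlasNet to reconstruct object shapes.
Enables self-supervised synthetic-to-real domain transfer to reduce the annotation burden.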

Proposed Method

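Uses RetinaNet to detect 2D regions of interest and RoIAlign to extract features for each detection.
For each detection, predicts the allocentric rotation q_a, the 2D image centroid, the depth z, the metric size (w, h, l), and a 32-dimensional per-category shape latent e.
Represents shapes with a per-category AtlasNet encoder-decoder and predicts the shape as an offset from the category mean shape m_c.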
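Back-projects the predicted centroid and depth to 3D to recover the full 3D pose.
Introduces a differentiable 3D alignment loss, the Chamfer distance between the predicted and ground-truth point clouds, which jointly optimizes pose and shape parameters in 3D space.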
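Implements a self-supervised extension (CPS++) that differentiably renders RGB-D pairs from the predicted meshes and aligns their geometry and masks with real unannotated data.
Combines full supervision on synthetic data with Self6D-inspired self-supervision to bridge the domain gap.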
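The pose-lifting and 3D alignment steps listed above can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the authors' code: it assumes a pinhole camera with intrinsics fx, fy, cx, cy, quaternions in (w, x, y, z) order, the common allocentric-to-egocentric convention of rotating the optical axis (0, 0, 1) onto the ray through the object centre, and a symmetric Chamfer distance on squared distances. All function names (backproject_centroid, allocentric_to_egocentric, alignment_loss, and the helpers) are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Quat = Tuple[float, float, float, float]  # unit quaternion (w, x, y, z)


def backproject_centroid(u: float, v: float, z: float,
                         fx: float, fy: float, cx: float, cy: float) -> Vec3:
    """Lift the predicted 2D centroid (u, v) and depth z to a 3D translation
    using the pinhole camera model."""
    return ((u - cx) * z / fx, (v - cy) * z / fy, z)


def quat_mul(a: Quat, b: Quat) -> Quat:
    """Hamilton product a * b (as a rotation: apply b first, then a)."""
    aw, ax, ay, az = a
    bw, bx, by, bz = b
    return (aw * bw - ax * bx - ay * by - az * bz,
            aw * bx + ax * bw + ay * bz - az * by,
            aw * by - ax * bz + ay * bw + az * bx,
            aw * bz + ax * by - ay * bx + az * bw)


def rotate(q: Quat, p: Vec3) -> Vec3:
    """Rotate p by q using the rotation-matrix form of the quaternion."""
    w, x, y, z = q
    return ((1 - 2 * (y * y + z * z)) * p[0] + 2 * (x * y - w * z) * p[1] + 2 * (x * z + w * y) * p[2],
            2 * (x * y + w * z) * p[0] + (1 - 2 * (x * x + z * z)) * p[1] + 2 * (y * z - w * x) * p[2],
            2 * (x * z - w * y) * p[0] + 2 * (y * z + w * x) * p[1] + (1 - 2 * (x * x + y * y)) * p[2])


def allocentric_to_egocentric(q_allo: Quat, t: Vec3) -> Quat:
    """Compose q_allo with the rotation taking the optical axis (0, 0, 1)
    onto the viewing ray through the object centre t."""
    norm = math.sqrt(t[0] ** 2 + t[1] ** 2 + t[2] ** 2)
    ray = (t[0] / norm, t[1] / norm, t[2] / norm)
    axis = (-ray[1], ray[0], 0.0)  # (0, 0, 1) x ray
    axis_norm = math.hypot(axis[0], axis[1])
    if axis_norm < 1e-12:  # object on the optical axis: no correction needed
        return q_allo
    angle = math.acos(max(-1.0, min(1.0, ray[2])))
    s = math.sin(angle / 2) / axis_norm
    q_ray = (math.cos(angle / 2), axis[0] * s, axis[1] * s, 0.0)
    return quat_mul(q_ray, q_allo)


def transform(points: Sequence[Vec3], q: Quat, t: Vec3) -> List[Vec3]:
    """Apply the rigid pose (q, t) to every point."""
    out = []
    for p in points:
        r = rotate(q, p)
        out.append((r[0] + t[0], r[1] + t[1], r[2] + t[2]))
    return out


def chamfer(P: Sequence[Vec3], Q: Sequence[Vec3]) -> float:
    """Symmetric Chamfer distance: mean squared nearest-neighbour distance
    from P to Q plus from Q to P (brute force, O(|P| * |Q|))."""
    def d2(a: Vec3, b: Vec3) -> float:
        return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2 + (a[2] - b[2]) ** 2
    p_to_q = sum(min(d2(p, q) for q in Q) for p in P) / len(P)
    q_to_p = sum(min(d2(q, p) for p in P) for q in Q) / len(Q)
    return p_to_q + q_to_p


def alignment_loss(pred_pts: Sequence[Vec3], q_pred: Quat, t_pred: Vec3,
                   gt_pts: Sequence[Vec3], q_gt: Quat, t_gt: Vec3) -> float:
    """Chamfer distance between the predicted shape at the predicted pose and
    the ground-truth shape at the ground-truth pose."""
    return chamfer(transform(pred_pts, q_pred, t_pred), transform(gt_pts, q_gt, t_gt))
```

Because rotation, translation and shape all enter the same 3D point sets, a single Chamfer term penalizes errors in any of them jointly; in a training pipeline the same steps would be written with autograd tensors and a batched nearest-neighbour search instead of the brute-force loops used here for readability.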

Experimental Results

Research Questions

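RQ1: Can a monocular network estimate 6D pose and metric shape for object instances of known categories that were not seen during training?
RQ2: Does end-to-end training with a differentiable 3D alignment loss improve pose accuracy and shape quality?
RQ3: Can self-supervised learning on real unannotated RGB-D data bridge the synthetic-to-real domain gap in class-level 6D pose estimation?
RQ4: Is it feasible to learn a per-category 3D shape latent that generalizes across instance variation within a category?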

Key Findings

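Proposes the CPS framework, which jointly predicts 6D pose, object scale, and a category-specific shape latent, enabling 3D shape reconstruction from a single RGB image.
Introduces a differentiable 3D alignment loss that improves pose accuracy by optimizing alignment directly in 3D space.
Presents the self-supervised extension CPS++, which uses real unannotated RGB-D data to narrow the synthetic-to-real gap.
Shows that an AtlasNet-based per-category shape space enables metric shape estimation and can be regularized to stay within the learned shape distribution.
Collects and releases more than 30,000 real RGB-D samples to facilitate self-supervised learning for class-level 6D pose estimation.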
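The per-category shape representation can be illustrated as a latent offset from the category mean followed by metric rescaling. A minimal sketch, assuming m_c is stored as a 32-d latent vector, that (w, h, l) maps to the x, y, z extents of the canonical shape, and that a squared L2 penalty on the offset stands in for the regularization. toy_decoder is a hypothetical stand-in for the per-category AtlasNet decoder, not AtlasNet itself, and all other function names are likewise illustrative.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]
LATENT_DIM = 32  # per-category latent size given in the method summary


def shape_latent(m_c: Sequence[float], offset: Sequence[float]) -> List[float]:
    """e = m_c + offset: the network predicts a shift from the category mean."""
    if len(m_c) != LATENT_DIM or len(offset) != LATENT_DIM:
        raise ValueError("expected 32-d latents")
    return [m + d for m, d in zip(m_c, offset)]


def offset_penalty(offset: Sequence[float]) -> float:
    """Hypothetical regularizer: squared L2 norm of the offset, pulling e
    toward the category mean and thus inside the learned shape distribution."""
    return sum(d * d for d in offset)


def toy_decoder(e: Sequence[float]) -> List[Point]:
    """Stand-in for the per-category AtlasNet decoder (NOT AtlasNet): returns
    the 8 corners of a box whose proportions depend on e[0], e[1], e[2]."""
    half = [0.5 * math.exp(v) for v in e[:3]]
    return [(sx * half[0], sy * half[1], sz * half[2])
            for sx in (-1, 1) for sy in (-1, 1) for sz in (-1, 1)]


def to_metric(points: Sequence[Point], size: Point) -> List[Point]:
    """Centre a canonical point cloud and rescale each axis so its extents
    equal the predicted metric size (w, h, l)."""
    lo = [min(p[i] for p in points) for i in range(3)]
    hi = [max(p[i] for p in points) for i in range(3)]
    centre = [(a + b) / 2 for a, b in zip(lo, hi)]
    scale = [s / (b - a) if b > a else 1.0 for s, a, b in zip(size, lo, hi)]
    return [tuple((p[i] - centre[i]) * scale[i] for i in range(3)) for p in points]
```

Separating the latent (which encodes proportions and detail) from the predicted size (which fixes metric scale) lets the shape space stay normalized per category while the network still outputs metric geometry.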

This review was generated by AI and reviewed by human editors.