QUICK REVIEW

[Paper Review] Category-Level 6D Object Pose Estimation in the Wild: A Semi-Supervised Learning Approach and A New Dataset

Yang Fu, Xiaolong Wang | arXiv (Cornell University) | Jun 30, 2022

Human Pose and Action Recognition | Cited by 29

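One-Sentence Summary

The paper introduces Wild6D, a large unlabeled RGBD video dataset, and RePoNet, a semi-supervised model that learns category-level 6D pose and shape by combining synthetic data with silhouette-based supervision from real-world videos. It generalizes strongly in the wild without using any 3D annotations on real data.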

ABSTRACT

6D object pose estimation is one of the fundamental problems in computer vision and robotics research. While a lot of recent efforts have been made on generalizing pose estimation to novel object instances within the same category, namely category-level 6D pose estimation, it is still restricted in constrained environments given the limited number of annotated data. In this paper, we collect Wild6D, a new unlabeled RGBD object video dataset with diverse instances and backgrounds. We utilize this data to generalize category-level 6D object pose estimation in the wild with semi-supervised learning. We propose a new model, called Rendering for Pose estimation network RePoNet, that is jointly trained using the free ground-truths with the synthetic data, and a silhouette matching objective function on the real-world data. Without using any 3D annotations on real data, our method outperforms state-of-the-art methods on the previous dataset and our Wild6D test set (with manual annotations for evaluation) by a large margin. Project page with Wild6D data: https://oasisyang.github.io/semi-pose .

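Motivation and Objectives

Advance category-level 6D pose estimation in unconstrained, real-world scenes where annotations are limited.
Introduce a large-scale unlabeled RGBD video dataset (Wild6D) to improve generalization.
Propose RePoNet, a dual-branch network that uses differentiable rendering to jointly estimate 6D pose and 3D shape.
Train end-to-end with synthetic ground-truth labels and real-world silhouette supervision, without 3D labels on real data.
Demonstrate significant gains over baselines on in-the-wild objects and on existing datasets.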

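Proposed Method

Use NOCS (Normalized Object Coordinate Space) as an intermediate representation to map RGBD features to 6D pose without instance-level CAD models.
A pose network predicts the NOCS map, and a shape network deforms a category-level shape prior.
A differentiable rendering module generates an object mask, which is matched against the foreground mask (silhouette matching).
Train with a semi-supervised objective that combines ground-truth supervision on synthetic data with a silhouette loss on real data.
An implicit function Φ_nocs predicts per-point NOCS coordinates from RGBD features and point coordinates.
Use a disentangled 6D pose loss (rotation, translation, scale) and a Chamfer-based shape reconstruction loss to supervise shape.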
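
The loss terms listed above can be sketched roughly as follows. This is a minimal illustration in plain Python, not the authors' implementation. The soft-IoU form of the silhouette term, the squared-distance Chamfer form, the function names, and the weight `w_sil` are all assumptions made for the example; this review does not state the exact formulas used by RePoNet. A real training setup would compute these on tensors in an autodiff framework so gradients can flow through the renderer.

```python
def chamfer_distance(p, q):
    """Symmetric Chamfer distance between two lists of 3D points (assumed form)."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    def one_way(src, dst):
        # Mean squared distance from each source point to its nearest target point.
        return sum(min(sq_dist(a, b) for b in dst) for a in src) / len(src)

    return one_way(p, q) + one_way(q, p)


def silhouette_loss(rendered, target, eps=1e-6):
    """1 - soft IoU between a rendered mask and a foreground mask.

    Both masks are flat lists of values in [0, 1]. Soft IoU is a common
    choice for silhouette matching, used here as an assumption.
    """
    inter = sum(r * t for r, t in zip(rendered, target))
    union = sum(r + t - r * t for r, t in zip(rendered, target))
    return 1.0 - inter / (union + eps)


def semi_supervised_loss(pose_term, shape_term, silhouette_term, w_sil=1.0):
    """Supervised terms (synthetic data) plus a weighted silhouette term (real data).

    w_sil is a hypothetical weighting hyperparameter.
    """
    return pose_term + shape_term + w_sil * silhouette_term
```

The split mirrors the description above: the pose and shape terms need ground truth and so apply only to synthetic samples, while the silhouette term needs only a foreground mask and so can be computed on unlabeled real frames.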

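Experimental Results

Research Questions

RQ1: Can a semi-supervised approach effectively generalize category-level 6D pose estimation to in-the-wild objects without 3D annotations on real data?
RQ2: Does silhouette matching on unlabeled real RGBD videos, combined with synthetic data, improve pose and shape estimation over fully supervised baselines?
RQ3: How do an intermediate representation such as NOCS and implicit shape deformation affect generalization across diverse object instances?
RQ4: What do differentiable rendering and silhouette matching contribute to learning from unlabeled real-world data?

Key Findings

With semi-supervised learning, RePoNet consistently outperforms baselines on in-the-wild objects (the Wild6D test set).
Using real-world data through silhouette matching, without 3D ground truth, yields performance comparable to fully annotated methods.
Wild6D is larger and more diverse than previous datasets, which enables better in-the-wild generalization for category-level pose estimation.
The implicit NOCS mapping, together with point-wise deformation of the category mesh and differentiable rendering, provides strong supervision from unlabeled data.
Semi-supervised training on Wild6D together with REAL275/CAMERA25 data approaches or matches fully supervised performance on REAL275 and outperforms baselines on Wild6D.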
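
To make concrete what a NOCS mapping encodes, the sketch below shows the similarity transform that relates a canonical point to its observed camera-space position: p_cam = s · R · p_nocs + t. It is an illustration under stated assumptions, not code from the paper. The function names are hypothetical, and the canonical coordinates are assumed to be centered at the origin.

```python
import math


def rotation_z(angle_rad):
    """3x3 rotation matrix about the z axis (used only to build an example pose)."""
    c, s = math.cos(angle_rad), math.sin(angle_rad)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def nocs_to_camera(p, scale, R, t):
    """Map a canonical point to camera space: p_cam = scale * R @ p + t."""
    return [scale * sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3)]


def camera_to_nocs(p_cam, scale, R, t):
    """Inverse mapping: p = R^T @ (p_cam - t) / scale."""
    d = [p_cam[i] - t[i] for i in range(3)]
    return [sum(R[i][j] * d[i] for i in range(3)) / scale for j in range(3)]


# Example pose: 90 degree rotation about z, scale 2, translation 1 along z.
R = rotation_z(math.pi / 2)
p_cam = nocs_to_camera([0.1, 0.0, 0.0], 2.0, R, [0.0, 0.0, 1.0])
p_back = camera_to_nocs(p_cam, 2.0, R, [0.0, 0.0, 1.0])
```

Because every observed point paired with its predicted canonical coordinate constrains the same (s, R, t), a set of per-point NOCS predictions is enough to determine rotation, translation, and scale, which is why no instance-level CAD model is needed.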

This review was generated by AI and reviewed by a human editor.