QUICK REVIEW

[Paper Review] OneFormer3D: One Transformer for Unified Point Cloud Segmentation

Maxim Kolodiazhnyi, Анна Воронцова|arXiv (Cornell University)|Nov 24, 2023

3D Shape Modeling and Analysis|Cited by 9

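ONE-SENTENCE SUMMARY

OneFormer3D proposes a single transformer-based framework that unifies semantic, instance, and panoptic segmentation of 3D point clouds, is trained end-to-end on panoptic data, and achieves state-of-the-art results on ScanNet, ScanNet200, and S3DIS.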

ABSTRACT

Semantic, instance, and panoptic segmentation of 3D point clouds have been addressed using task-specific models of distinct design. Thereby, the similarity of all segmentation tasks and the implicit relationship between them have not been utilized effectively. This paper presents a unified, simple, and effective model addressing all these tasks jointly. The model, named OneFormer3D, performs instance and semantic segmentation consistently, using a group of learnable kernels, where each kernel is responsible for generating a mask for either an instance or a semantic category. These kernels are trained with a transformer-based decoder with unified instance and semantic queries passed as an input. Such a design enables training a model end-to-end in a single run, so that it achieves top performance on all three segmentation tasks simultaneously. Specifically, our OneFormer3D ranks 1st and sets a new state-of-the-art (+2.1 mAP50) in the ScanNet test leaderboard. We also demonstrate the state-of-the-art results in semantic, instance, and panoptic segmentation of ScanNet (+21 PQ), ScanNet200 (+3.8 mAP50), and S3DIS (+0.8 mIoU) datasets.

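MOTIVATION AND OBJECTIVES

Show that semantic, instance, and panoptic 3D segmentation can be solved jointly by a single model.
Introduce a query decoder with semantic and instance queries to enable unified mask generation.
Develop a query selection and disentangled matching strategy to stabilize and speed up training.
Demonstrate state-of-the-art performance on ScanNet, ScanNet200, and S3DIS with end-to-end training on panoptic data.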

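PROPOSED METHOD

Extract point-wise features with a sparse 3D U-Net backbone.
Apply flexible pooling (superpoints or voxels) to reduce the computational cost of the transformer decoder.
Use a transformer decoder with semantic and instance queries to produce learned kernels, each of which generates a mask.
Adopt a disentangled matching scheme that avoids Hungarian matching by associating superpoints directly with ground-truth objects.
Train with a combined loss consisting of an instance classification loss, mask BCE and Dice losses, and a semantic BCE loss.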
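
The pooling, mask generation, and mask loss steps above can be illustrated with a minimal sketch in plain Python. It is not the authors' implementation: the function names, the use of mean pooling, the dot-product-plus-sigmoid mask, and the unweighted sum of BCE and Dice are all assumptions made for illustration, and the classification and semantic loss terms are left out.

```python
import math

def pool_by_superpoint(point_feats, superpoint_ids, num_superpoints):
    """Average point features inside each superpoint (assumed mean pooling)."""
    dim = len(point_feats[0])
    sums = [[0.0] * dim for _ in range(num_superpoints)]
    counts = [0] * num_superpoints
    for feat, sp in zip(point_feats, superpoint_ids):
        counts[sp] += 1
        for d in range(dim):
            sums[sp][d] += feat[d]
    # Empty superpoints keep a zero feature vector.
    return [[s / max(c, 1) for s in row] for row, c in zip(sums, counts)]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def kernel_mask_probs(kernel, sp_feats):
    """Soft mask over superpoints: sigmoid of the kernel-feature dot product."""
    return [sigmoid(sum(k * f for k, f in zip(kernel, feat))) for feat in sp_feats]

def bce_loss(probs, target, eps=1e-7):
    """Mean binary cross-entropy between a soft mask and a 0/1 target mask."""
    total = 0.0
    for p, t in zip(probs, target):
        p = min(max(p, eps), 1.0 - eps)  # clamp to keep log() finite
        total -= t * math.log(p) + (1 - t) * math.log(1.0 - p)
    return total / len(probs)

def dice_loss(probs, target, eps=1.0):
    """Dice loss with additive smoothing (eps is an assumed smoothing constant)."""
    inter = sum(p * t for p, t in zip(probs, target))
    return 1.0 - (2.0 * inter + eps) / (sum(probs) + sum(target) + eps)

# Toy scene: three points with 2-D features, grouped into two superpoints.
point_feats = [[1.0, 0.0], [3.0, 0.0], [0.0, 2.0]]
superpoint_ids = [0, 0, 1]
sp_feats = pool_by_superpoint(point_feats, superpoint_ids, 2)

# A hypothetical kernel that responds to the first feature channel.
probs = kernel_mask_probs([4.0, -4.0], sp_feats)
mask_loss = bce_loss(probs, [1, 0]) + dice_loss(probs, [1, 0])
```

In this toy case the kernel selects superpoint 0 and rejects superpoint 1, so both loss terms are close to zero. Pooling first means the masks and losses are computed over superpoints rather than over every point, which is where the reduction in decoder cost comes from.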

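EXPERIMENTAL RESULTS

Research Questions

RQ1: Can a single unified model effectively solve semantic, instance, and panoptic 3D segmentation?
RQ2: Does joint training of semantic and instance queries improve 3D segmentation performance compared with task-specific models?
RQ3: Do query selection and disentangled matching stabilize training and improve the accuracy of 3D transformer-based segmentation?
RQ4: What state-of-the-art gains does OneFormer3D achieve on ScanNet, ScanNet200, and S3DIS?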

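Key Findings

Achieves state-of-the-art results in 3D semantic, instance, and panoptic segmentation on ScanNet, ScanNet200, and S3DIS.
On the ScanNet validation set, OneFormer3D achieves the highest scores on the instance, semantic, and panoptic tasks, surpassing baselines such as SPFormer and Mask3D.
Demonstrates a novel disentangled matching with linear-time association that replaces the conventional Hungarian algorithm.
Shows notable gains from jointly training a single model, including improved semantic mIoU and robust panoptic performance.
Pretraining (on real and synthetic data) and the removal of superpoint pooling can both affect performance; large-scale pretraining brings notable gains.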
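
A rough sketch of how a linear-time association might look is given below. It rests on two assumptions that this review only implies: each instance query is seeded from exactly one superpoint, and each superpoint belongs to at most one ground-truth object. The function `disentangled_match`, the `pair_cost` callback, and the rule of keeping the cheapest query per object are hypothetical. The paper's actual procedure may resolve the final assignment differently.

```python
def disentangled_match(query_superpoint, superpoint_object, pair_cost):
    """Associate queries with ground-truth objects in one pass over the queries.

    query_superpoint[q]  : index of the superpoint that seeded query q
    superpoint_object[s] : id of the ground-truth object owning superpoint s,
                           or None if the superpoint is background
    pair_cost(q, obj)    : hypothetical callback returning the matching cost
                           of a candidate (query, object) pair
    Returns a dict {object_id: query_index}.
    """
    best = {}
    for q, sp in enumerate(query_superpoint):
        obj = superpoint_object[sp]
        if obj is None:
            continue  # query seeded on background: no candidate object
        cost = pair_cost(q, obj)
        # Assumed rule: each object keeps its cheapest candidate query.
        if obj not in best or cost < best[obj][0]:
            best[obj] = (cost, q)
    return {obj: q for obj, (cost, q) in best.items()}

# Toy example: four queries, one per superpoint; two ground-truth objects.
query_superpoint = [0, 1, 2, 3]
superpoint_object = [0, 0, None, 1]
toy_costs = {(0, 0): 0.9, (1, 0): 0.2, (3, 1): 0.5}
matches = disentangled_match(
    query_superpoint, superpoint_object, lambda q, obj: toy_costs[(q, obj)]
)
```

Each query is looked up once, so the work grows linearly with the number of queries and no full query-by-object cost matrix is built. Here object 0 is matched to query 1 and object 1 to query 3.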


This review was generated by AI and reviewed by a human editor.