QUICK REVIEW

[Paper Review] ScanNet: Richly-annotated 3D Reconstructions of Indoor Scenes

Angela Dai, Angel X. Chang | arXiv (Cornell University) | Feb 14, 2017

Robotics and Sensor-Based Localization | References: 27 | Cited by: 526

One-Sentence Summary

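ScanNet introduces a large-scale RGB-D dataset of 1513 scans with dense 3D reconstructions, camera poses, and instance-level semantic annotations, enabling supervised learning for 3D scene understanding and supporting new benchmarks.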

ABSTRACT

A key requirement for leveraging supervised deep learning methods is the availability of large, labeled datasets. Unfortunately, in the context of RGB-D scene understanding, very little data is available -- current datasets cover a small range of scene views and have limited semantic annotations. To address this issue, we introduce ScanNet, an RGB-D video dataset containing 2.5M views in 1513 scenes annotated with 3D camera poses, surface reconstructions, and semantic segmentations. To collect this data, we designed an easy-to-use and scalable RGB-D capture system that includes automated surface reconstruction and crowdsourced semantic annotation. We show that using this data helps achieve state-of-the-art performance on several 3D scene understanding tasks, including 3D object classification, semantic voxel labeling, and CAD model retrieval. The dataset is freely available at http://www.scan-net.org.

Motivation and Goals

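Show that crowdsourced, dense RGB-D data can help scale up research on 3D scene understanding.
Provide a workflow and open framework that lets non-experts easily capture scenes, reconstruct them automatically, and annotate them semantically.
Demonstrate that training with ScanNet data achieves state-of-the-art performance on 3D object classification, semantic voxel labeling, and CAD model retrieval.
Give the research community large-scale benchmarks and open-source tools.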

Proposed Method

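Design a scalable RGB-D capture system on commodity hardware (a Structure sensor attached to an iPad), calibrated with a checkerboard pattern.
Obtain camera poses with BundleFusion-based dense reconstruction, and extract high-resolution meshes from the fused TSDF (truncated signed distance function) volume.
Automatically align each reconstruction to a common coordinate frame and produce oriented, cleaned meshes.
Collect instance-level semantic labels through a WebGL annotation interface, and align 3D CAD models to the reconstructions through an assisted retrieval-and-placement interface.
Define three benchmark tasks (3D object classification, semantic voxel labeling, CAD model retrieval) with train/test splits and evaluation metrics.
Release an open-source capture and annotation framework for dense RGB-D reconstruction.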
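The reconstruction step in the method list rests on volumetric TSDF fusion. As a rough illustration, the sketch below applies the classic running weighted-average TSDF update (in the style of Curless and Levoy) to a hypothetical 1-D column of voxels along a single camera ray, then recovers the surface from the zero crossing. It is not BundleFusion's code: BundleFusion fuses into a sparse voxel hash on the GPU and de-integrates and re-integrates frames when its global pose optimization refines camera poses. The class name, voxel size, truncation distance, and weight cap are illustrative assumptions.

```python
class TSDFColumn:
    """Hypothetical 1-D TSDF: voxels spaced along a single camera ray."""

    def __init__(self, voxel_size: float, num_voxels: int, truncation: float,
                 max_weight: float = 64.0):
        self.voxel_size = voxel_size      # metres between voxel centres
        self.truncation = truncation      # truncation band in metres
        self.max_weight = max_weight      # weight cap keeps the model adaptable
        self.tsdf = [1.0] * num_voxels    # normalised signed distance per voxel
        self.weight = [0.0] * num_voxels  # 0 means "never observed"

    def integrate(self, depth: float, obs_weight: float = 1.0) -> None:
        """Fuse one depth reading taken along the ray (weighted running average)."""
        for i in range(len(self.tsdf)):
            x = (i + 0.5) * self.voxel_size   # distance of voxel centre from camera
            sdf = depth - x                   # > 0 in front of the surface
            if sdf < -self.truncation:
                continue                      # well behind the surface: unobserved
            d = min(1.0, sdf / self.truncation)  # truncate to [-1, 1]
            w = self.weight[i]
            self.tsdf[i] = (w * self.tsdf[i] + obs_weight * d) / (w + obs_weight)
            self.weight[i] = min(w + obs_weight, self.max_weight)

    def surface_distance(self):
        """Return the interpolated zero crossing (1-D analogue of marching cubes)."""
        for i in range(len(self.tsdf) - 1):
            a, b = self.tsdf[i], self.tsdf[i + 1]
            if self.weight[i] > 0 and self.weight[i + 1] > 0 and a > 0 >= b:
                return (i + 0.5 + a / (a - b)) * self.voxel_size
        return None


col = TSDFColumn(voxel_size=0.01, num_voxels=200, truncation=0.05)
for reading in (1.002, 0.998, 1.000):  # three noisy depth readings of a wall ~1 m away
    col.integrate(reading)
print(col.surface_distance())  # ≈ 1.0
```

Averaging truncated distances rather than raw depths is what lets many noisy frames converge on one smooth surface; the weight cap is a common design choice so that later observations can still correct earlier ones.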

Experimental Results

Research Questions

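RQ1: Can an RGB-D capture pipeline designed for novice users support scalable, richly annotated 3D reconstruction of real-world indoor scenes?
RQ2: Does training with ScanNet data improve deep-learning-based 3D scene understanding on object classification, voxel labeling, and CAD model retrieval?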

Key Findings

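The ScanNet dataset contains 1513 RGB-D scans of 707 distinct spaces, with 2.5M RGB-D frames, camera poses, surface reconstructions, textured meshes, and dense instance-level semantic annotations.
A crowdsourced workflow labels surfaces with instance-level object categories and aligns CAD models to the reconstructions, making 3D annotation scalable (681 CAD model instances in 52 scans, across 107 annotations).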
Semantic voxel labeling on ScanNet reaches 73.0% voxel-class accuracy on test scenes using geometry alone (no color).
On the 3D object classification benchmark, models trained with real ScanNet data, especially when it is mixed with synthetic ShapeNet data, transfer better to real scans than models trained on synthetic data alone.
For 3D model retrieval, joint training on ShapeNet and ScanNet yields a strong embedding for real-to-synthetic retrieval, i.e., finding ShapeNet CAD models that match scanned ScanNet objects.
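To make the voxel-labeling and retrieval results concrete, here is a minimal sketch assuming (a) voxel-class accuracy is the fraction of annotated voxels whose predicted class is correct, with unannotated voxels ignored, and (b) real-to-synthetic retrieval ranks CAD models by cosine similarity between learned embeddings of a scanned object and each CAD model. The ignore label, function names, and toy embeddings are hypothetical; the paper's exact evaluation protocol and embedding network may differ.

```python
import math
from typing import List, Sequence, Tuple

IGNORE = -1  # hypothetical label for voxels with no annotation


def voxel_accuracy(pred: Sequence[int], gt: Sequence[int], ignore: int = IGNORE) -> float:
    """Fraction of annotated voxels whose predicted class equals the ground truth."""
    pairs = [(p, g) for p, g in zip(pred, gt) if g != ignore]
    if not pairs:
        raise ValueError("no annotated voxels to evaluate")
    return sum(p == g for p, g in pairs) / len(pairs)


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return sum(a * b for a, b in zip(u, v)) / norm if norm else 0.0


def retrieve(query: Sequence[float],
             database: List[Tuple[str, Sequence[float]]], k: int = 1) -> List[str]:
    """Return ids of the k CAD models whose embeddings are most similar to the query."""
    ranked = sorted(database, key=lambda item: cosine(query, item[1]), reverse=True)
    return [model_id for model_id, _ in ranked[:k]]


# Toy data: 5 voxels, one unannotated; embeddings are made-up 2-D vectors.
print(voxel_accuracy([1, 2, 2, 0, 3], [1, 2, 1, IGNORE, 3]))  # 0.75
cad_db = [("chair_cad", [1.0, 0.0]), ("table_cad", [0.0, 1.0])]
print(retrieve([0.9, 0.1], cad_db))  # ['chair_cad']
```

Here the query stands in for the embedding of a real scanned object and the database for synthetic CAD embeddings; a shared embedding space is what allows the two domains to be compared directly.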

This review was generated by AI and checked by human editors.