QUICK REVIEW

[Paper Digest] Point-Voxel CNN for Efficient 3D Deep Learning

Zhijian Liu, Haotian Tang|arXiv (Cornell University)|Jul 8, 2019

3D Shape Modeling and Analysis | References: 49 | Cited by: 335

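ONE-SENTENCE SUMMARY

PVCNN combines a low-resolution voxel branch with a high-resolution point branch to achieve faster, memory-efficient 3D deep learning while maintaining high accuracy.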

ABSTRACT

We present Point-Voxel CNN (PVCNN) for efficient, fast 3D deep learning. Previous work processes 3D data using either voxel-based or point-based NN models. However, both approaches are computationally inefficient. The computation cost and memory footprints of the voxel-based models grow cubically with the input resolution, making it memory-prohibitive to scale up the resolution. As for point-based networks, up to 80% of the time is wasted on structuring the sparse data which have rather poor memory locality, not on the actual feature extraction. In this paper, we propose PVCNN that represents the 3D input data in points to reduce the memory consumption, while performing the convolutions in voxels to reduce the irregular, sparse data access and improve the locality. Our PVCNN model is both memory and computation efficient. Evaluated on semantic and part segmentation datasets, it achieves much higher accuracy than the voxel-based baseline with 10x GPU memory reduction; it also outperforms the state-of-the-art point-based models with 7x measured speedup on average. Remarkably, the narrower version of PVCNN achieves 2x speedup over PointNet (an extremely efficient model) on part and scene segmentation benchmarks with much higher accuracy. We validate the general effectiveness of PVCNN on 3D object detection: by replacing the primitives in Frustum PointNet with PVConv, it outperforms Frustum PointNet++ by 2.4% mAP on average with 1.5x measured speedup and GPU memory reduction.
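The cubic-growth claim in the abstract can be made concrete with a little arithmetic. The snippet below is only an illustrative sketch: the channel count (64), the float32 storage, the point count (2048), and the helper names `dense_voxel_bytes` and `point_bytes` are assumptions chosen for the example, not figures or code from the paper.

```python
def dense_voxel_bytes(resolution, channels, bytes_per_value=4):
    """Bytes needed for one dense feature grid of shape (C, r, r, r)."""
    return channels * resolution ** 3 * bytes_per_value


def point_bytes(num_points, channels, bytes_per_value=4):
    """Bytes needed for per-point features of shape (N, C)."""
    return num_points * channels * bytes_per_value


# Doubling the resolution multiplies the dense-grid footprint by 2^3 = 8.
for r in (32, 64, 128):
    print(f"r={r}: {dense_voxel_bytes(r, 64) / 2**20} MiB")
# r=32: 8.0 MiB
# r=64: 64.0 MiB
# r=128: 512.0 MiB

# A point set with the same channel width stays small (assumed N = 2048).
print(f"points: {point_bytes(2048, 64) / 2**20} MiB")  # points: 0.5 MiB
```

These numbers cover a single activation tensor only. A real network holds many such tensors, so the gap between a dense high-resolution grid and a point representation widens further.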

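MOTIVATION AND OBJECTIVES

Edge devices impose tight memory and latency constraints, which motivates efficient 3D deep learning.
Propose PVConv, a hybrid primitive that fuses voxel-based and point-based processing to reduce memory footprint and improve data locality.
Show that, compared with purely voxel-based or purely point-based models, PVCNN achieves higher accuracy at lower memory and latency across multiple 3D tasks.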

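PROPOSED METHOD

Introduce Point-Voxel Convolution (PVConv) with two branches: a voxel-based branch for coarse-grained neighborhood aggregation and a high-resolution point-based branch for fine-grained features.
The voxel-based branch voxelizes the normalized points into a low-resolution grid, applies 3D convolutions, and devoxelizes via trilinear interpolation so the result can be fused with the point features.
The point-based branch processes the raw points with a multilayer perceptron (MLP), preserving high-resolution per-point information.
The features of the two branches are fused by simple addition to obtain the final point features.
Coordinates are normalized, and voxelization and devoxelization are differentiable, which allows end-to-end training.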
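The data flow described above can be sketched in a few lines of NumPy. This is a minimal forward-pass illustration under stated assumptions, not the authors' implementation: all function names are hypothetical, a fixed 3x3x3 mean filter stands in for the learned 3D convolutions, the point branch is a single linear layer with ReLU, and the weight matrix is assumed square so that both branches produce the same number of channels for the addition.

```python
import numpy as np


def normalize_coords(points, eps=1e-8):
    """Center on the centroid and scale into [0, 1]^3 (assumed normalization)."""
    centered = points - points.mean(axis=0, keepdims=True)
    radius = np.linalg.norm(centered, axis=1).max()
    return centered / (2.0 * radius + eps) + 0.5


def voxelize(coords, feats, r):
    """Average the features of all points falling into each of the r^3 voxels."""
    idx = np.clip(np.floor(coords * r).astype(np.int64), 0, r - 1)
    flat = (idx[:, 0] * r + idx[:, 1]) * r + idx[:, 2]
    sums = np.zeros((r ** 3, feats.shape[1]))
    counts = np.zeros(r ** 3)
    np.add.at(sums, flat, feats)    # unbuffered scatter-add per voxel
    np.add.at(counts, flat, 1.0)
    grid = sums / np.maximum(counts, 1.0)[:, None]  # empty voxels stay zero
    return grid.reshape(r, r, r, -1)


def box_filter3d(grid):
    """Stand-in for learned 3D convolutions: 3x3x3 mean filter, zero padding."""
    r = grid.shape[0]
    padded = np.pad(grid, ((1, 1), (1, 1), (1, 1), (0, 0)))
    out = np.zeros_like(grid)
    for dx, dy, dz in np.ndindex(3, 3, 3):
        out += padded[dx:dx + r, dy:dy + r, dz:dz + r]
    return out / 27.0


def devoxelize(coords, grid):
    """Trilinearly interpolate grid features back onto the point locations."""
    r = grid.shape[0]
    pos = np.clip(coords * r - 0.5, 0.0, r - 1.0)  # voxel-center coordinates
    lo = np.floor(pos).astype(np.int64)
    hi = np.minimum(lo + 1, r - 1)
    frac = pos - lo
    out = np.zeros((coords.shape[0], grid.shape[-1]))
    for corner in np.ndindex(2, 2, 2):
        upper = np.array(corner, dtype=bool)
        idx = np.where(upper, hi, lo)                         # (N, 3) corner
        w = np.where(upper, frac, 1.0 - frac).prod(axis=1)    # (N,) weights
        out += w[:, None] * grid[idx[:, 0], idx[:, 1], idx[:, 2]]
    return out


def pvconv(points, feats, weight, bias, r=8):
    """Voxel branch (coarse) plus point branch (fine), fused by addition."""
    coords = normalize_coords(points)
    voxel_out = devoxelize(coords, box_filter3d(voxelize(coords, feats, r)))
    point_out = np.maximum(feats @ weight + bias, 0.0)  # linear layer + ReLU
    return voxel_out + point_out


rng = np.random.default_rng(0)
points = rng.normal(size=(1024, 3))
feats = rng.normal(size=(1024, 16))
out = pvconv(points, feats, 0.1 * rng.normal(size=(16, 16)), np.zeros(16))
print(out.shape)  # (1024, 16)
```

Because the voxel branch only needs a low-resolution grid (`r=8` here, an arbitrary choice), its memory cost stays small, while the point branch keeps per-point detail that the coarse grid would blur. A trainable version would replace `box_filter3d` with learned 3D convolutions in an autodiff framework.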

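EXPERIMENTAL RESULTS

Research Questions

RQ1: How can 3D data be processed efficiently in common 3D tasks (segmentation, detection) without sacrificing accuracy?
RQ2: Does a hybrid voxel-point approach reduce memory footprint and improve data locality compared with purely voxel-based or purely point-based approaches?
RQ3: How does PVCNN perform (accuracy, latency, memory) on the ShapeNet Part, S3DIS, and KITTI benchmarks?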

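Key Findings

PVCNN is more accurate than the voxel-based baseline while substantially reducing GPU memory (roughly a 10x reduction on ShapeNet Part).
Across the tasks tested, PVCNN is on average about 7x faster than state-of-the-art point-based models.
Narrower PVCNN variants achieve 2x to 15x speedups over strong baselines (such as PointNet and SpiderCNN) with competitive or higher accuracy.
On ShapeNet Part, the PVCNN variants show a favorable accuracy-latency-memory trade-off; for example, the 1xC variant reaches 86.2 mIoU at 50.7 ms latency and 1.59 GB of memory.
On S3DIS indoor scene segmentation, PVCNN and PVCNN++ outperform purely point-based models, with speedups of up to 8x and memory reductions of up to 3x; PVCNN++ also has lower latency than PointCNN.
For 3D object detection (KITTI), PVCNN variants outperform F-PointNet++ with a 1.5x measured speedup and reduced memory, and the full PVCNN yields a notable mAP gain.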

This digest was generated by AI and reviewed by human editors.