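[Paper Explained] SPLATNet: Sparse Lattice Networks for Point Cloud Processing
SPLATNet introduces sparse bilateral convolutional layers on the permutohedral lattice to process 3D point clouds efficiently and to fuse 2D and 3D information jointly. It handles sparsity without voxelization and achieves state-of-the-art segmentation performance.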
We present a network architecture for processing point clouds that directly operates on a collection of points represented as a sparse set of samples in a high-dimensional lattice. Naively applying convolutions on this lattice scales poorly, both in terms of memory and computational cost, as the size of the lattice increases. Instead, our network uses sparse bilateral convolutional layers as building blocks. These layers maintain efficiency by using indexing structures to apply convolutions only on occupied parts of the lattice, and allow flexible specifications of the lattice structure enabling hierarchical and spatially-aware feature learning, as well as joint 2D-3D reasoning. Both point-based and image-based representations can be easily incorporated in a network with such layers and the resulting model can be trained in an end-to-end manner. We present results on 3D segmentation tasks where our approach outperforms existing state-of-the-art techniques.
Research Motivation and Objectives
- Motivate direct processing of irregular point clouds without voxel pre-processing.
- Develop a flexible lattice-based convolution framework enabling adjustable receptive fields and spatially-aware features.
- Enable efficient sparse computation using hashing to operate only on occupied lattice sites.
- Incorporate joint 2D-3D processing by mapping image features into the same lattice as the point cloud.
- Demonstrate superior segmentation performance on RueMonge2014 and ShapeNet datasets.
Proposed Method
- Utilize Bilateral Convolution Layers (BCLs) as building blocks to map input features onto a high-dimensional sparse lattice (permutohedral lattice).
- Splat -> Convolve -> Slice operations transfer features between input points and lattice, with learnable high-dimensional filters.
- Construct SPLATNet_3D by stacking multiple BCLs with increasing receptive fields via progressively coarser lattice scales, concatenating their outputs, and using 1x1 convolutions for final per-point predictions.
- Extend to SPLATNet_2D-3D to fuse 2D CNN features with 3D point cloud data by splatting 2D features onto a 3D lattice and slicing onto the point cloud, followed by 2D-3D fusion through concatenation and 1x1 convolutions.
- Provide an end-to-end trainable framework allowing input/output lattice feature flexibility (L_in, L_out) and different lattice scales Lambda to control receptive fields.
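The Splat -> Convolve -> Slice pipeline described above can be sketched in a few lines. This is a simplified illustration, not the authors' implementation: it assumes a regular integer grid in place of the permutohedral lattice, nearest-vertex assignment in place of barycentric interpolation, scalar features, and a fixed (not learned) filter. The names `lattice_key`, `splat`, `convolve`, `slice_points`, and `bcl` are hypothetical.

```python
import math
from collections import defaultdict
from itertools import product


def lattice_key(point, scale):
    """Map a 3D point to the integer key of its nearest lattice vertex.

    Assumption: a regular grid stands in for the permutohedral lattice.
    A smaller `scale` gives a coarser lattice, so more points share a vertex.
    """
    return tuple(math.floor(c * scale + 0.5) for c in point)


def splat(points, features, scale):
    """Average point features onto the lattice; only occupied vertices are stored."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for p, f in zip(points, features):
        key = lattice_key(p, scale)
        sums[key] += f
        counts[key] += 1
    return {k: sums[k] / counts[k] for k in sums}


def convolve(lattice, weights):
    """Apply a 3x3x3 filter while visiting occupied vertices only.

    `weights` maps an integer offset (dx, dy, dz) to a filter weight.
    In a real BCL these weights would be learned.
    """
    out = {}
    for key in lattice:
        total = 0.0
        for offset in product((-1, 0, 1), repeat=3):
            neighbor = tuple(k + o for k, o in zip(key, offset))
            if neighbor in lattice:  # hash lookup; empty sites cost nothing
                total += weights.get(offset, 0.0) * lattice[neighbor]
        out[key] = total
    return out


def slice_points(lattice, points, scale):
    """Read filtered lattice values back at the given point locations."""
    return [lattice.get(lattice_key(p, scale), 0.0) for p in points]


def bcl(points, features, weights, scale, out_points=None):
    """Splat -> Convolve -> Slice. `out_points` may differ from `points`."""
    out_points = points if out_points is None else out_points
    filtered = convolve(splat(points, features, scale), weights)
    return slice_points(filtered, out_points, scale)
```

With a box filter (`weights` set to 1.0 for every offset), four points at `(0, 0, 0)`, `(0.1, 0, 0)`, `(1, 0, 0)`, and `(5, 5, 5)` carrying features `1, 3, 4, 10` occupy only three lattice vertices at `scale=1.0`, and the isolated point keeps its own value because none of its neighbors are occupied. Passing a different `out_points` mirrors the L_in / L_out flexibility mentioned above: features are splatted from one point set and sliced onto another.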
Experimental Results
Research Questions
- RQ1: Can sparse high-dimensional bilateral filtering on a permutohedral lattice process irregular point clouds efficiently without voxelization?
- RQ2: Does mapping point clouds (and optionally 2D images) onto a shared lattice enable effective end-to-end learning for 3D segmentation and 2D-3D fusion?
- RQ3: What is the impact of lattice scale choices on receptive field and performance for 3D point cloud tasks?
- RQ4: Does joint 2D-3D processing improve segmentation performance over purely 3D or purely 2D approaches?
- RQ5: How do SPLATNet variants perform on RueMonge2014 facade segmentation and ShapeNet part segmentation compared to state-of-the-art methods?
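To make RQ3 concrete, the small sketch below shows how the lattice scale changes the number of occupied lattice sites. It rests on assumptions: a regular integer grid stands in for the permutohedral lattice, the helper name `occupied_sites` is hypothetical, and halving the scale at each step is an illustrative choice. A coarser scale merges more points into each site, so a filter of fixed size covers a larger region of the original point cloud.

```python
import math


def occupied_sites(points, scale):
    """Return the set of lattice vertices that receive at least one point.

    Assumption: each coordinate is multiplied by `scale` and rounded to the
    nearest integer vertex of a regular grid.
    """
    return {tuple(math.floor(c * scale + 0.5) for c in p) for p in points}


# Eight points spaced 0.25 apart along the x axis.
points = [(i * 0.25, 0.0, 0.0) for i in range(8)]

# Halving the scale makes the lattice coarser at each step.
site_counts = {scale: len(occupied_sites(points, scale)) for scale in (4.0, 2.0, 1.0)}
print(site_counts)  # {4.0: 8, 2.0: 5, 1.0: 3}
```

At `scale=4.0` every point has its own site, while at `scale=1.0` the same eight points collapse onto three sites, which is the sense in which coarser lattices give each convolution a wider receptive field.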
Key Findings
| Method | Average IoU | Runtime (min) |
|---|---|---|
| OctNet | 59.2 | - |
| Autocontext_3D | 54.4 | 16 |
| SPLATNet_3D | 65.4 | 0.06 |
| Autocontext_2D-3D | 62.9 | 87 |
| SPLATNet_2D-3D | 69.8 | 1.20 |
| DeepLab_2D | 69.3 | 0.84 |
| SPLATNet_2D-3D (2D-3D Image labeling) | 70.6 | 4.34 |
- SPLATNet_3D achieves 65.4 IoU on RueMonge2014 3D point cloud labeling, outperforming OctNet (59.2 IoU).
- SPLATNet_2D-3D achieves 69.8 IoU on RueMonge2014 3D labeling with 2D-3D data, outperforming the prior state of the art (Autocontext_2D-3D at 62.9 IoU).
- For multi-view image labeling on RueMonge2014, SPLATNet_2D-3D reaches 70.6 IoU, above the DeepLab_2D baseline (69.3 IoU).
- On ShapeNet part segmentation, SPLATNet_3D reaches class average IoU of 82.0 and instance average IoU of 84.6; SPLATNet_2D-3D reaches class average IoU 83.7 and instance average IoU 85.4, surpassing prior methods.
- SPLATNet_2D-3D demonstrates a significant end-to-end 2D-3D fusion advantage, with a trade-off in runtime due to 2D network processing over many high-resolution views.
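The ShapeNet results above quote both a class average and an instance average IoU, and the two can differ noticeably when categories have unequal numbers of shapes. The sketch below follows the common ShapeNet part-segmentation convention as an assumption: a shape's IoU is the mean over the parts of its category, a part absent from both prediction and ground truth counts as IoU 1, the instance average is taken over all shapes, and the class average is taken over per-category means. The function names are hypothetical, and the exact evaluation protocol used in the paper may differ in details.

```python
from collections import defaultdict


def part_iou(pred, gt, part):
    """Intersection over union of one part label within a single shape."""
    inter = sum(1 for p, g in zip(pred, gt) if p == part and g == part)
    union = sum(1 for p, g in zip(pred, gt) if p == part or g == part)
    # Assumption: a part missing from both prediction and ground truth is perfect.
    return 1.0 if union == 0 else inter / union


def shape_iou(pred, gt, parts):
    """Mean IoU over all parts defined for the shape's category."""
    return sum(part_iou(pred, gt, part) for part in parts) / len(parts)


def average_ious(shapes):
    """Return (class_avg, instance_avg).

    `shapes` is a list of (category, parts, pred_labels, gt_labels) tuples.
    """
    per_category = defaultdict(list)
    for category, parts, pred, gt in shapes:
        per_category[category].append(shape_iou(pred, gt, parts))
    all_ious = [v for values in per_category.values() for v in values]
    instance_avg = sum(all_ious) / len(all_ious)
    class_avg = sum(sum(v) / len(v) for v in per_category.values()) / len(per_category)
    return class_avg, instance_avg
```

For example, with two chairs scoring 1.0 and 0.25 and one lamp scoring 0.25, the instance average is 0.5 while the class average is 0.4375, because the single lamp weighs as much as both chairs together in the class average.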
This explainer was generated by AI and reviewed by a human editor.