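[Paper Walkthrough] Submanifold Sparse Convolutional Networks
This paper proposes sparsity-preserving sparse convolutions (SC and VSC) for building deep submanifold sparse convolutional networks, which reach state-of-the-art performance with roughly 50% of the computation and memory of their dense equivalents.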
Convolutional networks are the de-facto standard for analysing spatio-temporal data such as images, videos, 3D shapes, etc. Whilst some of this data is naturally dense (for instance, photos), many other data sources are inherently sparse. Examples include pen-strokes forming on a piece of paper, or (colored) 3D point clouds that were obtained using a LiDAR scanner or RGB-D camera. Standard "dense" implementations of convolutional networks are very inefficient when applied on such sparse data. We introduce a sparse convolutional operation tailored to processing sparse data that differs from prior work on sparse convolutional networks in that it operates strictly on submanifolds, rather than "dilating" the observation with every layer in the network. Our empirical analysis of the resulting submanifold sparse convolutional networks shows that they perform on par with state-of-the-art methods whilst requiring substantially less computation.
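The "dilation" mentioned in the abstract can be made concrete with a small sketch. It is illustrative only and not taken from the paper's code: the helper names `dilate` and `submanifold` are hypothetical, the filter is assumed to be 3x3 with stride 1, and positions are kept in input coordinates for simplicity. It counts active sites after each layer for a one-pixel-wide diagonal stroke.

```python
from itertools import product

def dilate(active, f=3):
    """Active set after a stride-1, size-f convolution in which an output site
    becomes active if ANY input site in its receptive field is active."""
    r = f // 2
    offsets = list(product(range(-r, r + 1), repeat=2))
    return {(y + dy, x + dx) for (y, x) in active for (dy, dx) in offsets}

def submanifold(active, f=3):
    """Active set after a submanifold-style convolution: an output site is
    active only if the same site was active in the input."""
    return set(active)

# A one-pixel-wide diagonal stroke with 10 active sites (hypothetical input).
stroke = {(i, i) for i in range(10)}

dilated, fixed = stroke, stroke
for layer in range(1, 4):
    dilated = dilate(dilated)
    fixed = submanifold(fixed)
    print(f"layer {layer}: dilating={len(dilated)} sites, submanifold={len(fixed)} sites")
```

Under these assumptions the dilating variant's active set grows with every layer, while the submanifold variant stays at the original 10 sites, which is the property the paper relies on to keep deep networks cheap.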
Motivation and Objectives
- Motivate efficient processing of inherently sparse spatio-temporal data (e.g., pen strokes, LiDAR point clouds).
- Develop convolution operators that avoid dilation of sparsity across layers.
- Build deep network architectures (VGG, ResNet, DenseNet variants) using sparse convolutions that maintain sparsity.
- Demonstrate computational and memory savings while preserving accuracy on benchmark datasets.
Proposed Method
- Define sparse convolution (SC), which ignores ground-state (inactive) inputs by treating them as zero, and valid sparse convolution (VSC), which additionally restricts the active output sites to those that are active in the input.
- Use VSC for most layers to keep the sparsity pattern fixed through the network.
- Combine VSC with strided SC convolutions and sparse pooling to form submanifold networks (VGG, ResNet, DenseNet variants).
- Represent each layer's state as a hash table of active-site coordinates plus a feature matrix with one row per active site; build a rule book that maps input rows to output rows so the convolution runs efficiently on GPUs.
- Provide deconvolution (DC) as an inverse of SC to reconnect spatial structures.
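The hash-table and rule-book idea in the list above can be sketched as follows. This is a minimal pure-Python illustration under stated assumptions, not the authors' GPU implementation: the function names (`build_hash_table`, `build_rule_book`, `vsc`), the 2D 3x3 filter, the per-offset weight layout, and the omission of a bias term are all choices made for this example.

```python
from itertools import product

def build_hash_table(active_sites):
    """Map each active site coordinate to its row in the feature matrix."""
    return {site: row for row, site in enumerate(active_sites)}

def build_rule_book(table, f=3):
    """For every filter offset, list (input_row, output_row) pairs.
    In a VSC the output sites equal the input active sites, so both
    rows are looked up in the same hash table."""
    r = f // 2
    rules = {}
    for off in product(range(-r, r + 1), repeat=2):
        pairs = []
        for (y, x), out_row in table.items():
            in_row = table.get((y + off[0], x + off[1]))
            if in_row is not None:  # inactive neighbours contribute nothing
                pairs.append((in_row, out_row))
        rules[off] = pairs
    return rules

def vsc(features, rules, weights, n_out):
    """Apply the rule book: for each offset, multiply the gathered input rows
    by that offset's (n_in x n_out) weight matrix and add into the output rows."""
    out = [[0.0] * n_out for _ in features]
    for off, pairs in rules.items():
        w = weights[off]
        for in_row, out_row in pairs:
            for i, value in enumerate(features[in_row]):
                for j in range(n_out):
                    out[out_row][j] += value * w[i][j]
    return out

# Hypothetical input: three active sites, one input and one output channel.
sites = [(0, 0), (0, 1), (2, 2)]
features = [[1.0], [2.0], [5.0]]
table = build_hash_table(sites)
rules = build_rule_book(table)
weights = {off: [[1.0]] for off in rules}  # all-ones filter, for illustration
out = vsc(features, rules, weights, n_out=1)
print(out)
```

With an all-ones filter, each output row is the sum of the active features inside its 3x3 neighbourhood. The output has exactly one row per input active site, so the sparsity pattern is unchanged. A GPU implementation would replace the inner loops with one gather, matrix multiply, and scatter-add per offset.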
Experimental Results
Research Questions
- RQ1: Can sparse convolutions be designed to preserve the active site pattern (submanifold) across layers without dilating sparsity?
- RQ2: Do SC and VSC enable deeper architectures with comparable accuracy but reduced computation and memory for sparse data?
- RQ3: How do submanifold networks compare to dense and other sparse approaches on 2D handwritten data and 3D shape datasets?
- RQ4: What are practical implementations and efficiencies of hash-table based sparse convolutions on modern hardware?
Key Findings
- SC and VSC achieve state-of-the-art performance on sparse data benchmarks while reducing computation and memory by about 50%.
- Ground-state discarding (setting inactive sites to zero) does not degrade accuracy compared to dense convolutions on CASIA handwriting data.
- VSC (valid sparse convolution) maintains sparsity, enabling deeper networks with substantial computational savings and minimal accuracy loss.
- Submanifold architectures (VGG/ResNet/DenseNet variants) built with SC/VSC outperform or match dense baselines with markedly lower FLOPs and activations.
- Experiments on CASIA and ModelNet demonstrate practical efficiency and competitive accuracy against dense networks.
This walkthrough was generated by AI and reviewed by a human editor.