QUICK REVIEW

[Paper Review] SpiderCNN: Deep Learning on Point Sets with Parameterized Convolutional Filters

Yifan Xu, Tianqi Fan|arXiv (Cornell University)|Mar 30, 2018

3D Shape Modeling and Analysis | References: 19 | Cited by: 76

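One-Sentence Summary

SpiderCNN introduces SpiderConv, a point-set convolution whose parameterized filters combine a step-function component with a Taylor-polynomial component, allowing it to learn from irregular 3D point clouds and reach state-of-the-art results on ModelNet40.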

ABSTRACT

Deep neural networks have enjoyed remarkable success for various vision tasks, however it remains challenging to apply CNNs to domains lacking a regular underlying structures such as 3D point clouds. Towards this we propose a novel convolutional architecture, termed SpiderCNN, to efficiently extract geometric features from point clouds. SpiderCNN is comprised of units called SpiderConv, which extend convolutional operations from regular grids to irregular point sets that can be embedded in R^n, by parametrizing a family of convolutional filters. We design the filter as a product of a simple step function that captures local geodesic information and a Taylor polynomial that ensures the expressiveness. SpiderCNN inherits the multi-scale hierarchical architecture from classical CNNs, which allows it to extract semantic deep features. Experiments on ModelNet40 demonstrate that SpiderCNN achieves state-of-the-art accuracy 92.4% on standard benchmarks, and shows competitive performance on segmentation task.

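Motivation and Objectives

Enable deep learning directly on irregular 3D point clouds, without voxelization or a predefined grid.
Propose SpiderConv as a convolution operator on point sets in R^n with learnable filters.
Demonstrate that a multi-layer SpiderCNN achieves high accuracy on 3D classification and segmentation tasks.
Show that combining step-function geodesic information with a Taylor expansion yields expressive filters.
Compare against state-of-the-art methods on ModelNet40 and ShapeNet-Part to establish effectiveness.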

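Proposed Method

Define SpiderConv as a convolution on point sets, with the filter g_w defined on a ball around each point.
Construct g_w as the product of a step-function component g^Step and a Taylor-polynomial component g^Taylor: g_w = g^Step_{w^S} · g^Taylor_{w^T}.
Use a K-nearest-neighbor (KNN) scheme to define locality, and approximate the Step component with a linear map for efficiency.
Parameterize g^Taylor as a third-order Taylor polynomial to capture local geometric information (for example, terms in x^3, y^3, z^3 and the mixed cross terms).
Train the filter parameters w by SGD with backpropagation; compute F ∗ g_w(p) as a weighted sum over the neighbors of p, with the weights given by the learned filter.
Use a multi-channel, multi-layer SpiderCNN with top-k pooling to form a global feature for classification and point-wise features for segmentation.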
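The method steps above can be illustrated with a minimal single-channel NumPy sketch. It is not the authors' implementation: the function names (`taylor_terms`, `knn_indices`, `spider_conv`), the ordering of the 20 monomials, the offset sign convention (center minus neighbor), and the brute-force neighbor search are all assumptions made for illustration. The Step component is modeled here as one learnable weight per neighbor rank, which is one way to read "approximating Step with a linear map" under KNN.

```python
import numpy as np

def taylor_terms(d):
    """All 20 monomials of degree <= 3 in (x, y, z) for offsets d of shape (..., 3)."""
    x, y, z = d[..., 0], d[..., 1], d[..., 2]
    one = np.ones_like(x)
    return np.stack([
        one, x, y, z,                               # degree 0 and 1
        x * y, y * z, x * z, x * x, y * y, z * z,   # degree 2
        x * y * y, x * x * y, y * y * z, y * z * z,
        x * x * z, x * z * z, x * y * z,            # mixed degree 3
        x ** 3, y ** 3, z ** 3,                     # pure degree 3
    ], axis=-1)                                     # shape (..., 20)

def knn_indices(points, k):
    """Indices of the k nearest neighbors of each point (self included), nearest first."""
    diff = points[:, None, :] - points[None, :, :]
    dist2 = np.sum(diff ** 2, axis=-1)              # (N, N) squared distances
    return np.argsort(dist2, axis=1, kind="stable")[:, :k]

def spider_conv(points, feats, w_step, w_taylor, k):
    """Single-channel sketch: points (N, 3), feats (N,), w_step (k,), w_taylor (20,) -> (N,)."""
    idx = knn_indices(points, k)                    # (N, k)
    offsets = points[:, None, :] - points[idx]      # (N, k, 3), assumed convention p - q
    g_taylor = taylor_terms(offsets) @ w_taylor     # (N, k) polynomial filter values
    g = g_taylor * w_step[None, :]                  # product with rank-indexed step weights
    return np.sum(g * feats[idx], axis=1)           # weighted sum over neighbors
```

In a trainable version, `w_step` and `w_taylor` would be parameters updated by SGD, and the operation would be repeated per input and output channel. With `w_taylor` set to a pure constant term and uniform `w_step`, the sketch reduces to averaging features over the K neighbors, which is a convenient sanity check.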

Experimental Results

Research Questions

RQ1: Can SpiderConv effectively generalize convolution to irregular point clouds without voxelization?
RQ2: Do parameterized filters combining step/geodesic information and Taylor expansions provide sufficient expressiveness for 3D geometric features?
RQ3: How does SpiderCNN perform on standard 3D benchmarks for classification and segmentation compared to prior methods?
RQ4: What architectural choices (K in KNN, number of Taylor terms, pooling strategy) maximize performance on ModelNet40 and ShapeNet-Part?

Key Findings

SpiderCNN with a 4-layer architecture achieves 92.4% accuracy on ModelNet40 when using 1024 points with normals.
SpiderCNN+PointNet achieves 92.2% on ModelNet40, outperforming either method alone.
On SHREC15, SpiderCNN (4-layer) reaches 95.8% accuracy, outperforming several baselines.
In ShapeNet-Part segmentation, SpiderCNN attains a mean IoU of 85.24% across 16 categories, competitive with strong baselines.
Top-2 pooling preserves more geometric detail than max-pooling, contributing to higher accuracy (92.4% vs 92.0% in 4-layer SpiderCNN).
Experiments show that MLP-based filters underperform the Taylor+Step filter design across the MLP configurations tested.
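The difference between top-2 pooling and max-pooling noted in the findings can be made concrete with a short NumPy sketch. The function name `top_k_pool` and the output layout (all channels' largest values first, then all second-largest values) are assumptions for illustration, not details confirmed by the paper.

```python
import numpy as np

def top_k_pool(feats, k=2):
    """Pool per-point features (N, C) into a global vector of length k * C.

    For every channel, keep the k largest activations over all points.
    With k=1 this is ordinary max-pooling.
    """
    # Sorting the negated array ascending gives a descending sort of feats.
    top = -np.sort(-feats, axis=0)[:k]   # (k, C)
    return top.reshape(-1)               # (k * C,)
```

Because the second-largest response of each channel is retained, the global feature carries somewhat more information about the point set than a single maximum, at the cost of a vector twice as long for k=2.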

This review was generated by AI and reviewed by a human editor.