QUICK REVIEW

[Paper Review] NPNet: A Non-Parametric Network with Adaptive Gaussian-Fourier Positional Encoding for 3D Classification and Segmentation

Mohammad Saeid, Amir Salarpour|arXiv (Cornell University)|Jan 31, 2026
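3D Shape Modeling and Analysis|Cited by 0
One-Sentence Summary

NPNet is a fully non-parametric framework for 3D point-cloud classification and segmentation. It uses adaptive Gaussian-Fourier positional encoding and memory-bank inference, and it is competitive in efficiency and few-shot performance while requiring no learned weights.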

ABSTRACT

We present NPNet, a fully non-parametric approach for 3D point-cloud classification and part segmentation. NPNet contains no learned weights; instead, it builds point features using deterministic operators such as farthest point sampling, k-nearest neighbors, and pooling. Our key idea is an adaptive Gaussian-Fourier positional encoding whose bandwidth and Gaussian-cosine mixing are chosen from the input geometry, helping the method remain stable across different scales and sampling densities. For segmentation, we additionally incorporate fixed-frequency Fourier features to provide global context alongside the adaptive encoding. Across ModelNet40/ModelNet-R, ScanObjectNN, and ShapeNetPart, NPNet achieves strong performance among non-parametric baselines, and it is particularly effective in few-shot settings on ModelNet40. NPNet also offers favorable memory use and inference time compared to prior non-parametric methods.

Motivation and Objectives

  • Develop a training-free non-parametric architecture for 3D point-cloud classification and segmentation.
  • Introduce adaptive Gaussian–Fourier positional encoding that adapts to input geometry.
  • Augment segmentation with fixed-frequency Fourier features to provide global context.
  • Demonstrate competitive accuracy and efficiency against non-parametric baselines and competitiveness with parametric models.
  • Assess few-shot performance and deployment implications of a training-free pipeline.

Proposed Method

  • Use deterministic geometric operators (farthest point sampling, k-NN grouping, pooling) to build multi-scale point features without learned weights.
  • Propose adaptive Gaussian–Fourier encoding that selects bandwidth and Gaussian–cosine mixing from input statistics ($\sigma_g$) with blending parameter $\lambda$.
  • For segmentation, add fixed-frequency Fourier features to form a hybrid position encoding for global context.
  • Encode training shapes into a memory bank and perform similarity-based inference for classification; for segmentation, use part prototypes and nearest-prototype matching.
  • Inference is memory-bank based and training-free: build banks once, then query with nearest-prototype style matching.
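
The memory-bank inference described in the last two bullets can be sketched as follows. This is a minimal illustration, not NPNet's actual implementation: the function names, the cosine-similarity measure, the exponential weighting, and the `sharpness` constant are all assumptions made for the example, and the toy three-dimensional descriptors stand in for the real multi-scale features.

```python
import math

def l2_normalize(vec):
    """Scale a feature vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else list(vec)

def build_memory_bank(features, labels):
    """Store normalized training features with their labels; nothing is trained."""
    return [(l2_normalize(f), y) for f, y in zip(features, labels)]

def classify(query, bank, num_classes, sharpness=10.0):
    """Pick the class with the largest similarity-weighted vote over the bank.

    `sharpness` is a hypothetical temperature-like constant, not a value from the paper.
    """
    q = l2_normalize(query)
    scores = [0.0] * num_classes
    for feat, label in bank:
        cos = sum(a * b for a, b in zip(q, feat))           # cosine similarity
        scores[label] += math.exp(-sharpness * (1.0 - cos))  # closer entries vote more
    return max(range(num_classes), key=lambda c: scores[c])

# Toy usage: hand-made descriptors for two classes.
bank = build_memory_bank(
    [[1.0, 0.1, 0.0], [0.9, 0.2, 0.1], [0.0, 1.0, 0.2], [0.1, 0.8, 0.3]],
    [0, 0, 1, 1],
)
pred_a = classify([0.95, 0.15, 0.05], bank, num_classes=2)
pred_b = classify([0.05, 0.90, 0.25], bank, num_classes=2)
```

The bank is built once from the training shapes and then only queried, which is what makes the pipeline training-free. For segmentation, the same matching idea would apply per point against part prototypes rather than per shape against stored descriptors.
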
Figure 2: Adaptive Gaussian–Fourier positional encoding. The encoding adapts bandwidth $\sigma$ and mixing coefficient $\lambda$ from input geometry; an additional fixed-frequency Fourier branch provides global context for segmentation.
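
To make the idea of an input-adaptive encoding concrete, here is a minimal sketch under stated assumptions. This summary does not give the paper's formula, so everything below is illustrative: the bandwidth is taken to be the standard deviation of the centered coordinates, the anchor centers are evenly spaced in normalized units, and `lam` is passed in as a fixed blending weight rather than selected from the geometry as NPNet does. The function names are hypothetical.

```python
import math

def global_sigma(points):
    """Return the centroid and the standard deviation of all centered coordinates."""
    n = len(points)
    centroid = [sum(p[d] for p in points) / n for d in range(3)]
    var = sum((p[d] - centroid[d]) ** 2 for p in points for d in range(3)) / (3 * n)
    return centroid, math.sqrt(var)

def adaptive_gaussian_fourier(points, lam=0.5, num_anchors=4):
    """Blend Gaussian and cosine responses, with the bandwidth set by the input spread.

    Assumptions (not from the paper): the anchor layout, the cosine frequency,
    and a caller-supplied `lam` in [0, 1].
    """
    centroid, sigma_g = global_sigma(points)
    sigma = max(sigma_g, 1e-8)  # guard against degenerate clouds
    anchors = [-1.0 + 2.0 * j / (num_anchors - 1) for j in range(num_anchors)]
    encoded = []
    for p in points:
        row = []
        for d in range(3):
            z = (p[d] - centroid[d]) / sigma  # coordinate in units of the bandwidth
            for c in anchors:
                gauss = math.exp(-0.5 * (z - c) ** 2)
                cosine = math.cos(math.pi * (z - c))
                row.append(lam * gauss + (1.0 - lam) * cosine)
        encoded.append(row)
    return encoded

# The same shape at a different scale and offset yields the same encoding.
cloud = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.5], [0.0, 1.0, 0.2], [0.5, 0.5, 1.0]]
scaled = [[5.0 * x + 2.0 for x in p] for p in cloud]
enc_a = adaptive_gaussian_fourier(cloud)
enc_b = adaptive_gaussian_fourier(scaled)
max_diff = max(abs(a - b) for ra, rb in zip(enc_a, enc_b) for a, b in zip(ra, rb))
```

Because every coordinate is divided by a bandwidth computed from the cloud itself, rescaling or translating the input leaves the encoding unchanged up to floating-point error. That is one plausible reading of the stability across scales that the abstract mentions, not a confirmed description of the mechanism.
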

Experimental Results

Research Questions

  • RQ1: Can a fully non-parametric pipeline match or exceed parametric methods on standard 3D point-cloud benchmarks?
  • RQ2: Does an input-adaptive Gaussian–Fourier positional encoding improve stability and transfer across varying densities and scales?
  • RQ3: What is the impact of fixed-frequency Fourier features on segmentation performance and global context?
  • RQ4: What are the memory, time, and computational trade-offs of NPNet compared to prior non-parametric methods and parametric networks, especially in few-shot settings?

Key Findings

  • On ModelNet40, NPNet achieves 85.45% accuracy with 0.0M parameters and 0.0 GFLOPs.
  • On ModelNet-R, NPNet achieves 85.65% accuracy with 0.0M parameters and 0.0 GFLOPs.
  • On ScanObjectNN, NPNet attains 86.1% on OBJ-BG, 86.1% on OBJ-ONLY, and 84.9% on PB-T50-RS, the best results among non-parametric baselines on OBJ-BG and OBJ-ONLY.
  • On ShapeNetPart, NPNet achieves 73.56% instance mIoU with the hybrid encoding.
  • In few-shot ModelNet40, NPNet achieves 92.0% (5-way 10-shot) and 93.2% (5-way 20-shot); 82.5% (10-way 10-shot) and 87.6% (10-way 20-shot).
  • Efficiency: on ModelNet40, NPNet uses 0.0021 GFLOPs, 99.1 MB of memory, and 3.86 ms per sample; on ShapeNetPart, it uses 0.0045 GFLOPs, 256.4 MB, and 5.63 ms per sample.
Figure 3: Stage block used in NPNet. FPS selects centroids, $k$-NN groups local neighborhoods, positional encoding modulates features, and mean/max pooling produces a stage descriptor; concatenating stages forms a multi-scale representation.
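
The stage block in this caption can be sketched with plain Python. This is a simplified illustration rather than the paper's code: the function names are hypothetical, FPS is seeded at index 0 by assumption, raw coordinates stand in for the input features, and the positional-encoding modulation step is omitted.

```python
import math

def farthest_point_sampling(points, m):
    """Greedy FPS: start at index 0, then repeatedly add the point farthest from the chosen set."""
    chosen = [0]
    min_dist = [math.dist(p, points[0]) for p in points]
    while len(chosen) < m:
        nxt = max(range(len(points)), key=lambda i: min_dist[i])
        chosen.append(nxt)
        for i, p in enumerate(points):
            min_dist[i] = min(min_dist[i], math.dist(p, points[nxt]))
    return chosen

def knn_indices(points, center, k):
    """Indices of the k points closest to `center` (brute force)."""
    return sorted(range(len(points)), key=lambda i: math.dist(points[i], center))[:k]

def stage_descriptor(points, features, m, k):
    """For each FPS centroid, mean- and max-pool its neighbors' features and concatenate."""
    out_points, out_feats = [], []
    for ci in farthest_point_sampling(points, m):
        group = [features[i] for i in knn_indices(points, points[ci], k)]
        dims = range(len(group[0]))
        mean_pool = [sum(f[d] for f in group) / len(group) for d in dims]
        max_pool = [max(f[d] for f in group) for d in dims]
        out_points.append(points[ci])
        out_feats.append(mean_pool + max_pool)
    return out_points, out_feats

# Toy cloud with two well-separated clusters; coordinates double as features.
points = [
    [0.0, 0.0, 0.0], [0.1, 0.0, 0.0], [0.0, 0.2, 0.0],
    [5.0, 5.0, 5.0], [5.3, 5.0, 5.0], [5.0, 5.1, 5.0],
]
centroid_ids = farthest_point_sampling(points, 2)
stage_points, stage_feats = stage_descriptor(points, points, m=2, k=3)
```

Each stage returns fewer points with wider features, so a second stage could be applied to `stage_points` and `stage_feats`, and the pooled outputs of all stages concatenated into the multi-scale representation the caption describes.
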


This review was generated by AI and reviewed by a human editor.