QUICK REVIEW

[Paper Review] SONIC: Spectral Oriented Neural Invariant Convolutions

Gijs Joppe Moens, Regina G. H. Beets‐Tan|arXiv (Cornell University)|Jan 27, 2026


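One-Sentence Summary

SONIC introduces a continuous-spectrum, orientation-aware, low-rank convolution operator that achieves a global receptive field and improves robustness with fewer parameters, matching or exceeding CNNs, ViTs, and prior spectral methods on synthetic, medical, and natural-image benchmarks.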

ABSTRACT

Convolutional Neural Networks (CNNs) rely on fixed-size kernels scanning local patches, which limits their ability to capture global context or long-range dependencies without very deep architectures. Vision Transformers (ViTs), in turn, provide global connectivity but lack spatial inductive bias, depend on explicit positional encodings, and remain tied to the initial patch size. Bridging these limitations requires a representation that is both structured and global. We introduce SONIC (Spectral Oriented Neural Invariant Convolutions), a continuous spectral parameterisation that models convolutional operators using a small set of shared, orientation-selective components. These components define smooth responses across the full frequency domain, yielding global receptive fields and filters that adapt naturally across resolutions. Across synthetic benchmarks, large-scale image classification, and 3D medical datasets, SONIC shows improved robustness to geometric transformations, noise, and resolution shifts, and matches or exceeds convolutional, attention-based, and prior spectral architectures with an order of magnitude fewer parameters. These results demonstrate that continuous, orientation-aware spectral parameterisations provide a principled and scalable alternative to conventional spatial and spectral operators.

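Research Motivation and Goals

Address the need for long-range context and robust perception beyond fixed local convolution kernels, targeting both the locality assumption of CNNs and the weak spatial inductive bias of ViTs.
Propose a continuous spectral operator that is global, resolution-invariant, and parameter-efficient.
Develop a structured, orientation-aware spectral parameterisation with shared modes and low-rank mixing.
Demonstrate robustness to geometric transformations, noise, and resolution changes across diverse datasets.
Evaluate scalability and effectiveness on synthetic benchmarks, 3D medical imaging, and ImageNet-scale settings.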

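Proposed Method

Define a continuous spectral operator through a learnable continuous spectral symbol $\hat{H}_\theta(\omega)$.
Decompose the spectral response into $M$ shared orientation-selective modes $T_m(\omega)$, parameterised by $v_m$ (orientation), $s_m$ (scale), $a_m$ (complex damping/oscillation), and $\tau_m$ (transverse decay).
Construct $\hat{H}_{k,c}(\omega) = \sum_m C_{km}\, T_m(\omega)\, B_{mc}$, giving a low-rank, mode-based spectral representation.
Filter in the frequency domain, $\hat{y}_k(\omega) = \sum_c \hat{H}_{k,c}(\omega)\, \hat{x}_c(\omega)$, then map back to the spatial domain through a residual nonlinear block.
Normalise orientations via physical-unit preprocessing ($\tilde{v}_m$, $\hat{v}_m$) to obtain resolution invariance.
Implement with FFT-based spectral forward and backward passes, at a cost of $O((C+K)\,N \log N + M(C+K)\,N)$.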
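
The steps above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the closed form used for the modes in `mode_responses` is a hypothetical resolvent-style choice (this summary does not give the actual expression for $T_m$), and the function names, the `spacing` argument, and the reading of C, K, N as input channels, output channels, and grid points are assumptions made for the example. The residual nonlinear block is omitted.

```python
import numpy as np


def frequency_grid(height, width, spacing=1.0):
    """Angular frequencies in physical units (assumed normalisation), so the
    same continuous symbol can be sampled on grids of different resolution."""
    wy = 2 * np.pi * np.fft.fftfreq(height, d=spacing)
    wx = 2 * np.pi * np.fft.fftfreq(width, d=spacing)
    return np.meshgrid(wy, wx, indexing="ij")  # two arrays of shape (H, W)


def mode_responses(wy, wx, angles, scales, dampings, taus):
    """Hypothetical oriented modes T_m(w), returned with shape (M, H, W).

    Assumed form: 1 / (i*s*(w.v) - a + tau*|w_perp|^2). It stays finite
    as long as Re(a) < 0 and tau >= 0.
    """
    col = lambda p: np.asarray(p)[:, None, None]   # (M,) -> (M, 1, 1)
    vy, vx = col(np.sin(angles)), col(np.cos(angles))
    along = vy * wy + vx * wx                      # frequency along v_m
    across = vx * wy - vy * wx                     # frequency transverse to v_m
    denom = 1j * col(scales) * along - col(dampings) + col(taus) * across**2
    return 1.0 / denom


def spectral_layer(x, B, C, T):
    """Apply y_hat_k = sum_c (sum_m C_km T_m B_mc) x_hat_c without ever
    forming the full K x C symbol."""
    x_hat = np.fft.fft2(x, axes=(-2, -1))           # C forward FFTs
    z = np.einsum("mc,chw->mhw", B, x_hat)          # mix channels into modes
    y_hat = np.einsum("km,mhw->khw", C, T * z)      # filter, mix modes out
    # Taking the real part is a simplification made for this sketch.
    return np.fft.ifft2(y_hat, axes=(-2, -1)).real  # K inverse FFTs


rng = np.random.default_rng(0)
n_in, n_out, n_modes, size = 3, 4, 2, 16
wy, wx = frequency_grid(size, size)
T = mode_responses(
    wy, wx,
    angles=[0.0, np.pi / 3],
    scales=[1.0, 2.0],
    dampings=[-0.5 + 0.3j, -1.0 + 0.0j],
    taus=[0.1, 0.5],
)
B = rng.normal(size=(n_modes, n_in))    # B_mc: input channels -> modes
C = rng.normal(size=(n_out, n_modes))   # C_km: modes -> output channels
x = rng.normal(size=(n_in, size, size))
y = spectral_layer(x, B, C, T)
```

Under these assumptions, the FFTs account for the $(C+K)\,N \log N$ term and the two channel-mode mixing steps for the $M(C+K)\,N$ term. Because each mode is defined at every frequency, a single layer already couples all spatial positions, which is one way to read the global receptive field claim.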

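Experimental Results

Research Questions

RQ1: Can SONIC achieve a global receptive field with significantly fewer parameters than standard CNNs or ViTs?
RQ2: How robust is SONIC to geometric transformations, noise, and resolution changes across domains?
RQ3: How does SONIC compare with state-of-the-art methods in accuracy and efficiency on 3D medical image segmentation?
RQ4: Does SONIC maintain its performance under external validation and cross-scanner variation in medical imaging?
RQ5: Compared with conventional operators, what is the trade-off between the richness of the spectral parameterisation and its compute/memory overhead?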

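Key Findings

On SynthShape, SONIC is more robust to distortions and long-range dependencies than CNN/ViT baselines and prior spectral models.
On HalliGalli, SONIC is the only model that solves the strict long-range dependency task within a single block, demonstrating its global receptive field.
On KiTS and ACDC 3D medical segmentation, SONIC matches or exceeds state-of-the-art methods with far fewer parameters (under 10% of strong baselines).
External validation (Prostate158 and PROMIS) shows improved detection metrics with far fewer trainable parameters (e.g. SonicNet 2.59M vs nnU-Net 31.20M; AUROC on Prostate158 0.841 vs 0.814).
In ImageNet experiments with ResNet-50 variants, SONIC remains competitive in accuracy at moderate compute/memory overhead (e.g. ResNet-50 Sonic reaches ~60.01 Top-1 at 0.81 GFLOPs, in comparison with other spectral operators).
Across tasks, SONIC maintains or improves performance while providing a global receptive field and resolution invariance with substantially fewer parameters.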
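
As a quick sanity check on the parameter claims, the figures quoted above for the prostate external validation can be turned into ratios. The snippet uses only the numbers reported in this summary; the variable names are illustrative.

```python
# Figures quoted in the findings above (parameters in millions).
sonicnet_params_m = 2.59
nnunet_params_m = 31.20

param_ratio = sonicnet_params_m / nnunet_params_m   # fraction of baseline size
reduction = nnunet_params_m / sonicnet_params_m     # "times smaller" factor
auroc_gain = 0.841 - 0.814                          # Prostate158 AUROC difference

print(f"SonicNet uses {param_ratio:.1%} of nnU-Net's parameters "
      f"({reduction:.1f}x fewer); AUROC gain {auroc_gain:.3f}")
```

The ratio comes out at roughly 8.3%, about a twelvefold reduction, which is in line with the "under 10%" and "order of magnitude" descriptions used elsewhere in this review.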


This review was generated by AI and checked by a human editor.