QUICK REVIEW

[Paper Review] Wavelet Convolutions for Large Receptive Fields

Shahaf E. Finder, Roy Amoyal|arXiv (Cornell University)|Jul 8, 2024
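Image and Signal Denoising Methods | Cited by 10
One-Sentence Summary

This paper proposes integrating wavelet transforms into convolutional networks to obtain large receptive fields while preserving efficiency and a multi-frequency representation.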

ABSTRACT

In recent years, there have been attempts to increase the kernel size of Convolutional Neural Nets (CNNs) to mimic the global receptive field of Vision Transformers' (ViTs) self-attention blocks. That approach, however, quickly hit an upper bound and saturated way before achieving a global receptive field. In this work, we demonstrate that by leveraging the Wavelet Transform (WT), it is, in fact, possible to obtain very large receptive fields without suffering from over-parameterization, e.g., for a $k \times k$ receptive field, the number of trainable parameters in the proposed method grows only logarithmically with $k$. The proposed layer, named WTConv, can be used as a drop-in replacement in existing architectures, results in an effective multi-frequency response, and scales gracefully with the size of the receptive field. We demonstrate the effectiveness of the WTConv layer within ConvNeXt and MobileNetV2 architectures for image classification, as well as backbones for downstream tasks, and show it yields additional properties such as robustness to image corruption and an increased response to shapes over textures. Our code is available at https://github.com/BGU-CS-VIL/WTConv.
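The logarithmic-growth claim in the abstract can be made concrete with a little arithmetic. The sketch below is illustrative only and rests on assumptions that this summary does not state: each wavelet level is taken to double the receptive field of a small `base_kernel` x `base_kernel` depthwise filter, and each level is taken to add one such filter for each of four subbands. The function names and the default `base_kernel=5` are hypothetical.

```python
import math


def dense_depthwise_params(channels: int, k: int) -> int:
    """Weights in one dense k x k depthwise convolution."""
    return channels * k * k


def wavelet_stack_params(channels: int, k: int, base_kernel: int = 5):
    """Weights needed to reach a k x k receptive field under the stated assumptions.

    Returns (parameter_count, levels).
    """
    # Assumption: every wavelet level doubles the receptive field of the base kernel.
    levels = max(0, math.ceil(math.log2(k / base_kernel)))
    # Assumption: one base-resolution kernel plus four subband kernels per level.
    params = channels * base_kernel ** 2 * (1 + 4 * levels)
    return params, levels


for k in (10, 20, 40, 80):
    params, levels = wavelet_stack_params(channels=1, k=k)
    print(k, levels, params, dense_depthwise_params(1, k))
```

Under these assumptions, each doubling of $k$ adds a constant 100 weights per channel, while the dense count quadruples: at $k = 40$ the comparison is 325 versus 1600 weights per channel.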

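Motivation and Objectives

  • Motivate the need for large receptive fields in CNNs without incurring excessive computation.
  • Introduce a wavelet-based convolution method that captures multi-frequency information.
  • Show how wavelet convolutions enlarge the receptive field while remaining efficient.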

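Proposed Method

  • Integrate the wavelet transform into convolutional neural networks to produce wavelet-based feature maps.
  • Use the multi-frequency representation inherent to wavelets to enrich the receptive field.
  • Describe how wavelet convolutions can replace or augment standard convolutions at the architectural or algorithmic level.
  • Discuss training considerations and potential gains in robustness or accuracy.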
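The idea in the bullets above can be illustrated with a minimal NumPy sketch. It is not the authors' WTConv implementation (see their repository for that). It assumes a one-level Haar wavelet, and it uses a fixed 3 x 3 box filter as a stand-in for a learned kernel, applied only to the low-frequency subband. The helper names `haar_dwt2`, `haar_idwt2`, `box3`, and `wavelet_filter` are hypothetical.

```python
import numpy as np


def haar_dwt2(x):
    """One-level 2D Haar transform of an (H, W) array with even H and W."""
    a, b = x[0::2, 0::2], x[0::2, 1::2]
    c, d = x[1::2, 0::2], x[1::2, 1::2]
    ll = (a + b + c + d) / 2  # low-frequency (coarse) subband
    lh = (a - b + c - d) / 2  # horizontal detail
    hl = (a + b - c - d) / 2  # vertical detail
    hh = (a - b - c + d) / 2  # diagonal detail
    return ll, lh, hl, hh


def haar_idwt2(ll, lh, hl, hh):
    """Inverse of haar_dwt2; reconstructs the (2H, 2W) array."""
    h, w = ll.shape
    x = np.empty((2 * h, 2 * w), dtype=ll.dtype)
    x[0::2, 0::2] = (ll + lh + hl + hh) / 2
    x[0::2, 1::2] = (ll - lh + hl - hh) / 2
    x[1::2, 0::2] = (ll + lh - hl - hh) / 2
    x[1::2, 1::2] = (ll - lh - hl + hh) / 2
    return x


def box3(s):
    """Zero-padded 3 x 3 mean filter (placeholder for a learned kernel)."""
    p = np.pad(s, 1)
    h, w = s.shape
    return sum(p[i:i + h, j:j + w] for i in range(3) for j in range(3)) / 9.0


def wavelet_filter(x):
    """Filter the coarse subband with a small kernel, then invert the transform."""
    ll, lh, hl, hh = haar_dwt2(x)
    return haar_idwt2(box3(ll), lh, hl, hh)


# A single input pixel influences a 6 x 6 output region, although the kernel is 3 x 3.
impulse = np.zeros((16, 16))
impulse[8, 8] = 1.0
rows, cols = np.nonzero(wavelet_filter(impulse))
print(rows.min(), rows.max(), cols.min(), cols.max())
```

Because the subband has half the resolution of the input, a 3 x 3 kernel applied there covers a 6 x 6 area of the original image. Repeating the decomposition on the coarse subband would double that reach again at each level.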

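Experimental Results

Research Questions

  • RQ1: Can wavelet-based convolutions provide large receptive fields without quadratic growth in parameter count or computation?
  • RQ2: Does the multi-frequency representation of wavelets aid feature learning more than conventional convolutions do?
  • RQ3: What practical benefits (e.g., robustness, efficiency) do wavelet convolutions bring to standard vision tasks?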

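Key Findings

  • Proposes a wavelet-based convolution method for achieving large receptive fields.
  • Highlights the multi-frequency representation provided by wavelets as a core advantage.
  • Argues for efficiency advantages and potential performance gains relative to standard large-kernel designs.
  • Situates the wavelet approach within the broader CNN and vision transformer literature.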


This review was generated by AI and checked by a human editor.