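[Paper Explained] No More Strided Convolutions or Pooling: A New CNN Building Block for Low-Resolution Images and Small Objects
SPD-Conv replaces strided convolutions and pooling with a space-to-depth downsampling step followed by a non-strided convolution, improving performance on low-resolution images and small objects. It has been applied to YOLOv5 and ResNet variants, and open-source code is available.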
Convolutional neural networks (CNNs) have made resounding success in many computer vision tasks such as image classification and object detection. However, their performance degrades rapidly on tougher tasks where images are of low resolution or objects are small. In this paper, we point out that this roots in a defective yet common design in existing CNN architectures, namely the use of strided convolution and/or pooling layers, which results in a loss of fine-grained information and learning of less effective feature representations. To this end, we propose a new CNN building block called SPD-Conv in place of each strided convolution layer and each pooling layer (thus eliminates them altogether). SPD-Conv is comprised of a space-to-depth (SPD) layer followed by a non-strided convolution (Conv) layer, and can be applied in most if not all CNN architectures. We explain this new design under two most representative computer vision tasks: object detection and image classification. We then create new CNN architectures by applying SPD-Conv to YOLOv5 and ResNet, and empirically show that our approach significantly outperforms state-of-the-art deep learning models, especially on tougher tasks with low-resolution images and small objects. We have open-sourced our code at https://github.com/LabSAINT/SPD-Conv.
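Motivation and Objectives
- Identify the performance limits that strided downsampling and pooling impose on conventional CNNs when images are low resolution or objects are small.
- Propose SPD-Conv as a building block that replaces strided convolutions and pooling and can be used across many architectures.
- Demonstrate the effectiveness of SPD-Conv on object detection and image classification.
- Show that SPD-Conv can be integrated into popular frameworks, and provide open-source code for reproduction.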
Proposed Method
- Introduce SPD-Conv: a space-to-depth (SPD) layer followed by a non-strided convolution.
- SPD downsamples feature maps while preserving information by rearranging spatial data into the channel dimension.
- Follow SPD with a non-strided convolution to reduce channel dimensionality and learn discriminative features.
- Replace all strided convolutions and pooling layers with SPD-Conv in existing architectures (e.g., YOLOv5, ResNet).
- Provide scaling strategies (width and depth) to create nano, small, medium, and large SPD-empowered models.
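To make the space-to-depth step concrete, here is a minimal pure-Python sketch on nested lists. It is not the authors' implementation: the function names, the `(C, H, W)` nested-list layout, and the channel ordering of the sub-maps are assumptions chosen for illustration. It rearranges a feature map with scale 2 and contrasts the result with plain stride-2 subsampling, which is what a stride-2 convolution with a 1x1 kernel effectively sees.

```python
def space_to_depth(fmap, scale=2):
    """Rearrange a (C, H, W) nested-list map into (C*scale^2, H/scale, W/scale)."""
    channels = len(fmap)
    height, width = len(fmap[0]), len(fmap[0][0])
    if height % scale or width % scale:
        raise ValueError("H and W must be divisible by scale")
    out = []
    # One sub-map per (row offset, column offset); each keeps every scale-th pixel.
    # The ordering of sub-maps along the channel axis is an assumption.
    for dy in range(scale):
        for dx in range(scale):
            for c in range(channels):
                out.append([row[dx::scale] for row in fmap[c][dy::scale]])
    return out


def strided_subsample(fmap, stride=2):
    """Keep only the top-left pixel of each stride x stride block (for contrast)."""
    return [[row[::stride] for row in ch[::stride]] for ch in fmap]


def flatten(fmap):
    """Flatten a (C, H, W) nested list into a single list of values."""
    return [v for ch in fmap for row in ch for v in row]


# A 1-channel 4x4 map holding the values 0..15.
x = [[[r * 4 + c for c in range(4)] for r in range(4)]]
y = space_to_depth(x)
z = strided_subsample(x)

print(len(y), len(y[0]), len(y[0][0]))   # channels, height, width after SPD
print(sorted(flatten(y)) == flatten(x))  # True when every input value survives
print(len(flatten(z)))                   # number of values kept by subsampling
```

In this sketch the spatial size halves while the channel count quadruples, so no input value is discarded. In SPD-Conv, a stride-1 convolution would then follow to bring the channel count back down with learned weights.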
Experimental Results
Research Questions
- RQ1: Does SPD-Conv preserve discriminative information during downsampling compared to traditional strided downsampling?
- RQ2: Can SPD-Conv improve performance on downstream tasks like object detection and image classification, particularly for small objects and low-resolution images?
- RQ3: How can SPD-Conv be integrated into existing architectures (e.g., YOLOv5, ResNet) and scaled across model sizes?
- RQ4: Is SPD-Conv easily adoptable within common deep learning frameworks (PyTorch, TensorFlow) and training pipelines?
Key Findings
- SPD-Conv replaces strided convolutions and pooling, downsampling feature maps without losing learnable information.
- Applying SPD-Conv to YOLOv5-SPD and ResNet-SPD yields improved performance, especially for small objects and low-resolution images.
- On COCO val2017, the nano model YOLOv5-SPD-n achieves an AP_S 13.15% higher than the runner-up.
- On COCO val2017, the small and medium models (YOLOv5-SPD-s and YOLOv5-SPD-m) also show notable AP and AP_S gains with SPD-Conv.
- On COCO test-dev2017, SPD-Conv models maintain leading AP_S across nano, small, and large categories, with competitive AP compared to transfer-learned baselines.
- For image classification (Tiny ImageNet and CIFAR-10), ResNet18-SPD and ResNet50-SPD outperform baselines, achieving higher top-1 accuracy on their respective datasets.
This summary was generated by AI and reviewed by a human editor.