QUICK REVIEW

[Paper Review] Pyramidal Convolution: Rethinking Convolutional Neural Networks for Visual Recognition

Ionuţ Cosmin Duţă, Li Liu|arXiv (Cornell University)|Jun 20, 2020

Advanced Neural Network Applications|References: 51|Cited by: 138

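One-Sentence Summary

The PyConv architecture builds a pyramid of kernels at multiple scales that processes the input with varying spatial sizes and depths without increasing the parameter count, improving performance on classification, segmentation, and related tasks.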

ABSTRACT

This work introduces pyramidal convolution (PyConv), which is capable of processing the input at multiple filter scales. PyConv contains a pyramid of kernels, where each level involves different types of filters with varying size and depth, which are able to capture different levels of details in the scene. On top of these improved recognition capabilities, PyConv is also efficient and, with our formulation, it does not increase the computational cost and parameters compared to standard convolution. Moreover, it is very flexible and extensible, providing a large space of potential network architectures for different applications. PyConv has the potential to impact nearly every computer vision task and, in this work, we present different architectures based on PyConv for four main tasks on visual recognition: image classification, video action classification/recognition, object detection and semantic image segmentation/parsing. Our approach shows significant improvements over all these core tasks in comparison with the baselines. For instance, on image recognition, our 50-layers network outperforms in terms of recognition performance on ImageNet dataset its counterpart baseline ResNet with 152 layers, while having 2.39 times less parameters, 2.52 times lower computational complexity and more than 3 times less layers. On image segmentation, our novel framework sets a new state-of-the-art on the challenging ADE20K benchmark for scene parsing. Code is available at: https://github.com/iduta/pyconv

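Motivation and Objectives

Address the limitations of fixed-size convolution kernels and limited receptive fields in standard convolutional neural networks.
Develop a multi-scale, multi-depth convolution operator (PyConv) that remains parameter-efficient.
Demonstrate the effectiveness of PyConv on image classification, video action recognition, object detection, and semantic segmentation.
Provide architectures such as PyConvResNet, PyConvHGResNet, and PyConvSegNet that outperform baselines on key visual recognition benchmarks.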

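Proposed Method

Define PyConv as a pyramid of kernels in which, from level to level, the kernel's spatial size increases while its depth decreases.
Implement PyConv with grouped convolutions to control kernel depth at each level and keep the parameter count comparable to that of a standard convolution.
Embed PyConv in residual bottleneck blocks to form the PyConvResNet and PyConvHGResNet architectures.
Propose PyConvPH (LocalPyConv, GlobalPyConv, and Merge blocks) for semantic segmentation to capture multi-scale context both locally and globally.
Compare against ResNet baselines on ImageNet and ADE20K, and analyze the parameter and FLOPs budgets.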
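
The grouped-convolution bookkeeping behind the method can be sketched with a short parameter count. The snippet below is a minimal illustration, not the authors' code: the 64-channel input, the four levels (3x3, 5x5, 7x7, 9x9 kernels with 16 output channels each), and the group counts (1, 4, 8, 16) are assumed values chosen for the example, and the helper names `grouped_conv_params` and `pyconv_params` are hypothetical. Bias terms are ignored.

```python
def grouped_conv_params(in_channels, out_channels, kernel_size, groups):
    """Weight count of a 2D grouped convolution (bias ignored).

    Each output channel only sees in_channels // groups input channels,
    so the kernel depth shrinks as the group count grows.
    """
    if in_channels % groups or out_channels % groups:
        raise ValueError("groups must divide both channel counts")
    kernel_depth = in_channels // groups
    return kernel_size * kernel_size * kernel_depth * out_channels


def pyconv_params(in_channels, levels):
    """Total weights of a kernel pyramid.

    levels is a list of (kernel_size, out_channels, groups) tuples;
    the level outputs are assumed to be concatenated along channels.
    """
    return sum(
        grouped_conv_params(in_channels, out_ch, k, g)
        for k, out_ch, g in levels
    )


# Assumed example configuration (illustrative values only).
IN_CHANNELS = 64
LEVELS = [
    (3, 16, 1),   # small kernel, full depth
    (5, 16, 4),
    (7, 16, 8),
    (9, 16, 16),  # large kernel, shallow depth
]

pyramid_total = pyconv_params(IN_CHANNELS, LEVELS)
standard_total = grouped_conv_params(IN_CHANNELS, 64, 3, 1)

print("pyramid weights: ", pyramid_total)
print("standard weights:", standard_total)
```

Under these assumptions the pyramid needs 27,072 weights versus 36,864 for a dense 3x3 layer with the same input and output width, because the groups shrink each larger kernel's depth enough to offset its wider spatial footprint.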

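Experimental Results

Research Questions

RQ1: Can PyConv improve recognition performance while keeping a parameter count and computational cost similar to standard convolution?
RQ2: Does integrating multi-scale, multi-depth kernel processing into CNN backbones help across vision tasks (classification, segmentation, detection, video)?
RQ3: How should kernel sizes, depths, and groups be configured at each network stage for the best trade-off between accuracy and efficiency?
RQ4: Can the multi-scale segmentation head (PyConvPH) outperform existing segmentation heads on ADE20K?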

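Key Findings

PyConv-based networks outperform ResNet baselines on ImageNet while using fewer parameters and FLOPs (e.g., PyConvResNet-50: top-1 error 22.12%, 24.85M parameters, 3.88 GFLOPs).
PyConvHGResNet-50 achieves stronger single-model accuracy (top-1 error 21.52%).
PyConv downsamples effectively through its multi-scale kernels, improving translation invariance at no additional cost.
The PyConvSegNet framework with PyConvPH achieves competitive to strong results for scene parsing on ADE20K.
Across depths, PyConv variants converge faster in training and reach higher validation accuracy than their ResNet counterparts.
The results suggest that using larger kernels across stages (e.g., 9x9, 7x7, 5x5, 3x3) with appropriate grouping yields consistent gains without increasing the parameter count.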
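
One way to see why suitable grouping keeps the budget flat: a k x k kernel split into g groups costs k²/g weights per input-output channel pair, so it is no more expensive than a dense 3x3 kernel once g ≥ k²/9. The sketch below finds the smallest power-of-two group count that meets this bound. The power-of-two restriction and the function name `min_groups_for_budget` are assumptions made for this illustration, not a rule stated in the paper.

```python
def min_groups_for_budget(kernel_size, base_kernel=3):
    """Smallest power-of-two group count g such that a grouped
    kernel_size x kernel_size convolution has no more weights than a
    dense base_kernel x base_kernel convolution of the same width.

    Condition: kernel_size**2 / g <= base_kernel**2.
    """
    groups = 1
    while kernel_size ** 2 > groups * base_kernel ** 2:
        groups *= 2
    return groups


# Group counts for the kernel sizes mentioned in the findings.
group_plan = {k: min_groups_for_budget(k) for k in (3, 5, 7, 9)}
print(group_plan)
```

In a real network the chosen group count must also divide the layer's input and output channel counts, which this heuristic does not check.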


This review was generated by AI and reviewed by human editors.