QUICK REVIEW

[Paper Review] RefConv: Re-parameterized Refocusing Convolution for Powerful ConvNets

Zhicheng Cai, Xiaohan Ding|arXiv (Cornell University)|Oct 16, 2023

Advanced Neural Network Applications | Cited by 16

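One-sentence summary

RefConv replaces regular convolutional layers with a re-parameterizable refocusing mechanism that establishes connections among the kernel parameters inherited from a pre-trained model, improving accuracy without adding inference cost. The transformation applied during training produces transformed weights, which are used for inference without changing the model structure.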

ABSTRACT

We propose Re-parameterized Refocusing Convolution (RefConv) as a replacement for regular convolutional layers, which is a plug-and-play module to improve the performance without any inference costs. Specifically, given a pre-trained model, RefConv applies a trainable Refocusing Transformation to the basis kernels inherited from the pre-trained model to establish connections among the parameters. For example, a depth-wise RefConv can relate the parameters of a specific channel of convolution kernel to the parameters of the other kernel, i.e., make them refocus on the other parts of the model they have never attended to, rather than focus on the input features only. From another perspective, RefConv augments the priors of existing model structures by utilizing the representations encoded in the pre-trained parameters as the priors and refocusing on them to learn novel representations, thus further enhancing the representational capacity of the pre-trained model. Experimental results validated that RefConv can improve multiple CNN-based models by a clear margin on image classification (up to 1.47% higher top-1 accuracy on ImageNet), object detection and semantic segmentation without introducing any extra inference costs or altering the original model structure. Further studies demonstrated that RefConv can reduce the redundancy of channels and smooth the loss landscape, which explains its effectiveness.

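Research motivation and objectives

Augment the priors of existing CNN structures by establishing connections among kernel parameters and applying a refocusing transformation.
Improve the representational capacity of a pre-trained model without changing the network architecture or cost at inference time.
Validate the method on image classification, object detection, and semantic segmentation.
Analyze how RefConv affects channel redundancy and the loss landscape in order to explain the performance gains.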

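Proposed method

Replace each regular convolutional layer with RefConv: freeze the basis weights Wb inherited from the pre-trained model and learn a refocusing transformation T that produces the transformed weights Wt.
Define Wt = T(Wb, Wr), where Wr denotes the trainable refocusing parameters and Wt is the weight used for inference.
For depth-wise convolutions, use a dense refocusing transformation; for other convolution types, use a generalized group-wise version, so that cross-channel connections are established.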
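Add an identity mapping so that the transformation learns an increment on top of the basis weights, i.e. Wt = T(Wb, Wr) = Wb * Wr + Wb, where * denotes a convolution applied to the kernel tensor itself rather than to feature maps.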
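Perform Refocusing Learning by training Wr while Wb stays frozen, then save the transformed weights for inference, so that the inference graph is identical to the baseline.
Generalize RefConv to group-wise and dense convolutions, with a hyperparameter G that sets the number of groups in the refocusing transformation and trades off cross-channel connectivity against parameter efficiency.
The paper reports that the training-time overhead of RefConv is negligible and the inference overhead is zero, because inference uses Wt directly and the structure is unchanged.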
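
The depth-wise case of the steps above can be sketched in a few lines. This is a minimal, dependency-free illustration and not the authors' code: the function names `refocus_depthwise` and `refocus_param_count`, the nested-list layout, and the tiny shapes are all assumptions made for readability. A practical version would most likely use a tensor library's grouped convolution instead of explicit loops.

```python
# Illustrative sketch only: plain nested lists stand in for weight tensors.

def refocus_depthwise(w_b, w_r):
    """Compute W_t = conv(W_b, W_r) + W_b for depth-wise basis kernels.

    w_b: basis kernels, shape [C][K][K], frozen (inherited from pre-training).
    w_r: refocusing weights, shape [C][C][k][k], trainable, k odd.
    The C basis kernels are treated as a C-channel "image" of size K x K and
    convolved densely with w_r using zero padding k // 2, so W_t keeps the
    shape of W_b.
    """
    C, K = len(w_b), len(w_b[0])
    k = len(w_r[0][0])
    pad = k // 2
    w_t = [[[0.0] * K for _ in range(K)] for _ in range(C)]
    for o in range(C):                      # output kernel channel
        for y in range(K):
            for x in range(K):
                acc = 0.0
                for i in range(C):          # every basis channel contributes
                    for dy in range(k):
                        for dx in range(k):
                            yy, xx = y + dy - pad, x + dx - pad
                            if 0 <= yy < K and 0 <= xx < K:
                                acc += w_r[o][i][dy][dx] * w_b[i][yy][xx]
                w_t[o][y][x] = acc + w_b[o][y][x]   # identity mapping
    return w_t


def refocus_param_count(num_kernels, groups, k):
    """Parameters of a grouped k x k convolution over num_kernels channels."""
    assert num_kernels % groups == 0
    return num_kernels * (num_kernels // groups) * k * k


# Demo: 2 channels, 3x3 basis kernels, 3x3 refocusing kernels.
w_b = [[[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]],
       [[0.5, 0.0, 0.5], [0.0, 1.0, 0.0], [0.5, 0.0, 0.5]]]
w_r = [[[[0.0] * 3 for _ in range(3)] for _ in range(2)] for _ in range(2)]
assert refocus_depthwise(w_b, w_r) == w_b   # zero W_r leaves W_b unchanged
w_r[0][1][1][1] = 1.0                       # let channel 0 attend to channel 1
w_t = refocus_depthwise(w_b, w_r)
print(w_t[0][1][1])                         # 5.0 + 1.0 = 6.0
```

Because the transformation only touches the kernel tensor, the result has the same shape as Wb and can be stored as an ordinary convolution weight, which is why the inference graph does not change. The helper `refocus_param_count` shows the trade-off controlled by G: with more groups, each transformed kernel sees fewer basis kernels and Wr shrinks proportionally.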

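Experimental results

Research questions

RQ1: Can RefConv improve CNN performance without adding inference cost by augmenting the priors of existing kernel structures?
RQ2: How does the refocusing transformation affect channel redundancy and cross-channel interaction in the pre-trained kernels?
RQ3: Do models equipped with RefConv improve on ImageNet classification and on downstream tasks such as object detection and semantic segmentation?
RQ4: Compared with standard retraining or fine-tuning, how does Refocusing Learning affect training dynamics and the loss landscape?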

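Key findings

RefConv yields clear accuracy gains across multiple backbones (for example, top-1 accuracy on ImageNet improves by up to 1.47% for MobileNetv3-S, with notable gains for ShuffleNetv2 and FasterNet-S as well).
Once the weights have been converted to inference weights, the parameter count and FLOPs at inference are unchanged from the baseline.
RefConv reduces channel redundancy by increasing the KL divergence between kernel channels, which indicates more diverse representations.
Training with RefConv smooths the loss landscape, producing wider and sparser contours, which may improve generalization.
Ablation studies show that the pre-trained basis weights Wb are an important prior; initializing Wr to zero still improves performance, although standard random initialization works best.
The gains transfer to object detection (SSD on Pascal VOC) and semantic segmentation (DeepLabv3+ on Cityscapes), raising mAP and mIoU over the baselines.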
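
The channel-redundancy finding above relies on a KL divergence between kernel channels. One way such a measurement could be set up is sketched below. The summary does not say how a kernel channel is turned into a probability distribution, so normalizing each flattened kernel with a softmax is an assumption here, as are the function names and the averaging over ordered pairs.

```python
import math


def softmax(values):
    """Turn a flattened kernel into a probability distribution (assumed step)."""
    m = max(values)                          # subtract max for numerical stability
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions with strictly positive entries."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


def mean_pairwise_kl(kernels):
    """Average KL divergence over all ordered pairs of distinct kernel channels.

    kernels: list of C flattened K*K kernel channels (C >= 2).
    A larger value means the channels differ more, i.e. less redundancy.
    """
    dists = [softmax(kernel) for kernel in kernels]
    total, pairs = 0.0, 0
    for i, p in enumerate(dists):
        for j, q in enumerate(dists):
            if i != j:
                total += kl_divergence(p, q)
                pairs += 1
    return total / pairs


redundant = [[1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 4.0]]   # identical channels
diverse = [[1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 1.0]]     # dissimilar channels
print(mean_pairwise_kl(redundant) < mean_pairwise_kl(diverse))  # True
```

Under this setup, comparing the score of the basis weights Wb with that of the transformed weights Wt for the same layer would show whether the refocusing step made the channels more diverse.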


This review was generated by AI and reviewed by a human editor.