QUICK REVIEW

[Paper Review] Re-parameterizing Your Optimizers rather than Architectures

Xiaohan Ding, Honghao Chen | arXiv (Cornell University) | May 30, 2022

Advanced Neural Network Applications | Cited by 27

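One-Sentence Summary

This paper proposes RepOptimizers, which inject model-specific prior knowledge into the optimizer so that a simple VGG-style model (RepOpt-VGG) matches or exceeds the performance of carefully designed networks while training faster and quantizing more easily.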

ABSTRACT

The well-designed structures in neural networks reflect the prior knowledge incorporated into the models. However, though different models have various priors, we are used to training them with model-agnostic optimizers such as SGD. In this paper, we propose to incorporate model-specific prior knowledge into optimizers by modifying the gradients according to a set of model-specific hyper-parameters. Such a methodology is referred to as Gradient Re-parameterization, and the optimizers are named RepOptimizers. For the extreme simplicity of model structure, we focus on a VGG-style plain model and showcase that such a simple model trained with a RepOptimizer, which is referred to as RepOpt-VGG, performs on par with or better than the recent well-designed models. From a practical perspective, RepOpt-VGG is a favorable base model because of its simple structure, high inference speed and training efficiency. Compared to Structural Re-parameterization, which adds priors into models via constructing extra training-time structures, RepOptimizers require no extra forward/backward computations and solve the problem of quantization. We hope to spark further research beyond the realms of model structure design. Code and models are available at https://github.com/DingXiaoH/RepOptimizers.

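Motivation and Objectives

Promote the use of model-specific prior knowledge in the optimizer rather than relying on architecture design alone.
Propose Gradient Re-parameterization (GR) and RepOptimizers as a way to encode priors into gradient updates.
Demonstrate that RepOpt-VGG is competitive in accuracy and more efficient to train than state-of-the-art models.
Highlight practical benefits, including training speed, memory efficiency, and quantization friendliness.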

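Proposed Method

Define Gradient Re-parameterization (GR), which modifies gradients according to model-specific hyper-parameters.
Introduce RepOptimizers, which implement GR without extra forward/backward computations or new parameters.
Use CSLA (Constant-Scale Linear Addition) blocks to conceptually link structural priors to gradient multipliers (Grad Mults).
Instantiate RepOpt-VGG by replacing the BN layers in RepVGG-style blocks with trainable or non-trainable channel-wise scales and deriving the corresponding Grad Mults.
Use Hyper-Search, which trains a small auxiliary model on a search dataset, to obtain the Grad Mult hyper-parameters.
Train RepOpt-VGG on ImageNet and compare it with RepVGG and EfficientNets in terms of accuracy, training speed, memory usage, and quantization behavior.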
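
The link between a CSLA block and a Grad Mult can be checked numerically on a toy problem. The sketch below is an illustration, not the authors' implementation: it assumes the simplest case of two parallel branches of the same shape scaled by constants alpha_a and alpha_b, uses a dot product in place of a convolution, and uses plain SGD without momentum or weight decay. Under those assumptions, training the two-branch block is equivalent to training a single operator whose gradient is multiplied by alpha_a^2 + alpha_b^2, provided the single operator starts from the merged weights. All function names and data values are hypothetical.

```python
# Illustrative sketch only: function names and toy data are hypothetical,
# and a dot product stands in for the convolution used in RepOpt-VGG.

def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))


def csla_step(w_a, w_b, alpha_a, alpha_b, x, target, lr):
    """One plain SGD step on y = alpha_a*(w_a.x) + alpha_b*(w_b.x)."""
    # err is dL/dy for the loss L = 0.5 * (y - target)^2.
    err = alpha_a * dot(w_a, x) + alpha_b * dot(w_b, x) - target
    new_a = [w - lr * alpha_a * err * xi for w, xi in zip(w_a, x)]
    new_b = [w - lr * alpha_b * err * xi for w, xi in zip(w_b, x)]
    return new_a, new_b


def grad_mult_step(w, grad_mult, x, target, lr):
    """One SGD step on y = w.x with the gradient scaled by grad_mult."""
    err = dot(w, x) - target
    return [wi - lr * grad_mult * err * xi for wi, xi in zip(w, x)]


def run_demo(steps=20, use_grad_mult=True):
    alpha_a, alpha_b, lr = 0.5, 2.0, 0.01
    w_a = [0.3, -0.1, 0.2]
    w_b = [-0.4, 0.25, 0.1]
    # The single operator must start from the merged CSLA weights.
    w = [alpha_a * a + alpha_b * b for a, b in zip(w_a, w_b)]
    grad_mult = alpha_a ** 2 + alpha_b ** 2 if use_grad_mult else 1.0
    data = [
        ([1.0, 2.0, -1.0], 0.5),
        ([0.5, -1.0, 2.0], -1.0),
        ([-1.5, 0.3, 0.7], 2.0),
    ]
    for step in range(steps):
        x, target = data[step % len(data)]
        w_a, w_b = csla_step(w_a, w_b, alpha_a, alpha_b, x, target, lr)
        w = grad_mult_step(w, grad_mult, x, target, lr)
    merged = [alpha_a * a + alpha_b * b for a, b in zip(w_a, w_b)]
    return merged, w


if __name__ == "__main__":
    merged, single = run_demo()
    print("max gap with Grad Mult:", max(abs(m - s) for m, s in zip(merged, single)))
    merged, single = run_demo(use_grad_mult=False)
    print("max gap with plain SGD:", max(abs(m - s) for m, s in zip(merged, single)))
```

With the multiplier, the merged two-branch weights and the single-operator weights agree up to floating-point rounding; with `use_grad_mult=False` they drift apart from the first step. The second training-time branch can therefore be dropped in favor of a scaled gradient, which is where the saving in forward/backward computation comes from.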

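Experimental Results

Research Questions

RQ1: Can model-specific prior knowledge be effectively incorporated into the optimizer to improve the training dynamics of non-convex deep networks?
RQ2: How does a plain VGG-style model trained with a RepOptimizer perform compared with well-designed architectures?
RQ3: Do RepOptimizers transfer across datasets (i.e., are they dataset-agnostic), and how do they affect quantization?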

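Key Findings

RepOpt-VGG matches or exceeds several well-designed models in accuracy while training faster and using memory more efficiently.
RepOpt-VGG trains roughly 1.8x faster than RepVGG on comparable hardware at similar accuracy (Table 2).
Despite its simple architecture and training dynamics, RepOpt-VGG achieves Top-1 accuracy competitive with EfficientNets (Table 3).
Ablation studies show that both the initialization and the gradient modification are essential for CSLA-based RepOptimizers to remain equivalent to the target structure (Table 4).
Grad Mults obtained by Hyper-Search on CIFAR-100 transfer to ImageNet, supporting the view that RepOptimizers are model-specific but dataset-agnostic (Tables 5 and 6).
On downstream tasks, RepOpt-VGG performs on par with RepVGG on COCO detection and Cityscapes segmentation (Table 7).
RepOpt-VGG is friendlier to quantization than structurally re-parameterized models, losing only ~2.5% accuracy under INT8 PTQ (Table 8).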
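
For readers unfamiliar with INT8 post-training quantization, the sketch below shows one common variant, symmetric per-tensor quantization, and why the spread of a weight tensor matters for it. It is a generic illustration under stated assumptions, not the quantization pipeline used in the paper: the function names are hypothetical, the weight values are synthetic, and the single large outlier is only a stand-in for a wide parameter distribution.

```python
# Illustrative sketch only: hypothetical helper names and synthetic weights.

def quantize_int8(values):
    """Symmetric per-tensor quantization: one scale maps max |v| to 127."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale


def dequantize(quantized, scale):
    return [q * scale for q in quantized]


def mean_abs_error(values):
    """Average round-trip error after quantizing and dequantizing."""
    quantized, scale = quantize_int8(values)
    restored = dequantize(quantized, scale)
    return sum(abs(v - r) for v, r in zip(values, restored)) / len(values)


# Synthetic "narrow" tensor: 101 evenly spaced values in [-0.5, 0.5].
narrow = [0.01 * i for i in range(-50, 51)]
# Synthetic "wide" tensor: the same values plus a single large outlier.
wide = narrow + [40.0]

if __name__ == "__main__":
    print("narrow tensor error:", mean_abs_error(narrow))
    print("wide tensor error:  ", mean_abs_error(wide))
```

Because a single scale has to cover the largest magnitude in the tensor, one outlier coarsens the grid for every other value, so the wide tensor shows a much larger round-trip error than the narrow one.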


This review was generated by AI and reviewed by human editors.