QUICK REVIEW

[Paper Review] ResNet strikes back: An improved training procedure in timm

Ross Wightman, Hugo Touvron|arXiv (Cornell University)|Oct 1, 2021

Advanced Neural Network Applications | References: 57 | Cited by: 33

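One-sentence summary

This paper re-optimizes the training of a vanilla ResNet-50 at the standard 224x224 resolution, using modern training ingredients (Mixup, CutMix, RandAugment, BCE loss, regularization, and the large-batch LAMB optimizer) to establish a stronger baseline and assess its stability. It reports 80.4% top-1 accuracy on ImageNet-val.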

ABSTRACT

The influential Residual Networks designed by He et al. remain the gold-standard architecture in numerous scientific publications. They typically serve as the default architecture in studies, or as baselines when new architectures are proposed. Yet there has been significant progress on best practices for training neural networks since the inception of the ResNet architecture in 2015. Novel optimization & data-augmentation have increased the effectiveness of the training recipes. In this paper, we re-evaluate the performance of the vanilla ResNet-50 when trained with a procedure that integrates such advances. We share competitive training settings and pre-trained models in the timm open-source library, with the hope that they will serve as better baselines for future work. For instance, with our more demanding training setting, a vanilla ResNet-50 reaches 80.4% top-1 accuracy at resolution 224x224 on ImageNet-val without extra data or distillation. We also report the performance achieved with popular models with our training procedure.

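Research motivation and goals

Show how to maximize the performance of a vanilla ResNet-50 at the standard inference resolution (224x224) using modern training ingredients.
Provide strong-baseline training procedures and pre-trained models in timm, making it easier to compare architectures.
Assess measurement noise and overfitting risk by analyzing the stability of the training procedure across random seeds and test sets.
Demonstrate the transfer learning performance and generalization of the optimized recipes on seven downstream datasets, as well as their applicability to other architectures.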

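Proposed method

Three ResNet-50 training procedures (A1: 600 epochs, A2: 300 epochs, A3: 100 epochs), each with tuned hyperparameters and ingredients.
A multi-label BCE loss is used together with Mixup and CutMix, so that the targets reflect the presence of several mixed concepts in one image.
Augmentation and regularization combine a RandAugment variant, Mixup, CutMix, and Repeated Augmentation with stochastic depth, applied depending on the training schedule.
The large-batch LAMB optimizer with a cosine learning-rate schedule is the default; Appendix B compares it against alternative optimizers and includes a CE/BCE ablation.
Training stability is evaluated by running multiple seeds and measuring performance on ImageNet-val, ImageNet-V2, and ImageNet-Real.
Transfer learning performance is reported on seven downstream datasets using the proposed pre-training recipes.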
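
The interaction between Mixup-style target blending and a per-class BCE loss can be sketched as follows. This is a minimal pure-Python illustration, not the timm implementation: the helper names (`mixup_targets`, `binarize_targets`, `bce_with_logits`), the 4-class toy problem, the logits, and the 0.2 threshold are all assumptions chosen for illustration. It shows one way a blended target can be read as a multi-label target in which every mixed-in class counts as present.

```python
import math


def one_hot(label, num_classes):
    """Return a one-hot list for an integer class label."""
    return [1.0 if i == label else 0.0 for i in range(num_classes)]


def mixup_targets(label_a, label_b, lam, num_classes):
    """Blend two one-hot targets; lam is the mixing weight of label_a."""
    a = one_hot(label_a, num_classes)
    b = one_hot(label_b, num_classes)
    return [lam * x + (1.0 - lam) * y for x, y in zip(a, b)]


def binarize_targets(targets, threshold):
    """Multi-label view: a class is 'present' if its blended weight exceeds threshold."""
    return [1.0 if t > threshold else 0.0 for t in targets]


def bce_with_logits(logits, targets):
    """Mean binary cross-entropy, treating each class as an independent yes/no."""
    total = 0.0
    for z, t in zip(logits, targets):
        # Numerically stable form of -[t*log(s) + (1-t)*log(1-s)], s = sigmoid(z).
        total += max(z, 0.0) - z * t + math.log1p(math.exp(-abs(z)))
    return total / len(logits)


# Toy example (hypothetical values): mix class 0 and class 2 with weight 0.7.
soft = mixup_targets(0, 2, lam=0.7, num_classes=4)
multi_label = binarize_targets(soft, threshold=0.2)  # assumed threshold
logits = [2.0, -1.0, 0.5, -3.0]
loss_soft = bce_with_logits(logits, soft)
loss_multi = bce_with_logits(logits, multi_label)
```

With softmax cross-entropy the two mixed classes compete for the same probability mass. With a per-class BCE loss both can score high at once, which matches the "mixed concepts" intuition described above.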

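Experimental results

Research questions

RQ1: Which ResNet-50 training procedure in timm achieves the highest accuracy on the ImageNet-1k validation set at 224x224?
RQ2: How do modern training ingredients (augmentation, regularization, and loss choice) interact with batch size and number of epochs to affect the performance of a vanilla ResNet-50?
RQ3: Under the optimized procedures, how stable are ImageNet accuracy results across random seeds and across related test sets (val, V2, Real)?
RQ4: Do the proposed training procedures bring gains for larger architectures and for transfer to downstream tasks?
RQ5: When the same training recipe is reused across models, does the comparison between architectures and procedures change?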

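Key findings

The A1 procedure (600 epochs) reaches 80.4% top-1 accuracy on ImageNet-val at 224x224, surpassing previous vanilla ResNet-50 baselines.
The chosen regularization and augmentation (Mixup, CutMix, RandAugment, and Repeated Augmentation), combined with the BCE loss and the LAMB optimizer, still deliver strong performance at a large batch size (2048).
A2 (300 epochs) reaches 79.8% top-1 validation accuracy, showing that longer schedules remain competitive under modern training; A3 (100 epochs) reaches 78.1% top-1, illustrating the cost/benefit trade-off between schedules.
The seed variance analysis shows a standard deviation of about 0.1 on ImageNet-val and a higher variance on ImageNet-V2, underscoring measurement noise and the value of reporting results on several test sets.
Applying the A1/A2 recipes to other architectures outperforms literature baselines on a range of models; A1 generally gives the best downstream transfer on most tasks.
The paper shows that training procedures must be matched when comparing architectures, because the same procedure can yield different relative rankings across models.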
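
To make the point about seed noise concrete, the sketch below checks whether the gap between two models is larger than their run-to-run variation. It is a rough heuristic, not a formal significance test and not the paper's analysis: the function names, the factor `k=2.0`, and all accuracy values are made-up placeholders for illustration and are not results from the paper.

```python
import statistics


def summarize(accuracies):
    """Return (mean, sample standard deviation) of per-seed top-1 accuracies."""
    return statistics.mean(accuracies), statistics.stdev(accuracies)


def gap_exceeds_noise(accs_a, accs_b, k=2.0):
    """True if the mean gap is larger than k times the combined seed noise."""
    mean_a, std_a = summarize(accs_a)
    mean_b, std_b = summarize(accs_b)
    combined_noise = (std_a ** 2 + std_b ** 2) ** 0.5
    return abs(mean_a - mean_b) > k * combined_noise


# Placeholder per-seed accuracies (hypothetical, NOT taken from the paper).
model_x = [79.6, 79.7, 79.8, 79.7, 79.9]
model_y = [79.7, 79.8, 79.6, 79.9, 79.8]
short_schedule = [78.0, 78.1, 78.2]
long_schedule = [80.3, 80.4, 80.5]

within_noise = gap_exceeds_noise(model_x, model_y)            # tiny gap
clear_gap = gap_exceeds_noise(short_schedule, long_schedule)  # large gap
```

A single-seed difference of a tenth of a point or two between two models falls inside this kind of noise band, which is why multiple seeds and several test sets give a more reliable comparison.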

This review was generated by AI and checked by a human editor.