QUICK REVIEW

[Paper Review] VanillaNet: the Power of Minimalism in Deep Learning

Hanting Chen, Yunhe Wang|arXiv (Cornell University)|May 22, 2023

Advanced Neural Network Applications · Cited by 83

One-Sentence Summary

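VanillaNet shows that a minimal, shallow convolutional architecture with no shortcut connections or self-attention, trained with a deep training strategy and a series activation function, can match state-of-the-art performance while substantially reducing architectural complexity and latency.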

ABSTRACT

At the heart of foundation models is the philosophy of "more is different", exemplified by the astonishing success in computer vision and natural language processing. However, the challenges of optimization and inherent complexity of transformer models call for a paradigm shift towards simplicity. In this study, we introduce VanillaNet, a neural network architecture that embraces elegance in design. By avoiding high depth, shortcuts, and intricate operations like self-attention, VanillaNet is refreshingly concise yet remarkably powerful. Each layer is carefully crafted to be compact and straightforward, with nonlinear activation functions pruned after training to restore the original architecture. VanillaNet overcomes the challenges of inherent complexity, making it ideal for resource-constrained environments. Its easy-to-understand and highly simplified architecture opens new possibilities for efficient deployment. Extensive experimentation demonstrates that VanillaNet delivers performance on par with renowned deep neural networks and vision transformers, showcasing the power of minimalism in deep learning. This visionary journey of VanillaNet has significant potential to redefine the landscape and challenge the status quo of foundation model, setting a new path for elegant and effective model design. Pre-trained models and codes are available at https://github.com/huawei-noah/VanillaNet and https://gitee.com/mindspore/models/tree/master/research/cv/vanillanet.

Motivation and Goals

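Push a shift toward minimalist CNN designs that are easy to deploy in resource-constrained environments.
Propose the VanillaNet architecture, which avoids high depth, shortcut connections, and self-attention while remaining competitive in performance.
Develop training and activation techniques that compensate for the limited nonlinearity of shallow networks.
Evaluate VanillaNet on large-scale image classification and downstream tasks to benchmark the efficiency-accuracy trade-off.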

Proposed Method

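Architecture: VanillaNet consists of a stem followed by stages that each contain a single layer. The stem is a 4×4×3×C convolution with stride 4; each subsequent stage uses a 1×1 convolution and doubles the number of channels (except the last stage).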
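Deep training strategy: each pair of convolutions is trained with an activation blended with the identity, A'(x) = (1-λ)A(x) + λx, where λ = e/E grows with the current epoch e out of E total epochs. As λ reaches 1 the activation becomes an identity mapping, so the pair is gradually merged into a single convolution.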
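Series activation function: A_s(x) = sum_{i=-n}^{n} a_i A(x + b_i), a weighted sum of shifted copies of a base activation A (plus a variant that also shifts over neighboring positions), which increases nonlinearity without a high computational cost.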
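Inference-time merging: after training, each BN layer is fused into its adjacent convolution, yielding a single convolution for efficient inference (with special handling for 1×1 convolutions).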
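The series activation is implemented so that it exchanges global information across the feature map, and its computational cost is compared with that of a standard convolution (in practical settings, O(SA) << O(CONV)).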
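Ablations cover the number of series terms n, deep training, and the presence and placement of shortcut connections (shortcuts yielded no significant gain in VanillaNet).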
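
The stem-and-stage layout described above can be made concrete by counting convolution weights. The sketch below is a hypothetical helper (vanillanet_layer_params is our own name, not from the official repository); it follows only what the summary states (a 4×4×3×C stride-4 stem, then one 1×1 convolution per stage that doubles channels except in the last stage), ignores biases, BN, pooling, and the classifier head, and uses an illustrative C = 512 rather than any configuration from the paper.

```python
def vanillanet_layer_params(base_channels: int, num_stages: int):
    """Return (name, c_in, c_out, weight_count) for the stem and each stage conv.

    Hypothetical helper: biases, BN, pooling and the classifier head are ignored.
    """
    # Stem: a single 4x4 convolution with stride 4, mapping RGB (3) to C channels.
    layers = [("stem 4x4/s4", 3, base_channels, 4 * 4 * 3 * base_channels)]
    c_in = base_channels
    for stage in range(1, num_stages + 1):
        # Each stage is one 1x1 conv; channels double except in the last stage.
        c_out = c_in if stage == num_stages else 2 * c_in
        layers.append((f"stage{stage} 1x1", c_in, c_out, c_in * c_out))
        c_in = c_out
    return layers


if __name__ == "__main__":
    # Illustrative configuration (C = 512, four stages), not the paper's table.
    for name, c_in, c_out, weights in vanillanet_layer_params(512, 4):
        print(f"{name:12s} {c_in:5d} -> {c_out:5d}  weights={weights:,}")
```

In this toy count the late, wide 1×1 stages account for almost all of the weights, while the stem contributes only 24,576.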
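
The deep training strategy can be illustrated at a single pixel, where a 1×1 convolution is simply a matrix-vector product. This is a minimal pure-Python sketch, assuming ReLU as the base activation A and omitting biases and BN; all helper names are hypothetical. It implements the blend A'(x) = (1-λ)A(x) + λx with λ = e/E and checks that at the final epoch (λ = 1) two stacked convolutions equal one convolution with weights W2·W1.

```python
def relu(v: float) -> float:
    return max(0.0, v)


def deep_training_act(x: float, epoch: int, total_epochs: int) -> float:
    """A'(x) = (1 - lam) * A(x) + lam * x with lam = epoch / total_epochs."""
    lam = epoch / total_epochs
    return (1.0 - lam) * relu(x) + lam * x


def matvec(w, x):
    # A 1x1 convolution at one spatial position is a matrix-vector product.
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def matmul(a, b):
    # (a @ b)[i][j] = sum_k a[i][k] * b[k][j]
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def two_layer_block(w1, w2, x, epoch, total_epochs):
    """conv1 -> blended activation -> conv2, evaluated at a single pixel."""
    hidden = [deep_training_act(h, epoch, total_epochs) for h in matvec(w1, x)]
    return matvec(w2, hidden)


if __name__ == "__main__":
    w1 = [[1.0, 2.0], [3.0, 4.0]]
    w2 = [[0.0, 1.0], [1.0, 0.0]]
    x = [1.0, -1.0]
    # At the last epoch lam = 1, so the activation is the identity and the
    # two convs collapse into a single conv with weights w2 @ w1.
    print(two_layer_block(w1, w2, x, 100, 100), matvec(matmul(w2, w1), x))
```

Early in training (λ near 0) the block behaves like two convolutions with a ReLU between them; only once λ reaches 1 does the merge become exact.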
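
The series activation stacks shifted, scaled copies of a base activation, so even with ReLU it produces a piecewise-linear function with several kinks instead of one. Below is a minimal pure-Python sketch, assuming ReLU as A and treating the coefficients a_i and offsets b_i as given (in training they would be learned). series_act_neighbors is a hypothetical 1-D illustration of the neighbor-shift variant, with one shared bias and out-of-range neighbors contributing nothing; it is not the official implementation.

```python
def relu(v: float) -> float:
    return max(0.0, v)


def series_act(x: float, a, b) -> float:
    """A_s(x) = sum_i a_i * A(x + b_i), with A = ReLU and i = -n..n."""
    assert len(a) == len(b), "need one (a_i, b_i) pair per term"
    return sum(ai * relu(x + bi) for ai, bi in zip(a, b))


def series_act_neighbors(xs, a, bias):
    """Hypothetical 1-D neighbor variant: y_h = sum_j a_j * A(x_{h+j} + bias).

    len(a) must be 2n + 1. Neighbors outside the sequence are skipped
    (equivalent to zero padding applied after the activation; our assumption).
    """
    n = len(a) // 2
    out = []
    for h in range(len(xs)):
        total = 0.0
        for j in range(-n, n + 1):
            k = h + j
            if 0 <= k < len(xs):
                total += a[j + n] * relu(xs[k] + bias)
        out.append(total)
    return out


if __name__ == "__main__":
    a, b = [1.0, 1.0, 1.0], [-1.0, 0.0, 1.0]  # n = 1, three terms
    print([series_act(x, a, b) for x in (-5.0, 0.0, 2.0)])
    print(series_act_neighbors([1.0, -2.0, 3.0], [1.0, 1.0, 1.0], 0.0))
```

With equal coefficients and offsets -1, 0, 1, the output has kinks at x = 1, 0, and -1, which is the extra nonlinearity a single ReLU cannot provide.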
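
The BN-merging step is the standard inference-time fold of an affine BatchNorm into the preceding convolution: W' = W·γ/sqrt(σ²+ε) and b' = (b-μ)·γ/sqrt(σ²+ε) + β per output channel. A minimal pure-Python sketch for a 1×1 convolution (a weight matrix plus bias) follows, with hypothetical helper names and an assumed ε = 1e-5; the special handling for 1×1 convolutions mentioned in the summary is not modeled.

```python
import math


def fuse_bn_into_conv1x1(w, b, gamma, beta, mean, var, eps=1e-5):
    """Fold y = gamma * (Wx + b - mean) / sqrt(var + eps) + beta into W'x + b'."""
    w_f, b_f = [], []
    for row, bi, g, bt, m, v in zip(w, b, gamma, beta, mean, var):
        scale = g / math.sqrt(v + eps)  # per-output-channel scale
        w_f.append([scale * wij for wij in row])
        b_f.append(scale * (bi - m) + bt)
    return w_f, b_f


def conv1x1(w, b, x):
    # 1x1 convolution at one pixel: matrix-vector product plus bias.
    return [sum(wij * xj for wij, xj in zip(row, x)) + bi for row, bi in zip(w, b)]


def batchnorm(y, gamma, beta, mean, var, eps=1e-5):
    # Inference-mode BN using stored running statistics.
    return [g * (yi - m) / math.sqrt(v + eps) + bt
            for yi, g, bt, m, v in zip(y, gamma, beta, mean, var)]


if __name__ == "__main__":
    w, b = [[0.5, -1.0], [2.0, 0.25]], [0.1, -0.2]
    gamma, beta, mean, var = [1.5, 0.8], [0.0, 0.3], [0.2, -0.1], [0.9, 2.0]
    x = [1.0, 3.0]
    print(batchnorm(conv1x1(w, b, x), gamma, beta, mean, var))
    print(conv1x1(*fuse_bn_into_conv1x1(w, b, gamma, beta, mean, var), x))
```

Because BN at inference is a fixed per-channel affine map, the fused layer reproduces conv followed by BN up to floating-point rounding.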
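
The claim O(SA) << O(CONV) can be checked with a back-of-the-envelope count. This sketch uses our own assumed formulas rather than the paper's exact ones: a k×k convolution costs H·W·C_in·C_out·k² multiply-adds, while the neighbor-based series activation is counted as a per-channel (2n+1)×(2n+1) window, H·W·C_in·(2n+1)². The sizes (7×7 map, 4096 channels, 1×1 convolution, n = 3) are illustrative.

```python
def conv_cost(h, w, c_in, c_out, k):
    # Multiply-adds of a dense k x k convolution producing an h x w x c_out map.
    return h * w * c_in * c_out * k * k


def series_act_cost(h, w, c_in, n):
    # Assumption: each channel sums over a (2n+1) x (2n+1) neighborhood,
    # with no mixing across channels.
    return h * w * c_in * (2 * n + 1) ** 2


if __name__ == "__main__":
    h = w = 7
    c = 4096
    conv = conv_cost(h, w, c, c, k=1)
    sa = series_act_cost(h, w, c, n=3)
    print(conv, sa, conv / sa)
```

Under these assumptions the ratio simplifies to C_out·k²/(2n+1)², so it grows with channel width; for the numbers above the convolution is roughly 84 times more expensive.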

Experimental Results

Research Questions

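RQ1: Can a shallow, fully convolutional network without shortcut connections or self-attention achieve competitive accuracy on ImageNet?
RQ2: Do deep training and the series activation reliably improve the performance of minimalist VanillaNet variants?
RQ3: In a minimalist architecture, how does removing shortcut connections affect accuracy and inference speed?
RQ4: On downstream tasks (e.g., COCO), does VanillaNet perform on par with state-of-the-art backbones?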

Key Findings

With the series activation (n = 3) and deep training, VanillaNet-6 reaches 76.36% top-1 accuracy on ImageNet.
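Deep training combined with the series activation substantially improves plain shallow networks (e.g., roughly +6% for AlexNet), whereas the gain on ResNet-50 is marginal, suggesting diminishing returns for models that are already deep and non-minimalist.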
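Shortcut connections contribute almost nothing to VanillaNet's accuracy. Because the limiting factor in this minimalist architecture is nonlinearity, which shortcuts do not add, they may even slightly reduce performance.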
VanillaNet-9 reaches 79.87% top-1 at 2.91 ms latency on an Nvidia A100 (batch size 1), and VanillaNet-13-1.5× reaches 83.11% top-1 at 7.83 ms, showing a strong speed-accuracy trade-off for shallow minimalist networks.
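On ImageNet, the larger variants from VanillaNet-9 up to VanillaNet-13-1.5× achieve competitive accuracy (about 83.1% at the top end) with depth and latency profiles that differ markedly from those of ResNet-50 and ConvNeXt variants.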
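On COCO, VanillaNet-13 delivers competitive AP and higher FPS than some Swin and ConvNeXt backbone variants despite higher FLOPs and parameter counts, suggesting potential for real-time applications.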

This review was generated by AI and checked by human editors.