QUICK REVIEW

[Paper Review] Identity Mappings in Deep Residual Networks

Kaiming He, Xiangyu Zhang|arXiv (Cornell University)|Mar 16, 2016

Advanced Neural Network Applications | References: 22 | Cited by: 210

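One-Sentence Summary

This paper analyzes how identity skip connections and an identity after-addition activation enable direct forward and backward signal propagation in very deep ResNets, proposes a pre-activation residual unit, and shows that it makes extremely deep networks easier to train and better at generalizing on CIFAR and ImageNet.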

ABSTRACT

Deep residual networks have emerged as a family of extremely deep architectures showing compelling accuracy and nice convergence behaviors. In this paper, we analyze the propagation formulations behind the residual building blocks, which suggest that the forward and backward signals can be directly propagated from one block to any other block, when using identity mappings as the skip connections and after-addition activation. A series of ablation experiments support the importance of these identity mappings. This motivates us to propose a new residual unit, which makes training easier and improves generalization. We report improved results using a 1001-layer ResNet on CIFAR-10 (4.62% error) and CIFAR-100, and a 200-layer ResNet on ImageNet. Code is available at: https://github.com/KaimingHe/resnet-1k-layers

Motivation and Objectives

Motivate and analyze how skip connections facilitate information propagation in very deep residual networks.
Investigate the impact of different shortcut types and activation placements on optimization and generalization.
Propose a new (pre-activation) residual unit in which the function applied after addition is an identity mapping, easing optimization and improving performance.
Demonstrate state-of-the-art or competitive results with ultra-deep networks on CIFAR-10/100 and ImageNet.
Provide practical guidelines for designing deep ResNets to balance optimization ease and model capacity.

Proposed Method

Derive forward and backward propagation properties under two identity conditions: identity skip connection h(x)=x and identity after-addition activation f(y)=y.
Analyze the effect of non-identity shortcuts (scaling, gating, 1x1 conv, dropout) using theoretical expressions and ablation experiments.
Introduce a pre-activation residual unit in which BN and ReLU are moved before the weight layers, so that nothing follows the addition and the after-addition activation is effectively an identity.
Experimentally compare variants on CIFAR-10/100 with ResNet-110/164/1001 architectures and on ImageNet with ResNet-152/200 variants.
Provide training and architectural guidelines, including the impact of BN/ReLU placement relative to addition (pre-activation vs post-activation).
Report performance metrics and training dynamics to support the proposed design choices.
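
The propagation properties listed above can be written out as a short derivation. This is a sketch under stated assumptions: the summary does not define any symbols, so x_l (input to the l-th unit), F (residual function with weights W_l), E (loss), and lambda_i (a per-unit shortcut scaling factor) are notation introduced here for illustration, and F-hat denotes the residual terms with the scaling factors absorbed.

```latex
\begin{align}
% One residual unit with identity shortcut h(x) = x and identity after-addition f(y) = y
x_{l+1} &= x_l + \mathcal{F}(x_l, \mathcal{W}_l) \\
% Unrolling from unit l to any deeper unit L: the signal x_l reaches x_L unchanged
x_L &= x_l + \sum_{i=l}^{L-1} \mathcal{F}(x_i, \mathcal{W}_i) \\
% Backward pass: the additive 1 carries the gradient directly from x_L to x_l
\frac{\partial \mathcal{E}}{\partial x_l}
  &= \frac{\partial \mathcal{E}}{\partial x_L}
     \left( 1 + \frac{\partial}{\partial x_l} \sum_{i=l}^{L-1} \mathcal{F}(x_i, \mathcal{W}_i) \right) \\
% Scaled shortcut h(x_l) = \lambda_l x_l: the direct path is multiplied by a product of factors
x_L &= \left( \prod_{i=l}^{L-1} \lambda_i \right) x_l
     + \sum_{i=l}^{L-1} \hat{\mathcal{F}}(x_i, \mathcal{W}_i) \\
\frac{\partial \mathcal{E}}{\partial x_l}
  &= \frac{\partial \mathcal{E}}{\partial x_L}
     \left( \prod_{i=l}^{L-1} \lambda_i
     + \frac{\partial}{\partial x_l} \sum_{i=l}^{L-1} \hat{\mathcal{F}}(x_i, \mathcal{W}_i) \right)
\end{align}
```

With identity mappings the direct term is exactly 1 regardless of depth. With scaling, the product of the lambda_i can shrink toward zero or grow without bound as the number of units increases, which is one way to read why non-identity shortcuts hamper optimization.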

Experimental Results

Research Questions

RQ1: How do identity skip connections and identity after-addition activation influence forward signal propagation in deep ResNets?
RQ2: What is the impact of non-identity shortcut components (scaling, gating, 1x1 convolutions, dropout) on optimization and generalization?
RQ3: Can a pre-activation residual unit enable training of ultra-deep networks and improve generalization compared to the original residual unit?
RQ4: How do activation placement and BN timing (pre- vs post-activation) affect performance on CIFAR-10/100 and ImageNet?
RQ5: What practical guidelines emerge for constructing very deep ResNets that are easier to train and offer better accuracy?

Key Findings

Identity skip connections and identity after-addition activation substantially ease optimization by allowing direct signal propagation across layers.
Non-identity shortcut components generally impede information flow and worsen training dynamics or final performance.
A pre-activation residual unit (BN and ReLU applied before weight layers) enables training of extremely deep networks (e.g., 1001 layers) with improved generalization on CIFAR-10/100 and competitive results on ImageNet.
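On CIFAR-10, the 1001-layer ResNet achieved 4.62% test error, the best result reported for the pre-activation variant. Across CIFAR-10/100, pre-activation models matched or outperformed their original counterparts, with the largest gap at 1001 layers (CIFAR-10: 7.61% original vs 4.92% pre-activation; CIFAR-100: 27.82% vs 22.71%). Repeated runs of the pre-activation ResNet-1001 gave 4.89% ±0.14 on CIFAR-10 and 22.68% ±0.22 on CIFAR-100.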
On ImageNet, pre-activation ResNets improved on the original design at comparable depth: ResNet-152 reached 21.1% top-1 error versus 21.3% for the original, and pre-activation ResNet-200 reached 20.7% top-1 error (320x320 testing) versus 21.8% for the original. With scale and aspect-ratio augmentation, pre-activation ResNet-200 reached 20.1% top-1 and 4.8% top-5 error.
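
The first three findings can be illustrated with a toy example. The following is a minimal sketch, not the authors' implementation: it uses small dense layers in place of convolutions, omits BN entirely, and draws random weights, and every helper name (`pre_activation_unit`, `original_unit`, `run_stack`, `shortcut_gain`) is hypothetical. It checks that a stack of pre-activation units satisfies x_L = x_0 + (sum of residuals), that a ReLU after the addition breaks this direct path, and that a constant shortcut scaling factor compounds with depth.

```python
import random


def relu(v):
    return [max(0.0, a) for a in v]


def linear(w, v):
    # Dense layer standing in for a convolution (assumption for brevity).
    return [sum(wij * vj for wij, vj in zip(row, v)) for row in w]


def add(a, b):
    return [x + y for x, y in zip(a, b)]


def make_weights(dim, rng, scale=0.1):
    return [[rng.uniform(-scale, scale) for _ in range(dim)] for _ in range(dim)]


def pre_activation_unit(x, w1, w2):
    # x_{l+1} = x_l + F(x_l), with ReLU applied BEFORE each weight layer.
    residual = linear(w2, relu(linear(w1, relu(x))))
    return add(x, residual), residual


def original_unit(x, w1, w2):
    # x_{l+1} = relu(x_l + F(x_l)), with ReLU applied AFTER the addition.
    residual = linear(w2, relu(linear(w1, x)))
    return relu(add(x, residual)), residual


def run_stack(unit, x0, weights):
    # Returns the final activation and the running sum of all residual outputs.
    x, total = x0, [0.0] * len(x0)
    for w1, w2 in weights:
        x, residual = unit(x, w1, w2)
        total = add(total, residual)
    return x, total


def shortcut_gain(lam, depth):
    # Factor multiplying x_0 on the direct path when every shortcut scales by lam.
    return lam ** depth


rng = random.Random(0)
dim, depth = 4, 50
weights = [(make_weights(dim, rng), make_weights(dim, rng)) for _ in range(depth)]
x0 = [1.0, -2.0, 0.5, -0.25]

x_pre, sum_pre = run_stack(pre_activation_unit, x0, weights)
x_orig, sum_orig = run_stack(original_unit, x0, weights)

# Largest deviation from x_L = x_0 + sum of residuals, per design.
pre_gap = max(abs(a - (b + c)) for a, b, c in zip(x_pre, x0, sum_pre))
orig_gap = max(abs(a - (b + c)) for a, b, c in zip(x_orig, x0, sum_orig))

print("pre-activation gap:", pre_gap)
print("original-unit gap:", orig_gap)
print("direct-path gain with lam=0.5 over 50 units:", shortcut_gain(0.5, depth))
```

In this toy setting the pre-activation gap is at the level of floating-point rounding, while the original unit's gap is large because the after-addition ReLU clips the negative entries of the input. This says nothing about accuracy; it only illustrates the direct-propagation argument.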

This review was generated by AI and reviewed by human editors.