QUICK REVIEW

[Paper Review] All You Need is Beyond a Good Init: Exploring Better Solution for Training Extremely Deep Convolutional Neural Networks with Orthonormality and Modulation

Di Xie, Jiang Xiong|arXiv (Cornell University)|Mar 6, 2017

Advanced Neural Network Applications | References: 17 | Cited by: 37

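One-Sentence Summary

This paper proposes a training method for extremely deep plain convolutional neural networks (CNNs) that combines an orthonormality regularizer on the weights with a backward error modulation mechanism based on quasi-isometry. By enforcing orthonormality among filter banks and dynamically modulating error magnitudes during backpropagation, the method trains 44-layer and 110-layer plain networks on CIFAR-10 and ImageNet, without skip connections, to a level reported to match their residual counterparts.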

ABSTRACT

Deep neural network is difficult to train and this predicament becomes worse as the depth increases. The essence of this problem exists in the magnitude of backpropagated errors that will result in gradient vanishing or exploding phenomenon. We show that a variant of regularizer which utilizes orthonormality among different filter banks can alleviate this problem. Moreover, we design a backward error modulation mechanism based on the quasi-isometry assumption between two consecutive parametric layers. Equipped with these two ingredients, we propose several novel optimization solutions that can be utilized for training a specific-structured (repetitively triple modules of Conv-BN-ReLU) extremely deep convolutional neural network (CNN) WITHOUT any shortcuts/identity mappings from scratch. Experiments show that our proposed solutions can achieve distinct improvements for a 44-layer and a 110-layer plain networks on both the CIFAR-10 and ImageNet datasets. Moreover, we can successfully train plain CNNs to match the performance of the residual counterparts. Besides, we propose new principles for designing network structure from the insights evoked by orthonormality. Combined with residual structure, we achieve comparative performance on the ImageNet dataset.

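Research Motivation and Objectives

Address the degradation problem in training extremely deep plain CNNs, which arises from vanishing or exploding gradients during backpropagation.
Identify the limitations of batch normalization and ReLU in preserving signal magnitude in deep networks.
Develop a direct, non-residual solution that trains very deep networks from scratch using orthonormality and adaptive error modulation.
Propose new design principles for deep network architectures, based on signal preservation and isometry in weight space.
Show that orthonormality regularization can outperform standard L2 weight decay and rival residual networks in performance.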

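Proposed Method

Introduce an orthonormality regularizer that enforces orthonormality among the filter banks within each convolutional layer, to stabilize backward error propagation.
Propose a backward error modulation mechanism, based on a quasi-isometry assumption between two consecutive parametric layers, to control how error magnitudes scale.
Apply the regularizer and the modulation mechanism to a standard plain CNN architecture built from repeated Conv-BN-ReLU modules.
Show through mathematical analysis and empirical validation that orthonormality preserves signal norms and mitigates vanishing gradients.
Rescale error gradients adaptively, layer by layer, to offset the signal attenuation caused by ReLU and batch normalization.
Replace standard L2 weight decay with the orthonormality regularizer to improve optimization stability in deep networks.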
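
The regularizer and the error rescaling described in this list can be sketched in NumPy. This is a minimal sketch under stated assumptions, not the authors' implementation: the penalty form (lam / 2) * ||W^T W - I||_F^2, the reshaping of each kernel into a matrix with one column per filter, the function names, and the threshold `max_ratio` are all illustrative choices that this review does not specify.

```python
import numpy as np


def filters_to_matrix(kernel):
    """Flatten a conv kernel (out_ch, in_ch, kh, kw) into W of shape (fan_in, out_ch)."""
    out_ch = kernel.shape[0]
    return kernel.reshape(out_ch, -1).T


def orthonormal_penalty(kernel, lam=1e-4):
    """Return (lam / 2) * ||W^T W - I||_F^2 and its gradient w.r.t. the kernel.

    Assumed penalty form; `lam` is a hypothetical regularization strength.
    """
    w = filters_to_matrix(kernel)                 # one column per filter
    gram_err = w.T @ w - np.eye(w.shape[1])       # deviation from orthonormality
    penalty = 0.5 * lam * np.sum(gram_err ** 2)
    grad_w = 2.0 * lam * (w @ gram_err)           # analytic gradient w.r.t. W
    return penalty, grad_w.T.reshape(kernel.shape)


def modulate_error(err_in, err_out, max_ratio=1e4):
    """Rescale err_out to the norm of err_in when their norm ratio is extreme.

    Hypothetical rule: `err_in` is the error entering a module in the backward
    pass, `err_out` the error leaving it. The threshold is an assumption.
    """
    n_in = np.linalg.norm(err_in)
    n_out = np.linalg.norm(err_out)
    if n_in == 0.0 or n_out == 0.0:
        return err_out                            # nothing sensible to rescale
    ratio = n_in / n_out
    if ratio > max_ratio or ratio < 1.0 / max_ratio:
        return err_out * ratio                    # restore the incoming norm
    return err_out
```

In a training loop, the returned gradient would be added to the data-loss gradient of the same kernel in place of the usual L2 weight-decay term. When `fan_in` is smaller than `out_ch`, the columns of W cannot all be orthonormal, so the penalty cannot reach zero for such layers.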

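Experimental Results

Research Questions

RQ1: Is orthonormality regularization alone sufficient to stabilize the training of extremely deep plain CNNs without skip connections?
RQ2: How does orthonormality among filter banks affect signal propagation and gradient stability in deep networks?
RQ3: Can the quasi-isometry-based modulation mechanism effectively control error magnitudes during backpropagation in deep networks?
RQ4: Can the proposed method bring plain networks to performance that matches or exceeds residual networks on ImageNet and CIFAR-10?
RQ5: What insights into network architecture and optimization follow from enforcing orthonormality and dynamic error modulation?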
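
As background for RQ2, the norm-preservation property of a matrix with orthonormal columns can be written out as a short worked identity. The symbols are assumptions introduced here for illustration: W is a layer's weight matrix with one column per filter, x is a forward signal, and \delta is a backpropagated error.

```latex
\[
W^\top W = I
\;\Longrightarrow\;
\|Wx\|_2^2 = x^\top W^\top W x = x^\top x = \|x\|_2^2 .
\]
\[
\|W^\top \delta\|_2^2 = \delta^\top W W^\top \delta \le \|\delta\|_2^2 ,
\quad \text{with equality for every } \delta \text{ when } W \text{ is square, since then } W W^\top = I .
\]
```

The identity covers only the linear map. Batch normalization and ReLU change the signal magnitude separately, which is the gap the modulation mechanism is meant to address.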

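Key Findings

On a 44-layer plain network, the method reaches 88.42% top-1 accuracy on CIFAR-10, outperforming standard SGD and other optimizers.
On a 110-layer plain network, the method reaches 81.6% top-1 accuracy on CIFAR-10, significantly outperforming the baselines.
With orthonormality regularization, a 44-layer plain network on ImageNet performs comparably to a 34-layer residual network.
On ImageNet, the method reaches 70.0% top-1 accuracy with a 101-layer plain network, indicating that it generalizes beyond CIFAR-10.
Visualizations show that orthonormality regularization yields more structured, less noisy feature maps than L2 regularization.
The experiments confirm that orthonormality effectively reduces vanishing gradients and speeds up convergence in deep networks.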
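
The vanishing-gradient effect behind these findings can be reproduced in a toy setting. The sketch below is not the paper's experiment: it assumes purely linear layers (no convolution, batch normalization, or ReLU), a hypothetical width of 64, and a Gaussian initialization with standard deviation 0.05, and it compares how the norm of a backpropagated error changes over 110 layers.

```python
import numpy as np


def backprop_norms(weights, err):
    """Push an error vector backward through linear layers, recording its norm."""
    norms = [np.linalg.norm(err)]
    for w in reversed(weights):
        err = w.T @ err                    # backward pass of y = w @ x
        norms.append(np.linalg.norm(err))
    return norms


rng = np.random.default_rng(0)
depth, width = 110, 64                     # illustrative sizes, chosen for the demo

# Small Gaussian weights: each layer shrinks the error on average.
gaussian = [0.05 * rng.standard_normal((width, width)) for _ in range(depth)]

# Square orthonormal weights from a QR decomposition: each layer preserves the norm.
orthonormal = [np.linalg.qr(rng.standard_normal((width, width)))[0]
               for _ in range(depth)]

err = rng.standard_normal(width)
gauss_norms = backprop_norms(gaussian, err)
ortho_norms = backprop_norms(orthonormal, err)

print("Gaussian init, final/initial norm:   ", gauss_norms[-1] / gauss_norms[0])
print("Orthonormal init, final/initial norm:", ortho_norms[-1] / ortho_norms[0])
```

With these settings the Gaussian stack drives the error norm toward zero, while the orthonormal stack keeps it essentially unchanged. The nonlinear layers of a real Conv-BN-ReLU network break this exact preservation.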

This review was generated by AI and reviewed by a human editor.