QUICK REVIEW

[Paper Review] ShaResNet: reducing residual network parameter number by sharing weights

Alexandre Boulch|arXiv (Cornell University)|Feb 28, 2017

Advanced Neural Network Applications | References: 10 | Cited by: 19

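ONE-SENTENCE SUMMARY

This paper proposes ShaResNet, a residual network variant that reduces the parameter count by sharing the 3x3 convolution weights between residual blocks operating at the same spatial scale. By reusing the shared convolution across several blocks while keeping block-specific layers, ShaResNet cuts parameters by up to 39% (for example, a 152-layer ResNet is reduced to 106 convolutional layers) while losing less than 0.2% top-1 accuracy on ImageNet.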

ABSTRACT

Deep Residual Networks have reached the state of the art in many image processing tasks such as image classification. However, the cost for a gain in accuracy in terms of depth and memory is prohibitive as it requires a higher number of residual blocks, up to double the initial value. To tackle this problem, we propose in this paper a way to reduce the redundant information of the networks. We share the weights of convolutional layers between residual blocks operating at the same spatial scale. The signal flows multiple times in the same convolutional layer. The resulting architecture, called ShaResNet, contains block-specific layers and shared layers. These ShaResNets are trained exactly in the same fashion as the commonly used residual networks. We show, on the one hand, that they are almost as efficient as their sequential counterparts while involving fewer parameters, and on the other hand that they are more efficient than a residual network with the same number of parameters. For example, a 152-layer-deep residual network can be reduced to 106 convolutional layers, i.e. a parameter gain of 39%, while losing less than 0.2% accuracy on ImageNet.

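MOTIVATION AND OBJECTIVES

Reduce the number of parameters in deep residual networks without sacrificing accuracy.
Investigate whether redundant spatial operations can be shared between residual blocks to improve parameter efficiency.
Evaluate whether, under the same parameter budget, shared convolutional layers can outperform deeper but less parameter-efficient sequential networks.
Explore whether recurrent-like weight sharing in artificial neural networks is biologically plausible.
Develop a training-compatible architecture that keeps the optimization advantages of standard residual networks.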

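PROPOSED METHOD

Share the 3x3 convolution filters among all residual blocks of the same stage (spatial scale) in the residual network.
Keep block-specific convolutional layers for the identity mapping and the feature transformation.
Let one shared convolutional layer serve several residual blocks, so that the signal passes through that layer multiple times.
Train the whole architecture with standard backpropagation and stochastic gradient descent, exactly as for standard ResNets.
Apply weight sharing only to the main convolutional layers (not to batch normalization or fully connected layers) to preserve training stability.
Design the architecture so that spatial dimension reduction happens only in dedicated blocks (marked in red in the figure), while the shared convolutions (green) span multiple blocks.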
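
The effect of sharing on layer and weight counts can be sketched with a few lines of arithmetic. This is a minimal illustration, not the author's code: the helper names (`conv_weights`, `stage_weights`, `distinct_conv_layers`) are hypothetical, each block is assumed to be a 1x1, 3x3, 1x1 bottleneck, one 3x3 convolution is assumed to be shared per stage, and batch normalization, shortcut projections and the classifier are ignored. The 36-block, 1024-channel stage is an illustrative choice rather than a figure from the paper.

```python
def conv_weights(c_in, c_out, k):
    """Number of weights in a k x k convolution (no bias term)."""
    return c_in * c_out * k * k


def stage_weights(num_blocks, width, bottleneck, share_3x3):
    """Convolution weights in one stage of bottleneck residual blocks.

    Each block is modeled as 1x1 (width -> bottleneck), 3x3
    (bottleneck -> bottleneck), 1x1 (bottleneck -> width).
    With share_3x3=True the 3x3 weights are counted once per stage.
    """
    block_specific = (conv_weights(width, bottleneck, 1)
                      + conv_weights(bottleneck, width, 1))
    spatial = conv_weights(bottleneck, bottleneck, 3)
    spatial_copies = 1 if share_3x3 else num_blocks
    return num_blocks * block_specific + spatial_copies * spatial


def distinct_conv_layers(total_layers, blocks_per_stage):
    """Layer count after merging the 3x3 convolutions of each stage into one."""
    merged_away = sum(n - 1 for n in blocks_per_stage)
    return total_layers - merged_away


# Hypothetical stage: 36 blocks, 1024 channels, 256-channel bottleneck.
sequential = stage_weights(36, 1024, 256, share_3x3=False)
shared = stage_weights(36, 1024, 256, share_3x3=True)
saved = 1 - shared / sequential
print(f"sequential: {sequential:,}  shared: {shared:,}  saved: {saved:.1%}")

# ResNet-152 stacks 3, 8, 36 and 3 bottleneck blocks in its four stages.
print(distinct_conv_layers(152, [3, 8, 36, 3]))
```

Under the one-shared-convolution-per-stage assumption, the last line reproduces the 106 convolutional layers quoted in the abstract. The per-stage saving printed by the sketch is not comparable with the network-wide 39%, which also covers the layers omitted here.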

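EXPERIMENTAL RESULTS

Research Questions

RQ1: Can sharing convolution weights between residual blocks reduce the parameter count without a significant loss in accuracy?
RQ2: At the same number of parameters, how does the parameter efficiency of ShaResNet compare with that of standard ResNets?
RQ3: Compared with deeper but less parameter-efficient architectures, does weight sharing improve the accuracy-to-parameter ratio of deep networks?
RQ4: Does the gain from weight sharing depend on network depth or dataset size?
RQ5: Can shared convolutions effectively model redundant spatial operations across blocks, as suggested by the structure of biological brains?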

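Key Findings

On the 152-layer ResNet, ShaResNet reduces the parameter count by 39% (the network shrinks to 106 convolutional layers) while losing less than 0.2% top-1 accuracy on ImageNet.
On CIFAR-10, a 164-layer ResNet with shared convolutions (0.93M parameters) reaches 93.8% accuracy, on par with a shallower 92-layer ResNet that has slightly more parameters (0.96M, 93.9% accuracy).
On CIFAR-10, the shared version of Wide ResNet-28-4 (5.85M parameters) reaches 94.9% accuracy, marginally below the 95.0% of the non-shared version (also listed at 5.85M parameters).
On CIFAR-100, the shared WRN-28-10 (26.86M parameters) reaches 79.8% accuracy, ahead of the 79.55% of the non-shared WRN-22-10 (26.85M parameters).
At equal parameter counts, ShaResNet consistently shows a lower top-1 error than standard ResNets, especially on large-scale datasets such as ImageNet.
The gain from weight sharing is most pronounced in deeper networks, where the shared layers use parameters more efficiently thanks to the higher redundancy among spatial operations.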
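
To make the equal-budget comparisons easier to read, the short script below recomputes the relative parameter difference and the accuracy gap for two of the CIFAR pairs listed above. It is only a sketch: the numbers are copied from this review, not re-measured, and the names `reported_pairs` and `compare` are hypothetical.

```python
# Figures copied from the findings above:
# (label, parameters in millions, accuracy in percent).
reported_pairs = {
    "CIFAR-10 ResNet": (("shared ResNet-164", 0.93, 93.8),
                        ("ResNet-92", 0.96, 93.9)),
    "CIFAR-100 WRN": (("shared WRN-28-10", 26.86, 79.8),
                      ("WRN-22-10", 26.85, 79.55)),
}


def compare(shared, baseline):
    """Return (relative parameter difference, accuracy difference in points)."""
    _, params_shared, acc_shared = shared
    _, params_base, acc_base = baseline
    return (params_shared - params_base) / params_base, acc_shared - acc_base


for name, (shared_model, baseline_model) in reported_pairs.items():
    rel_params, acc_gap = compare(shared_model, baseline_model)
    print(f"{name}: parameters {rel_params:+.1%}, accuracy {acc_gap:+.2f} points")
```

A negative parameter difference means the shared network is the smaller of the two; a positive accuracy difference means it is the more accurate one.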

This review was generated by AI and reviewed by human editors.