QUICK REVIEW

[Paper Review] Sharing Residual Units Through Collective Tensor Factorization in Deep Neural Networks

Yunpeng Chen, Xiaojie Jin|arXiv (Cornell University)|Mar 7, 2017

Tensor decomposition and applications|References: 28|Cited by: 29

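One-Sentence Summary

This paper proposes the Collective Residual Unit (CRU), a new deep learning architecture that unifies residual functions under generalized block term decomposition and uses collective tensor factorization to share knowledge across units, improving the parameter efficiency of residual networks. CRU networks reach state-of-the-art (SOTA) accuracy on ImageNet-1k and Places365-Standard and, at a model size comparable to ResNet-50, perform on par with the much larger ResNet-200 while using substantially fewer parameters.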

ABSTRACT

Residual units are widely used for alleviating optimization difficulties when building deep neural networks. However, the performance gain does not well compensate the model size increase, indicating low parameter efficiency in these residual units. In this work, we first revisit the residual function in several variations of residual units and demonstrate that these residual functions can actually be explained with a unified framework based on generalized block term decomposition. Then, based on the new explanation, we propose a new architecture, Collective Residual Unit (CRU), which enhances the parameter efficiency of deep neural networks through collective tensor factorization. CRU enables knowledge sharing across different residual units using shared factors. Experimental results show that our proposed CRU Network demonstrates outstanding parameter efficiency, achieving comparable classification performance to ResNet-200 with the model size of ResNet-50. By building a deeper network using CRU, we can achieve state-of-the-art single model classification accuracy on ImageNet-1k and Places365-Standard benchmark datasets. (Code and trained models are available on GitHub)

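Research Motivation and Objectives

Address the low parameter efficiency of standard residual units in deep neural networks, despite the clear performance gains they bring.
Unify several residual function designs (such as ResNet, Wide ResNet, and ResNeXt) under a single mathematical framework based on tensor decomposition.
Develop a new architecture that shares knowledge across residual units, improving parameter efficiency without sacrificing performance.
Achieve state-of-the-art (SOTA) classification accuracy on large-scale benchmarks with significantly smaller models.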

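Proposed Method

Propose a unified framework based on generalized block term decomposition (GBT) that expresses a variety of residual functions as a sum of low-rank Tucker operators.
Introduce the Collective Residual Unit (CRU), which shares factor matrices across multiple residual units to transfer knowledge between them and reduce the parameter count.
Apply collective tensor factorization to the convolution kernels of several residual units, yielding shared representations while preserving a modular structure.
Treat Tucker decomposition as a special case of block term decomposition and use it to approximate high-order convolution kernels with low-rank components.
Design a modular architecture in which residual units share factor matrices across layers, reducing redundancy and improving parameter efficiency.
Train the CRU-Net architecture end to end with standard optimization techniques, and study the effect of rank and factor sharing through ablation experiments.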
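
The factor-sharing idea in the list above can be illustrated with a small NumPy sketch. It is a simplified illustration, not the authors' construction: it assumes each unit's kernel has a Tucker-2 form (one small core per unit, two factor matrices on the channel modes), and the names `tucker2_kernel` and `count_params` as well as all sizes and ranks are hypothetical choices made for the example.

```python
import numpy as np


def tucker2_kernel(core, u_out, u_in):
    """Rebuild a conv kernel (C_out, C_in, k, k) from a Tucker-2 core and two factors.

    core:  (r_out, r_in, k, k)  small per-unit core tensor
    u_out: (C_out, r_out)       factor matrix on the output-channel mode
    u_in:  (C_in, r_in)         factor matrix on the input-channel mode
    """
    return np.einsum("or,ic,rckl->oikl", u_out, u_in, core)


def count_params(num_units, c_out, c_in, k, r_out, r_in):
    """Parameter counts for dense kernels, per-unit Tucker-2, and shared-factor Tucker-2."""
    core = r_out * r_in * k * k
    factors = c_out * r_out + c_in * r_in
    return {
        "dense": num_units * c_out * c_in * k * k,
        "tucker_per_unit": num_units * (core + factors),
        # Assumption: every unit keeps its own core but reuses one pair of factors.
        "tucker_shared": num_units * core + factors,
    }


# Hypothetical sizes, chosen only for illustration.
c_out, c_in, k, r_out, r_in, num_units = 64, 64, 3, 16, 16, 4

rng = np.random.default_rng(0)
u_out = rng.standard_normal((c_out, r_out))  # shared by all units
u_in = rng.standard_normal((c_in, r_in))     # shared by all units
cores = [rng.standard_normal((r_out, r_in, k, k)) for _ in range(num_units)]

kernels = [tucker2_kernel(g, u_out, u_in) for g in cores]
counts = count_params(num_units, c_out, c_in, k, r_out, r_in)
print(kernels[0].shape, counts)
```

Under these assumed sizes, sharing the two factor matrices means only the cores grow with the number of units, which is the mechanism the review credits for the parameter savings.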

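Experimental Results

Research Questions

RQ1: Can the diverse residual function designs used in deep residual networks be unified under a single tensor decomposition framework?
RQ2: How can collective tensor factorization be used to share knowledge across residual units and improve parameter efficiency?
RQ3: How much can parameter efficiency be improved without losing model accuracy?
RQ4: Can a unified, factor-sharing architecture achieve state-of-the-art (SOTA) performance on large-scale image classification benchmarks?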

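Key Findings

CRU-Net reaches a top-1 error of 20.6% on ImageNet-1k with a model size of 168 MB, performing on par with ResNet-200 (247 MB) with 32% fewer parameters.
CRU-Net-116 reaches a top-1 error of 20.3% on ImageNet-1k, outperforming ResNeXt-101 (64x4d) and WRN with a model size of only 318 MB.
On Places365-Standard, CRU-Net-116 achieves 56.60% top-1 accuracy, exceeding ResNet-152 (54.74%), with a model size of 163 MB versus 226 MB for ResNet-152.
The model with the 136x1d setting reaches a top-1 error of 22.1% on ImageNet-1k, indicating that performance remains consistent across decomposition ranks.
Experiments show that scaling the model beyond CRU-Net-116 leads to overfitting, suggesting that the architecture's capacity is already sufficient for ImageNet-1k.
The proposed CRU architecture shows that collective tensor factorization can effectively share knowledge across residual units and substantially improve parameter efficiency.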
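
The size reductions quoted in the findings can be checked with a few lines of Python. This is a quick arithmetic check using only the model sizes reported above; it assumes model size in MB is proportional to parameter count, and `relative_reduction` is a helper name introduced here for illustration.

```python
def relative_reduction(small, large):
    """Fractional size reduction of `small` relative to `large`."""
    return 1.0 - small / large


# Model sizes in MB, as reported in the findings above.
imagenet = relative_reduction(168, 247)  # CRU-Net vs. ResNet-200 on ImageNet-1k
places = relative_reduction(163, 226)    # CRU-Net-116 vs. ResNet-152 on Places365-Standard

print(f"{imagenet:.1%} {places:.1%}")  # prints "32.0% 27.9%"
```

The first figure agrees with the 32% reduction stated for ImageNet-1k; the second is derived here from the reported sizes and does not appear in the review itself.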


This review was generated by AI and checked by a human editor.