QUICK REVIEW

[Paper Review] NViT: Vision Transformer Compression and Parameter Redistribution.

Huanrui Yang, Hongxu Yin|arXiv (Cornell University)|Oct 10, 2021

Advanced Neural Network Applications | References: 36 | Cited by: 27

TL;DR

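This paper proposes NViT, a new vision transformer architecture derived from global, latency-aware structural pruning of Vision Transformer (ViT) models. On ImageNet-1K, pruning DEIT-Base yields a 2.6x FLOPs reduction, a 5.1x parameter reduction, and a 1.9x inference speedup with only a 0.07% drop in accuracy. The authors then analyze the pruned weight structure and redistribute parameters accordingly; the resulting NViT, trained from scratch, is 0.1-1.1% more accurate than the hand-designed DEIT variants while running faster.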

ABSTRACT

Transformers yield state-of-the-art results across many tasks. However, they still impose huge computational costs during inference. We apply global, structural pruning with latency-aware regularization on all parameters of the Vision Transformer (ViT) model for latency reduction. Furthermore, we analyze the pruned architectures and find interesting regularities in the final weight structure. Our discovered insights lead to a new architecture called NViT (Novel ViT), with a redistribution of where parameters are used. This architecture utilizes parameters more efficiently and enables control of the latency-accuracy trade-off. On ImageNet-1K, we prune the DEIT-Base (Touvron et al., 2021) model to a 2.6x FLOPs reduction, 5.1x parameter reduction, and 1.9x run-time speedup with only 0.07% loss in accuracy. We achieve more than 1% accuracy gain when compressing the base model to the throughput of the Small/Tiny variants. NViT gains 0.1-1.1% accuracy over the hand-designed DEIT family when trained from scratch, while being faster.

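Motivation and objectives

Reduce the high computational cost of vision transformers at inference time.
Uncover the structural regularities hidden in pruned ViT models and use them to guide architecture redesign.
Develop NViT, a new ViT architecture that achieves a better latency-accuracy trade-off by redistributing parameters more efficiently.
Achieve substantial model compression and faster inference with minimal accuracy loss.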

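Proposed method

Apply global, structural pruning to all ViT parameters, with latency-aware regularization to reduce computational cost.
Analyze the weight structure of the pruned ViT models to identify recurring patterns and regularities.
Use the insights from the pruned architectures to design NViT, a new ViT architecture that changes where parameters are used.
Redistribute parameters within NViT to improve efficiency and allow control over the latency-accuracy trade-off.
Train NViT from scratch and compare it with the hand-designed DEIT variants at matched compression levels.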
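The first two steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the `Group` record, the `importance` and `latency` values, the weight `eta`, and the scoring rule `importance - eta * latency` are all hypothetical stand-ins, since this review does not spell out how the paper scores parameters or estimates latency. The sketch only shows the general idea of ranking every prunable structure in one global list, removing the lowest-scoring ones until a latency budget is met, and then looking at what survives in each block.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Group:
    """One prunable structure, e.g. an attention head or an MLP neuron (hypothetical)."""
    block: int         # index of the transformer block the group belongs to
    kind: str          # illustrative label such as "head" or "mlp"
    importance: float  # assumed estimate of the group's contribution to accuracy
    latency: float     # assumed estimate of the group's latency cost


def global_latency_aware_prune(groups, latency_budget, eta=1.0):
    """Remove the globally lowest-scoring groups until the latency budget is met.

    Assumed scoring rule: score = importance - eta * latency, so groups that
    matter little for accuracy and cost a lot of latency are removed first.
    """
    order = sorted(
        range(len(groups)),
        key=lambda i: groups[i].importance - eta * groups[i].latency,
    )
    total = sum(g.latency for g in groups)
    pruned = set()
    for i in order:
        if total <= latency_budget:
            break
        pruned.add(i)
        total -= groups[i].latency
    # Keep the survivors in their original order.
    return [g for i, g in enumerate(groups) if i not in pruned]


def per_block_profile(kept):
    """Count the surviving groups per (block, kind) to expose structural patterns."""
    return Counter((g.block, g.kind) for g in kept)


# Toy example: four groups ranked in a single global list, not per layer.
groups = [
    Group(block=0, kind="head", importance=0.9, latency=1.0),
    Group(block=0, kind="mlp", importance=0.2, latency=1.0),
    Group(block=1, kind="head", importance=0.1, latency=1.0),
    Group(block=1, kind="mlp", importance=0.8, latency=1.0),
]
kept = global_latency_aware_prune(groups, latency_budget=2.0)
profile = per_block_profile(kept)
```

Because the ranking is global rather than per layer, different blocks can end up with different numbers of surviving heads and neurons. A per-block profile like this is the kind of evidence that the analysis step examines before parameters are redistributed.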

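Experimental results

Research questions

RQ1: What structural regularities emerge in the weight matrices of a globally pruned vision transformer?
RQ2: How can insights from pruned ViT architectures be used to design more efficient transformer models?
RQ3: Does a redesigned parameter distribution in ViT give a better latency-accuracy trade-off than hand-designed variants?
RQ4: How far can FLOPs and parameter count be reduced without a significant loss in accuracy?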

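Key findings

Pruning DEIT-Base with global, latency-aware structural pruning yields a 2.6x FLOPs reduction with only a 0.07% drop in accuracy.
The same pruning run reduces the parameter count by 5.1x and gives a 1.9x run-time speedup on ImageNet-1K.
When the base model is compressed to the throughput of DEIT-Small or DEIT-Tiny, it is more than 1% more accurate than the corresponding DEIT variants.
When trained from scratch, NViT is 0.1-1.1% more accurate than the hand-designed DEIT variants while also running faster.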
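To make the reported ratios easier to read, the short calculation below converts each "N x" reduction into the fraction that remains after pruning. It uses only the three factors quoted above (2.6x, 5.1x, 1.9x); the helper name `remaining_fraction` is made up for this illustration, and no absolute FLOPs, parameter, or latency figures are assumed.

```python
def remaining_fraction(reduction_factor):
    """An N-times reduction leaves 1/N of the original quantity."""
    return 1.0 / reduction_factor


# Reduction factors as reported for the pruned DEIT-Base model.
reported = {"FLOPs": 2.6, "parameters": 5.1, "run time": 1.9}

for name, factor in reported.items():
    kept = remaining_fraction(factor)
    print(f"{name}: {kept:.1%} remaining, {1 - kept:.1%} removed")
```

In other words, the pruned model keeps roughly 38% of the FLOPs and 20% of the parameters of the base model, and runs in roughly 53% of the original time. The speedup is smaller than the FLOPs reduction.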


This review was generated by AI and checked by a human editor.