QUICK REVIEW

[Paper Review] Vision Transformer Pruning

Mingjian Zhu, Yehui Tang|arXiv (Cornell University)|Apr 17, 2021

Advanced Neural Network Applications | References: 44 | Cited by: 53

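One-Sentence Summary

This paper proposes Vision Transformer Pruning (VTP), a method that learns dimension-wise importance scores under L1 sparsity regularization to prune the MHSA and MLP projections of ViT models, substantially reducing parameters and FLOPs with almost no loss in accuracy.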

ABSTRACT

Vision transformer has achieved competitive performance on a variety of computer vision applications. However, their storage, run-time memory, and computational demands are hindering the deployment to mobile devices. Here we present a vision transformer pruning approach, which identifies the impacts of dimensions in each layer of transformer and then executes pruning accordingly. By encouraging dimension-wise sparsity in the transformer, important dimensions automatically emerge. A great number of dimensions with small importance scores can be discarded to achieve a high pruning ratio without significantly compromising accuracy. The pipeline for vision transformer pruning is as follows: 1) training with sparsity regularization; 2) pruning dimensions of linear projections; 3) fine-tuning. The reduced parameters and FLOPs ratios of the proposed algorithm are well evaluated and analyzed on ImageNet dataset to demonstrate the effectiveness of our proposed method.

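Research Motivation and Goals

Make practical deployment of vision transformers on edge devices feasible by reducing storage, memory, and compute requirements.
Propose a principled pruning framework that identifies and removes unimportant feature dimensions in transformer projections.
Show that sparsity-inducing training makes the important dimensions emerge automatically and enables substantial compression with limited accuracy loss.
Provide empirical validation on ImageNet-1K and ImageNet-100 that demonstrates effective pruning and speedup.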

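Proposed Method

Introduce learnable importance scores for the dimensions of the linear projections in the MHSA and MLP modules.
Relax the discrete pruning decisions into real-valued importance scores and enforce sparsity with an L1 penalty.
Train with sparsity regularization to drive importance scores toward zero, then threshold them to obtain a binary pruning mask.
Apply pruning across all MHSA and MLP components, rewire the pruned projections, and fine-tune the pruned model.
Evaluate compression on ImageNet-1K and ImageNet-100 in terms of parameter count, FLOPs, and accuracy.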
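
The score, penalty, threshold, and rewiring steps listed above can be illustrated for a single linear projection. This is a minimal pure-Python sketch, not the authors' implementation: the function names, the threshold value, and the choice to prune the input dimension of the weight matrix are all assumptions made for illustration, and the training loop that would actually learn the scores is omitted.

```python
# Sketch of dimension-wise pruning for ONE linear projection.
# Every name here is hypothetical; nothing is taken from the paper's code.

def apply_scores(features, scores):
    """Soft gate: scale each feature dimension by its importance score."""
    return [[x * s for x, s in zip(row, scores)] for row in features]

def l1_penalty(scores, lam):
    """Sparsity term added to the task loss: lam * sum(|score_i|)."""
    return lam * sum(abs(s) for s in scores)

def binary_mask(scores, threshold):
    """Hard decision: keep a dimension only if |score| >= threshold."""
    return [1 if abs(s) >= threshold else 0 for s in scores]

def prune_features(features, mask):
    """Drop the masked-out feature dimensions."""
    return [[x for x, keep in zip(row, mask) if keep] for row in features]

def prune_input_dims(weight, mask):
    """Drop the rows of a (d_in x d_out) weight that read pruned dimensions."""
    return [row for row, keep in zip(weight, mask) if keep]

def matmul(a, b):
    """Plain list-of-lists matrix product."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

# Illustrative values only: pretend these scores came out of sparse training.
scores = [0.9, 0.01, -0.7, 0.0]
mask = binary_mask(scores, threshold=0.05)   # assumed threshold

features = [[1.0, 2.0, 3.0, 4.0], [0.5, -1.0, 2.0, 0.0]]
weight = [[1.0, 0.0], [2.0, 1.0], [0.0, -1.0], [3.0, 2.0]]

# Gating with the 0/1 mask and physically removing the dimensions agree,
# which is why the smaller projection can replace the masked one.
gated = matmul(apply_scores(features, mask), weight)
pruned = matmul(prune_features(features, mask), prune_input_dims(weight, mask))
print(mask, gated == pruned, l1_penalty(scores, lam=0.1))
```

The equality between the gated and the physically pruned product is the point of the sketch: once a mask is binary, the masked dimensions contribute nothing, so the corresponding weight rows can be deleted and the smaller model fine-tuned.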

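Experimental Results

Research Questions

RQ1: Can dimension-wise pruning with learnable importance scores substantially reduce the parameters and FLOPs of vision transformers without a large loss in accuracy?
RQ2: How does sparsity-regularized training affect the emergence of important versus prunable dimensions in ViT?
RQ3: What are the trade-offs among pruning ratio, model size, computational cost, and accuracy on standard vision benchmarks?
RQ4: Is the proposed VTP method effective both on a large-scale dataset such as ImageNet-1K and on a smaller subset such as ImageNet-100?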

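Key Findings

The method achieves substantial reductions in parameters and FLOPs on the ImageNet benchmarks with only a small loss in accuracy.
Pruning up to 40% of the dimensions retains most of the baseline accuracy while delivering a significant FLOPs reduction.
The pruning effect strengthens as the sparsity level increases, and the behavior is consistent across ImageNet-100 and ImageNet-1K.
The method provides a simple baseline for pruning vision transformers and demonstrates its potential for practical deployment.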
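
To get a feel for what a 40% pruning ratio means for model size, the following back-of-the-envelope sketch counts the projection weights of one transformer block before and after pruning. It rests on several assumptions that do not come from the summary above: a ViT-Base-like block (embedding width 768, MLP hidden width 3072), pruning applied only to the input dimension of each projection, and biases, LayerNorm, and embeddings ignored. It is an estimate of the trend, not a reproduction of the paper's reported numbers.

```python
# Rough weight count for one transformer block under dimension pruning.
# Assumptions (not from the paper): only input dimensions are pruned;
# biases, LayerNorm, and embeddings are ignored.

def kept(dim, prune_pct):
    """Dimensions left after pruning prune_pct percent (integer math, rounds down)."""
    return dim * (100 - prune_pct) // 100

def block_weights(d, h, prune_pct=0):
    """Weights in 4 attention projections (d x d) plus two MLP layers (d x h, h x d)."""
    attn = 4 * kept(d, prune_pct) * d                       # Q, K, V, output
    mlp = kept(d, prune_pct) * h + kept(h, prune_pct) * d   # fc1, fc2
    return attn + mlp

d, h = 768, 3072            # assumed ViT-Base-like widths
dense = block_weights(d, h)
pruned = block_weights(d, h, prune_pct=40)
print(dense, pruned, round(pruned / dense, 3))
```

Because each multiply-accumulate in a linear layer corresponds to one weight per token, the same ratio is a first-order proxy for the FLOPs of the projections; the attention score computation itself is not covered by this count.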

This review was generated by AI and reviewed by a human editor.