QUICK REVIEW

[Paper Review] WoodFisher: Efficient Second-Order Approximation for Neural Network Compression

Sidak Pal Singh, Dan Alistarh|arXiv (Cornell University)|Apr 29, 2020

Advanced Neural Network Applications | References: 48 | Cited by: 36

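One-Sentence Summary

WoodFisher enables second-order pruning through an efficient inverse-Hessian approximation built from the empirical Fisher and the Woodbury identity. It achieves state-of-the-art one-shot pruning on CNNs for ImageNet and CIFAR-10 and is competitive in gradual pruning.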

ABSTRACT

Second-order information, in the form of Hessian- or Inverse-Hessian-vector products, is a fundamental tool for solving optimization problems. Recently, there has been significant interest in utilizing this information in the context of deep neural networks; however, relatively little is known about the quality of existing approximations in this context. Our work examines this question, identifies issues with existing approaches, and proposes a method called WoodFisher to compute a faithful and efficient estimate of the inverse Hessian. Our main application is to neural network compression, where we build on the classic Optimal Brain Damage/Surgeon framework. We demonstrate that WoodFisher significantly outperforms popular state-of-the-art methods for one-shot pruning. Further, even when iterative, gradual pruning is considered, our method results in a gain in test accuracy over the state-of-the-art approaches, for pruning popular neural networks (like ResNet-50, MobileNetV1) trained on standard image classification datasets such as ImageNet ILSVRC. We examine how our method can be extended to take into account first-order information, as well as illustrate its ability to automatically set layer-wise pruning thresholds and perform compression in the limited-data regime. The code is available at the following link, https://github.com/IST-DASLab/WoodFisher.

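Motivation and Objectives

Motivate the question of whether second-order information can be used for neural networks in a way that is both accurate and scalable.
Develop an efficient method for estimating inverse-Hessian information that scales to large models.
Apply the method to neural network compression within the Optimal Brain Damage/Surgeon framework.
Demonstrate improvements over existing methods in both one-shot and gradual pruning.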

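Proposed Method

Approximate the Hessian with the empirical Fisher, and use the Woodbury identity to update the inverse-Fisher estimate iteratively.
Update the empirical Fisher recursively: $\widehat{F}_{n+1} = \widehat{F}_n + \frac{1}{N} \nabla \ell_{n+1} \nabla \ell_{n+1}^{\top}$, initialized with the dampening term $\widehat{F}_0 = \lambda I_d$.
Compute the inverse with the Woodbury update: $\widehat{F}_{n+1}^{-1} = \widehat{F}_n^{-1} - \frac{\widehat{F}_n^{-1} \nabla \ell_{n+1} \nabla \ell_{n+1}^{\top} \widehat{F}_n^{-1}}{N + \nabla \ell_{n+1}^{\top} \widehat{F}_n^{-1} \nabla \ell_{n+1}}$.
Introduce a chunked (block-wise) approximation to scale to large models, bringing the runtime to $O(m c d)$, where $c$ is the chunk size, $d$ is the total number of parameters, and $m$ counts the gradient samples used.
Define the pruning statistic $\epsilon_q = \frac{w_q^2}{2 [H^{-1}]_{qq}}$ to rank parameters for removal, and prune either layer-wise or globally on that basis (independent vs. joint WoodFisher).
Extend the method to include a first-order (gradient) term, and discuss pruning in the limited-data regime and automatic layer-wise sparsity thresholds.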
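
The recursion and the pruning statistic listed above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function names (`woodfisher_inverse`, `pruning_statistics`, `prune_one`) and the toy gradients are hypothetical, the code builds the full dense inverse rather than the chunked one, and it uses plain Python lists only to stay dependency-free. The weight correction in `prune_one` is the standard Optimal Brain Surgeon update, assumed here because the summary names that framework.

```python
# Illustrative sketch only: all names and the toy data are hypothetical.

def woodfisher_inverse(grads, damp=1e-3):
    """Estimate the inverse empirical Fisher from per-sample gradients.

    Starts from F_0^{-1} = (1/damp) * I and applies one rank-one
    Woodbury update per gradient, as in the recursion above.
    """
    n_samples = len(grads)
    d = len(grads[0])
    # F_0 = damp * I, so its inverse is (1/damp) * I.
    f_inv = [[1.0 / damp if i == j else 0.0 for j in range(d)]
             for i in range(d)]
    for g in grads:
        # v = F_n^{-1} g; F_n^{-1} is symmetric, so g^T F_n^{-1} = v^T.
        v = [sum(f_inv[i][j] * g[j] for j in range(d)) for i in range(d)]
        denom = n_samples + sum(g[i] * v[i] for i in range(d))
        for i in range(d):
            for j in range(d):
                f_inv[i][j] -= v[i] * v[j] / denom
    return f_inv


def pruning_statistics(weights, f_inv):
    """Return eps_q = w_q^2 / (2 [F^{-1}]_qq) for every weight q."""
    return [w * w / (2.0 * f_inv[q][q]) for q, w in enumerate(weights)]


def prune_one(weights, f_inv):
    """Remove the weight with the smallest statistic.

    The remaining weights receive the standard OBS correction
    delta_w = -(w_q / [F^{-1}]_qq) * F^{-1} e_q.
    """
    stats = pruning_statistics(weights, f_inv)
    q = min(range(len(weights)), key=stats.__getitem__)
    scale = weights[q] / f_inv[q][q]
    new_weights = [w - scale * f_inv[i][q] for i, w in enumerate(weights)]
    new_weights[q] = 0.0  # clear floating-point residue at the pruned index
    return q, new_weights


if __name__ == "__main__":
    toy_grads = [[1.0, 2.0, 0.0], [0.5, -1.0, 0.3], [2.0, 0.0, -0.7]]
    toy_weights = [0.8, -0.05, 1.2]
    inverse = woodfisher_inverse(toy_grads, damp=0.1)
    print(prune_one(toy_weights, inverse))
```

Each update costs O(d^2) for d parameters, which is why the method restricts the computation to diagonal blocks of size c for large models; a practical version would also use a tensor library rather than nested lists.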

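Experimental Results

Research Questions

RQ1: Can second-order approximations (via inverse-Hessian information) be both accurate and scalable for modern neural networks?
RQ2: Is the empirical Fisher a practical and reliable proxy for the Hessian in large-scale pruning tasks?
RQ3: Can WoodFisher-based pruning outperform magnitude-based and diagonal-Fisher baselines in both one-shot and gradual pruning settings?
RQ4: Does a joint (global) sparsity target improve compression performance over layer-wise pruning?
RQ5: Can WoodFisher be extended to the limited-data regime and combined with first-order information to prune before full convergence?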

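Key Findings

WoodFisher significantly outperforms magnitude pruning and diagonal-Fisher baselines in one-shot pruning of ResNet-20 on CIFAR-10 and ResNet-50 on ImageNet.
Joint WoodFisher (global sparsity target) generally outperforms independent (layer-wise) WoodFisher, especially at higher sparsity levels.
The chunked block-wise approximation preserves pruning quality while remaining efficient in practice, and larger chunk sizes improve accuracy.
In the gradual pruning setting, WoodFisher outperforms state-of-the-art pruning methods; where retraining is required, it sometimes matches or exceeds the top methods.
Empirical evidence shows that the local quadratic model built from WoodFisher predicts the change in loss along the pruning direction fairly accurately, which supports the quality of the approximation.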
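
For reference, the local quadratic model in the last finding ties back to the pruning statistic $\epsilon_q$. The block below is the standard Optimal Brain Surgeon derivation, sketched under the assumptions that the gradient is negligible at a converged model and that $H$ is replaced by the WoodFisher estimate; $\delta\mathbf{w}$ (the weight perturbation) and $\mathbf{e}_q$ (the $q$-th standard basis vector) are notation introduced here.

```latex
\begin{aligned}
\delta L &\approx \tfrac{1}{2}\,\delta\mathbf{w}^{\top} H\,\delta\mathbf{w}
  \quad \text{subject to } \mathbf{e}_q^{\top}\delta\mathbf{w} + w_q = 0, \\
\delta\mathbf{w}^{*} &= -\frac{w_q}{[H^{-1}]_{qq}}\, H^{-1}\mathbf{e}_q, \\
\delta L^{*} &= \frac{w_q^{2}}{2\,[H^{-1}]_{qq}} = \epsilon_q .
\end{aligned}
```

Under this model, $\epsilon_q$ is the predicted loss increase from removing weight $q$ after the remaining weights are adjusted optimally, which is why weights with the smallest $\epsilon_q$ are pruned first.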

This review was generated by AI and reviewed by a human editor.