QUICK REVIEW

[Paper Review] Holistic SparseCNN: Forging the Trident of Accuracy, Speed, and Size

Jongsoo Park, Sheng R. Li|arXiv (Cornell University)|Jan 1, 2016

Advanced Neural Network Applications|Cited by 28

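One-Sentence Summary

This paper presents Holistic SparseCNN, an optimization approach that addresses model accuracy, inference speed, and model size together by means of an efficient sparse convolution that supports arbitrary sparsity patterns. On AlexNet it achieves 3.1–7.3× convolution speedups over dense convolution across several processor architectures, owing to a new sparse-with-dense matrix multiplication kernel and a performance model that predicts the best sparsity level.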

ABSTRACT

Phenomenally successful in practical inference problems, convolutional neural networks (CNN) are widely deployed in mobile devices, data centers, and even supercomputers. The number of parameters needed in CNNs, however, are often large and undesirable. Consequently, various methods have been developed to prune a CNN once it is trained. Nevertheless, the resulting CNNs offer limited benefits. While pruning the fully connected layers reduces a CNN's size considerably, it does not improve inference speed noticeably as the compute heavy parts lie in convolutions. Pruning CNNs in a way that increase inference speed often imposes specific sparsity structures, thus limiting the achievable sparsity levels. We present a method to realize simultaneously size economy and speed improvement while pruning CNNs. Paramount to our success is an efficient general sparse-with-dense matrix multiplication implementation that is applicable to convolution of feature maps with kernels of arbitrary sparsity patterns. Complementing this, we developed a performance model that predicts sweet spots of sparsity levels for different layers and on different computer architectures. Together, these two allow us to demonstrate 3.1–7.3× convolution speedups over dense convolution in AlexNet, on Intel Atom, Xeon, and Xeon Phi processors, spanning the spectrum from mobile devices to supercomputers. We also open source our project at this https URL.

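Motivation and Objectives

Address the limitation that existing CNN pruning methods shrink the model without speeding up inference.
Achieve substantial convolution speedups at inference time by supporting kernels with arbitrary sparsity patterns applied to dense feature maps.
Develop a performance model that predicts the best sparsity level for each layer and each hardware architecture.
Close the gap between model compression and actual inference speedup on real systems.
Provide a practical, open-source solution for deploying sparse CNNs efficiently on mobile devices, servers, and supercomputers.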

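Proposed Method

Design an efficient, general sparse-with-dense matrix multiplication kernel that handles arbitrary sparsity patterns in convolution.
Integrate a performance model that predicts each layer's best sparsity level from hardware characteristics and the layer's computation pattern.
Prune the CNN without imposing a specific sparsity structure, preserving accuracy while enabling fast inference.
Use the performance model to guide pruning decisions so that the chosen sparsity levels yield the largest speedup on the target hardware.
Implement the method on Intel platforms (Atom, Xeon, and Xeon Phi) to validate performance across architectures.
Open-source the implementation to support reproducibility and deployment on a wide range of systems.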
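
The first step above, multiplying a sparse kernel matrix by a dense feature matrix, can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: storing the kernels in CSR form and lowering the input with an `im2col` step are assumptions made here for clarity, and the function names (`to_csr`, `csr_times_dense`, `im2col`, `sparse_conv2d`) are hypothetical.

```python
def to_csr(dense_rows):
    """Convert a list-of-lists matrix to CSR: (values, column indices, row pointers)."""
    values, col_idx, row_ptr = [], [], [0]
    for row in dense_rows:
        for j, v in enumerate(row):
            if v != 0:
                values.append(v)
                col_idx.append(j)
        row_ptr.append(len(values))
    return values, col_idx, row_ptr


def csr_times_dense(csr, dense, n_cols):
    """Multiply a CSR matrix by a dense matrix given as a list of rows."""
    values, col_idx, row_ptr = csr
    out = []
    for r in range(len(row_ptr) - 1):
        acc = [0.0] * n_cols
        # Only stored nonzeros are visited, so the work scales with the
        # number of nonzeros and places no constraint on where they sit.
        for k in range(row_ptr[r], row_ptr[r + 1]):
            w, src = values[k], dense[col_idx[k]]
            for c in range(n_cols):
                acc[c] += w * src[c]
        out.append(acc)
    return out


def im2col(image, kh, kw):
    """Unroll each kh x kw patch of a 2-D image into a column (stride 1, no padding)."""
    oh, ow = len(image) - kh + 1, len(image[0]) - kw + 1
    cols = [[0.0] * (oh * ow) for _ in range(kh * kw)]
    for y in range(oh):
        for x in range(ow):
            for dy in range(kh):
                for dx in range(kw):
                    cols[dy * kw + dx][y * ow + x] = image[y + dy][x + dx]
    return cols, oh * ow


def sparse_conv2d(kernels, image, kh, kw):
    """Cross-correlate an image with flattened kh*kw filters, one per output channel."""
    cols, n = im2col(image, kh, kw)
    return csr_times_dense(to_csr(kernels), cols, n)


image = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
kernels = [[1, 0, 0, -1], [0, 0, 0, 2]]  # two pruned 2x2 filters, flattened row-major
print(sparse_conv2d(kernels, image, 2, 2))
```

Each output row holds one filter's responses over the four 2x2 patches. Because the inner loop walks only the stored nonzeros, pruning more weights directly removes work, whatever pattern the surviving weights form.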

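Experimental Results

Research Questions

RQ1: Can arbitrary sparsity patterns in CNN convolution kernels be exploited efficiently to speed up inference without sacrificing model accuracy?
RQ2: Which sparsity levels give the largest speedup for different layers and hardware platforms?
RQ3: Can the performance model accurately predict the sparsity levels that best balance speed, size, and accuracy?
RQ4: How much speedup can sparse convolution deliver on mobile devices, servers, and supercomputing systems?
RQ5: How does combining sparse computation with performance modeling enable holistic optimization of a CNN?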

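Key Findings

Holistic SparseCNN achieves 3.1–7.3× convolution speedups over dense convolution for AlexNet on Intel Atom, Xeon, and Xeon Phi processors.
Efficient sparse convolution lets the method pursue model size, inference speed, and accuracy together rather than trading one against another.
The performance model identifies the sweet spots of sparsity for different layers and hardware platforms, which guides pruning toward the largest speedup.
A new sparse-with-dense matrix multiplication kernel supports arbitrary sparsity patterns efficiently, combining flexibility with high performance.
The open-source implementation makes the method deployable on systems ranging from mobile devices to supercomputers.
Pruning only the fully connected layers is not enough to speed up inference, because the compute-heavy work lies in the convolutions; sparse convolution tuned with the performance model yields substantial gains.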
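
To see why a performance model can point to a "sweet spot" of sparsity, consider a roofline-style estimate. This sketch is an illustration under stated assumptions and is not the paper's model: it assumes the sparse and dense versions move the same number of bytes, that sparse arithmetic runs at a fixed fraction (`sparse_efficiency`) of peak throughput, and every number in the usage lines is made up.

```python
def predicted_speedup(sparsity, dense_flops, bytes_moved,
                      peak_flops, bandwidth, sparse_efficiency=0.5):
    """Roofline-style estimate of sparse-over-dense speedup for one layer.

    All parameters are hypothetical inputs chosen for illustration.
    """
    memory_time = bytes_moved / bandwidth
    dense_time = max(dense_flops / peak_flops, memory_time)
    # Pruning removes arithmetic in proportion to sparsity, but irregular
    # access is assumed to lower the achievable compute throughput.
    sparse_flops = dense_flops * (1.0 - sparsity)
    compute_time = sparse_flops / (peak_flops * sparse_efficiency)
    # Sparse execution cannot finish faster than the memory traffic allows.
    sparse_time = max(compute_time, memory_time)
    return dense_time / sparse_time


for s in (0.5, 0.8, 0.95, 0.99):
    print(s, predicted_speedup(s, dense_flops=1000.0, bytes_moved=20.0,
                               peak_flops=100.0, bandwidth=10.0))
```

With these made-up inputs, moderate sparsity gives no gain because of the assumed efficiency penalty, and very high sparsity stops helping once the layer becomes memory bound. The useful range lies between those two regimes, and it shifts with the layer shape and the machine's compute-to-bandwidth ratio.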


This review was generated by AI and reviewed by a human editor.