QUICK REVIEW

[Paper Review] Structured Bayesian Pruning via Log-Normal Multiplicative Noise

Kirill Neklyudov, Dmitry Molchanov|arXiv (Cornell University)|May 20, 2017

Bayesian Methods and Mixture Models|Cited by 68

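One-Sentence Summary

Introduces Structured Bayesian Pruning (SBP), a dropout-like Bayesian layer that induces structured sparsity by applying log-normal multiplicative noise to layer outputs and pruning by signal-to-noise ratio (SNR), which accelerates CNNs and fully connected (FC) networks without a significant loss of accuracy.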

ABSTRACT

Dropout-based regularization methods can be regarded as injecting random noise with pre-defined magnitude to different parts of the neural network during training. It was recently shown that Bayesian dropout procedure not only improves generalization but also leads to extremely sparse neural architectures by automatically setting the individual noise magnitude per weight. However, this sparsity can hardly be used for acceleration since it is unstructured. In the paper, we propose a new Bayesian model that takes into account the computational structure of neural networks and provides structured sparsity, e.g. removes neurons and/or convolutional channels in CNNs. To do this we inject noise to the neurons outputs while keeping the weights unregularized. We establish the probabilistic model with a proper truncated log-uniform prior over the noise and truncated log-normal variational approximation that ensures that the KL-term in the evidence lower bound is computed in closed-form. The model leads to structured sparsity by removing elements with a low SNR from the computation graph and provides significant acceleration on a number of deep neural architectures. The model is easy to implement as it can be formulated as a separate dropout-like layer.

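Motivation and Objectives

Develop a Bayesian regularization framework that produces structured sparsity in neural networks.
Remove entire neurons or convolutional channels to accelerate inference.
Provide a tractable variational inference method with a proper prior over the multiplicative noise.
Demonstrate practical speedups on LeNet and VGG-like architectures on MNIST and CIFAR-10.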

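Proposed Method

Introduce a dropout-like SBP layer that multiplies neuron outputs by a noise variable theta.
Place a sparsity-inducing truncated log-uniform prior on theta and approximate its posterior with a truncated log-normal distribution.
Truncate both distributions so that the probabilistic model is proper, and derive the KL divergence between q(theta | mu, sigma) and p(theta) in closed form.
Train (mu, sigma) and the network weights with stochastic variational inference using the reparameterization trick.
Use the expectation E[theta] at test time so that a single forward pass suffices, with no Bayesian ensembling.
Prune by thresholding the SNR of theta, removing low-SNR groups (neurons or filters).
Extend SBP to structured sparsity over multi-dimensional tensors (e.g. CNN channels) by sharing one theta across all elements of a group.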
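The training-time steps above can be sketched in plain Python. This is a minimal illustration rather than the authors' implementation: the function names (`sample_theta`, `sbp_forward`, `kl_to_log_uniform`) and the truncation bounds used in the usage note are assumptions made for this example. The KL term follows from the entropy of a truncated normal in log-space, since the prior is uniform on [a, b] in log(theta).

```python
import math
import random
from statistics import NormalDist

STD = NormalDist()  # standard normal: provides cdf, pdf, inv_cdf


def _bounds(mu, sigma, a, b):
    """Standardized truncation points and normalizer Z = Phi(beta) - Phi(alpha)."""
    alpha, beta = (a - mu) / sigma, (b - mu) / sigma
    return alpha, beta, STD.cdf(beta) - STD.cdf(alpha)


def sample_theta(mu, sigma, a, b, rng):
    """Draw theta with log(theta) ~ N(mu, sigma^2) truncated to [a, b].

    Inverse-CDF sampling: all randomness sits in the uniform draw u, so the
    sample is a deterministic function of (mu, sigma), which is what the
    reparameterization trick needs.
    """
    alpha, _, z = _bounds(mu, sigma, a, b)
    u = rng.random()
    p = STD.cdf(alpha) + u * z
    p = min(max(p, 1e-12), 1.0 - 1e-12)  # inv_cdf is undefined at 0 and 1
    log_theta = mu + sigma * STD.inv_cdf(p)
    return math.exp(min(max(log_theta, a), b))  # guard against rounding


def sbp_forward(outputs, mus, sigmas, a, b, rng):
    """Training-time pass: scale each neuron output by its own noise sample."""
    return [x * sample_theta(m, s, a, b, rng)
            for x, m, s in zip(outputs, mus, sigmas)]


def kl_to_log_uniform(mu, sigma, a, b):
    """KL(q || p) for a truncated log-normal q and truncated log-uniform p.

    Not guarded against Z underflowing to 0 when mu lies far outside [a, b].
    """
    alpha, beta, z = _bounds(mu, sigma, a, b)
    entropy = (math.log(math.sqrt(2.0 * math.pi * math.e) * sigma * z)
               + (alpha * STD.pdf(alpha) - beta * STD.pdf(beta)) / (2.0 * z))
    return math.log(b - a) - entropy
```

With illustrative bounds a = -20 and b = 0, every sampled theta lies in [exp(-20), 1], so the layer can only shrink an output. The KL term approaches 0 as sigma grows, because a very wide truncated normal becomes nearly uniform on [a, b].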

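Experimental Results

Research Questions

RQ1: How can Bayesian dropout be adapted to produce structured sparsity patterns in neural networks?
RQ2: Given that the plain log-uniform prior is improper, can a tractable variational objective be derived, and how does truncation affect training?
RQ3: Does SBP, by removing entire neurons or channels, deliver practical speedups on standard architectures and datasets with minimal accuracy loss?
RQ4: How does jointly training the mean and variance of the multiplicative noise affect sparsity and performance, compared with a fixed mean?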

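Key Findings

SBP reaches high levels of group sparsity, speeding up CNNs and fully connected networks with almost no loss of accuracy.
Training both mu and sigma of the log-normal noise, rather than fixing the mean, gives a tighter variational lower bound and higher sparsity.
Pruning theta components with low SNR removes entire neurons or filters, usually without loss of accuracy.
Applied to LeNet and VGG-like networks on MNIST and CIFAR-10, SBP shows practical speedups in CPU time, GPU time, and FLOPs while keeping competitive accuracy.
The truncated log-normal posterior paired with the truncated log-uniform prior yields a well-defined, tractable evidence lower bound (ELBO) and avoids the problems caused by an improper prior.
The SBP layer can be inserted as a lightweight, dropout-like module with minimal software changes.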
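To make the SNR-based pruning concrete, here is a small sketch of how it could be applied to one fully connected layer after training. It is an illustration under stated assumptions, not the paper's code: the helper names, the default bounds a = -20 and b = 0, and the threshold of 1.0 are hypothetical choices, and the sketch assumes the noise multiplies the layer's linear output directly, so that E[theta] can be folded into the weights and bias. The moments use the standard identity for E[exp(k x)] under a truncated normal x.

```python
import math
from statistics import NormalDist

STD = NormalDist()  # standard normal


def theta_moment(mu, sigma, a, b, k):
    """E[theta^k] for log(theta) ~ N(mu, sigma^2) truncated to [a, b]."""
    alpha, beta = (a - mu) / sigma, (b - mu) / sigma
    z = STD.cdf(beta) - STD.cdf(alpha)
    shifted = STD.cdf(beta - k * sigma) - STD.cdf(alpha - k * sigma)
    return math.exp(k * mu + 0.5 * (k * sigma) ** 2) * shifted / z


def theta_snr(mu, sigma, a, b):
    """Signal-to-noise ratio E[theta] / sqrt(Var[theta]).

    Naive formula: it can lose precision for very small sigma and can
    overflow for very large sigma.
    """
    mean = theta_moment(mu, sigma, a, b, 1)
    var = theta_moment(mu, sigma, a, b, 2) - mean ** 2
    return mean / math.sqrt(max(var, 1e-300))


def prune_fc_layer(weights, biases, mus, sigmas, a=-20.0, b=0.0, threshold=1.0):
    """Drop output neurons whose SNR is below the threshold.

    weights holds one row per output neuron. Surviving rows and biases are
    scaled by E[theta], so inference needs no sampling.
    """
    kept, new_w, new_b = [], [], []
    for i, (row, bias, mu, sigma) in enumerate(zip(weights, biases, mus, sigmas)):
        if theta_snr(mu, sigma, a, b) < threshold:
            continue  # low SNR: remove the whole neuron
        scale = theta_moment(mu, sigma, a, b, 1)
        kept.append(i)
        new_w.append([scale * w for w in row])
        new_b.append(scale * bias)
    return new_w, new_b, kept
```

The returned `kept` indices tell the caller which input columns of the next layer to retain. Removing a neuron shrinks two weight matrices at once, which is what makes this kind of sparsity usable for speedups.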

This review was generated by AI and checked by a human editor.