QUICK REVIEW

[Paper Review] Ternary Neural Networks with Fine-Grained Quantization

Naveen Mellempudi, Abhisek Kundu|arXiv (Cornell University)|May 2, 2017

Advanced Neural Network Applications | References: 11 | Cited by: 61

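One-Sentence Summary

FGQ converts pre-trained full-precision models to ternary weights with 8/4-bit activations without retraining, using weight groups to balance accuracy against compute reduction; on ImageNet it reaches accuracy close to FP32 with a substantial potential speedup.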

ABSTRACT

We propose a novel fine-grained quantization (FGQ) method to ternarize pre-trained full precision models, while also constraining activations to 8 and 4-bits. Using this method, we demonstrate a minimal loss in classification accuracy on state-of-the-art topologies without additional training. We provide an improved theoretical formulation that forms the basis for a higher quality solution using FGQ. Our method involves ternarizing the original weight tensor in groups of $N$ weights. Using $N=4$, we achieve Top-1 accuracy within $3.7\%$ and $4.2\%$ of the baseline full precision result for Resnet-101 and Resnet-50 respectively, while eliminating $75\%$ of all multiplications. These results enable a full 8/4-bit inference pipeline, with best-reported accuracy using ternary weights on ImageNet dataset, with a potential of $9\times$ improvement in performance. Also, for smaller networks like AlexNet, FGQ achieves state-of-the-art results. We further study the impact of group size on both performance and accuracy. With a group size of $N=64$, we eliminate $\approx99\%$ of the multiplications; however, this introduces a noticeable drop in accuracy, which necessitates fine tuning the parameters at lower precision. We address this by fine-tuning Resnet-50 with 8-bit activations and ternary weights at $N=64$, improving the Top-1 accuracy to within $4\%$ of the full precision result with $<30\%$ additional training overhead. Our final quantized model can run on a full 8-bit compute pipeline using 2-bit weights and has the potential of up to $15\times$ improvement in performance compared to baseline full-precision models.

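Motivation and Objectives

Perform inference with very low-precision weights and activations at near state-of-the-art accuracy, with little or no retraining.
Introduce a fine-grained quantization (FGQ) method that ternarizes pre-trained weights in groups to preserve information.
Show that FGQ reaches high Top-1 accuracy on ImageNet for ResNet-101, ResNet-50, and AlexNet using 2w-8a and 2w-4a (2-bit weights with 8-bit or 4-bit activations).
Analyze how group size affects accuracy and compute savings, and discuss the hardware implications for an 8-bit compute pipeline.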

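Proposed Method

Ternarize the full-precision weight tensor in non-overlapping groups of size $N$, yielding an independent subproblem for each group.
For each group, solve for a scaling factor $\alpha$ and a ternary weight vector $\hat{W}^{(i)}$ that minimize $\|W^{(i)} - \alpha \hat{W}^{(i)}\|_F^2$ (Eq. 2).
Use a single $\alpha$ for positive and negative weights with separate thresholds $\Delta_p$ and $\Delta_n$, obtaining $\alpha^*$, $\Delta_p^*$, and $\Delta_n^*$ in closed form or by exhaustive search (Eqs. 3–5).
Apply a static grouping strategy along the input-channel dimension to minimize the dynamic range within each group and to allow an efficient memory layout and vectorization (Fig. 2).
Quantize activations to 8 or 4 bits and use 32-bit accumulators in the computation to prevent overflow; recompute batch-normalization statistics during inference to compensate for the change in variance.
Vary the group size $N$ to trade accuracy against the share of ternary accumulations (for example, about 75% of operations at $N=4$ and about 99% at $N=64$).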
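
The per-group subproblem described above can be illustrated with a small pure-Python sketch. It is not the authors' implementation: the function names `ternarize_group` and `fgq_ternarize` are hypothetical, the weight tensor is assumed to be already flattened along the grouping dimension, and the exhaustive search over thresholds is one plausible reading of "closed form or exhaustive search". For a fixed ternary vector with $k$ nonzero entries, the best $\alpha$ is the mean of the kept magnitudes, and the squared error drops by $\alpha^2 k$, so the search simply maximizes that quantity.

```python
from itertools import product


def ternarize_group(w):
    """Return (alpha, codes) minimizing sum((w_i - alpha * codes_i)^2), codes_i in {-1, 0, +1}.

    Assumption: thresholds are searched exhaustively over the magnitudes in the group.
    """
    pos = sorted({x for x in w if x > 0})    # candidate thresholds for positive weights
    neg = sorted({-x for x in w if x < 0})   # candidate thresholds for negative weights
    inf = float("inf")                       # "prune every weight of this sign"
    best_score, best_alpha, best_codes = 0.0, 0.0, [0] * len(w)
    for delta_p, delta_n in product(pos + [inf], neg + [inf]):
        # Keep a weight only if its magnitude reaches the threshold for its sign.
        codes = [1 if x >= delta_p else -1 if -x >= delta_n else 0 for x in w]
        k = sum(abs(c) for c in codes)
        if k == 0:
            continue
        alpha = sum(x * c for x, c in zip(w, codes)) / k  # mean kept magnitude
        score = alpha * alpha * k                         # reduction in squared error
        if score > best_score:
            best_score, best_alpha, best_codes = score, alpha, codes
    return best_alpha, best_codes


def fgq_ternarize(weights, n):
    """Split a flat weight list into non-overlapping groups of n and ternarize each one."""
    alphas, codes = [], []
    for start in range(0, len(weights), n):
        alpha, group_codes = ternarize_group(weights[start:start + n])
        alphas.append(alpha)
        codes.append(group_codes)
    return alphas, codes


alphas, codes = fgq_ternarize([0.9, -1.1, 0.05, -0.02, 0.0, 0.0, 0.0, 0.0], 4)
```

In this toy input the first group keeps only the two large weights, since including the two small ones would pull $\alpha$ down and increase the error; the second group is all zeros and gets a zero scale. The search cost grows with the square of the group size, which is acceptable for small $N$ but is only a sketch for larger groups.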

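Experimental Results

Research Questions

RQ1: Can a pre-trained full-precision network be converted to ternary weights without retraining and with minimal accuracy loss?
RQ2: In a 2w-8a/2w-4a inference pipeline, how does the fine-grained group size N affect accuracy and compute savings?
RQ3: Which grouping strategy best preserves the weight distribution across layers so as to maximize accuracy?
RQ4: Can FGQ reach near state-of-the-art accuracy on ImageNet for ResNet-101, ResNet-50, and AlexNet without retraining?
RQ5: What are the practical hardware implications and performance gains of FGQ for a full 8-bit compute pipeline?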

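Key Findings

FGQ with N=4 (FGQ-N4) reaches 73.85% Top-1 accuracy on ImageNet for ResNet-101 with 2w-8a and 70.69% with 2w-4a, without retraining.
Applying FGQ-N4 to ResNet-50 gives 70.76% Top-1 (2w-8a) and 68.38% (2w-4a), close to the full-precision result.
Applying FGQ-N4 to AlexNet gives 49.04% Top-1 (2w-8a, no retraining), about 7.8 percentage points below the 56.83% baseline.
Larger group sizes such as N=64 eliminate about 99% of multiplications but cause a noticeable accuracy drop, which limited low-precision retraining can mitigate.
The method enables a full 8-bit compute pipeline with 2-bit weights, with a theoretical performance improvement of up to $15\times$ over the full-precision baseline.
Compared with closely related work, FGQ achieves competitive or higher accuracy in many configurations without low-precision training.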
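
The multiplication savings quoted for N=4 and N=64 can be checked with a short sketch. It assumes that each group of N ternary weights costs N additions or subtractions plus a single multiplication by the group scale, so the eliminated fraction is 1 - 1/N; that cost model is inferred from the figures in the abstract rather than taken from the paper's text. The function names and the toy inputs are hypothetical.

```python
def multiplications_eliminated(n):
    """Fraction of multiplications removed, assuming one scale multiply per group of n weights."""
    return 1.0 - 1.0 / n


def grouped_ternary_dot(acts, alphas, codes):
    """Dot product of integer activations with group-wise ternary weights.

    acts:   flat list of integer (already quantized) activations
    alphas: one scale per group
    codes:  one list of ternary codes (-1, 0, +1) per group
    Returns (result, number_of_multiplications).
    """
    total, mults, pos = 0.0, 0, 0
    for alpha, group in zip(alphas, codes):
        acc = 0  # integer accumulator: additions and subtractions only
        for c in group:
            if c == 1:
                acc += acts[pos]
            elif c == -1:
                acc -= acts[pos]
            pos += 1
        total += alpha * acc  # the only multiplication for this group
        mults += 1
    return total, mults


result, mults = grouped_ternary_dot(
    [10, 20, 30, 40, 1, 2, 3, 4],
    [0.5, 2.0],
    [[1, -1, 0, 0], [0, 0, 1, 1]],
)
```

Under this model, `multiplications_eliminated(4)` is exactly 0.75 and `multiplications_eliminated(64)` is 0.984375, consistent with the 75% and roughly 99% figures. In the toy dot product, eight weights need only two multiplications, one per group of four.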


This review was generated by AI and reviewed by a human editor.