QUICK REVIEW

[Paper Review] Embedded Binarized Neural Networks

Bradley McDanel, Surat Teerapittayanon|arXiv (Cornell University)|Sep 6, 2017

Advanced Neural Network Applications | References: 9 | Cited by: 76

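One-Sentence Summary

eBNN reorders BNN inference to use binary temporaries, cutting intermediate memory by roughly 32x and enabling on-device DNN inference on devices with only tens of KB of RAM; it is demonstrated on MNIST and CIFAR10-Easy using Intel Curie hardware, and an automated model-generation service is provided.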

ABSTRACT

We study embedded Binarized Neural Networks (eBNNs) with the aim of allowing current binarized neural networks (BNNs) in the literature to perform feedforward inference efficiently on small embedded devices. We focus on minimizing the required memory footprint, given that these devices often have memory as small as tens of kilobytes (KB). Beyond minimizing the memory required to store weights, as in a BNN, we show that it is essential to minimize the memory used for temporaries which hold intermediate results between layers in feedforward inference. To accomplish this, eBNN reorders the computation of inference while preserving the original BNN structure, and uses just a single floating-point temporary for the entire neural network. All intermediate results from a layer are stored as binary values, as opposed to floating-points used in current BNN implementations, leading to a 32x reduction in required temporary space. We provide empirical evidence that our proposed eBNN approach allows efficient inference (10s of ms) on devices with severely limited memory (10s of KB). For example, eBNN achieves 95% accuracy on the MNIST dataset running on an Intel Curie with only 15 KB of usable memory with an inference runtime of under 50 ms per sample. To ease the development of applications in embedded contexts, we make our source code available that allows users to train and discover eBNN models for a learning task at hand, which fit within the memory constraint of the target device.

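Motivation and Goals

Make it practical to run deep neural networks on embedded devices with extremely limited memory.
Reduce the memory footprint by reordering inference so that temporaries are stored as binary values.
Preserve BNN accuracy by changing only the order of computation, not the network structure.
Demonstrate practical performance (runtimes of tens of milliseconds) on a device with about 15 KB of usable SRAM.
Provide a toolchain and a cloud service for training models and generating eBNN implementations.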

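Proposed Method

Reorder inference within a BNN into fused binary blocks to minimize intermediate floating-point temporaries.
Use a single floating-point temporary as the accumulator, reused across the whole network, and pass only binary outputs between blocks.
Fuse convolution and pooling (or a fully connected layer) into Binary Convolution-Pool or Binary FC blocks.
Store intermediate results as binary values, for roughly a 32x reduction in temporary memory.
Change only the order of computation, keeping the original network structure and accuracy.
Provide an automated pipeline that trains models in Python and generates embedded C code for deployment.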
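
To make the reordering idea concrete, here is a minimal sketch of a fused block next to an unfused reference. It is an illustration under stated assumptions, not the authors' implementation: the 1-D layout, the function names `binary_conv_pool_1d` and `unfused_reference`, the encoding of -1/+1 as bits 0/1, and the plain `threshold` activation are all hypothetical choices made for brevity. The point it shows is that the fused version needs only one floating-point temporary and emits bits, while the reference materializes a full buffer of floats, yet both give the same output.

```python
def unfused_reference(x_bits, w_bits, pool, threshold=0.0):
    """Conventional order: convolve everything, then pool, then binarize."""
    k = len(w_bits)
    # Full buffer of floating-point convolution outputs (the costly temporary).
    conv = [
        float(sum(1 if x_bits[i + j] == w_bits[j] else -1 for j in range(k)))
        for i in range(len(x_bits) - k + 1)
    ]
    pooled = [max(conv[s:s + pool]) for s in range(0, len(conv) - pool + 1, pool)]
    return [1 if v >= threshold else 0 for v in pooled]


def binary_conv_pool_1d(x_bits, w_bits, pool, threshold=0.0):
    """Fused order: finish one pooling window at a time, keep one float."""
    k = len(w_bits)
    n_conv = len(x_bits) - k + 1
    out_bits = []
    for start in range(0, n_conv - pool + 1, pool):
        tmp = float("-inf")  # the single floating-point temporary
        for p in range(start, start + pool):
            acc = 0
            for j in range(k):
                # Bits that agree contribute +1, bits that differ contribute -1.
                acc += 1 if x_bits[p + j] == w_bits[j] else -1
            tmp = max(tmp, float(acc))  # running max-pool over the window
        out_bits.append(1 if tmp >= threshold else 0)  # binary result only
    return out_bits


# Hypothetical input and filter, chosen only for illustration.
x = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1]
w = [1, 0, 1]
fused = binary_conv_pool_1d(x, w, pool=2, threshold=2.0)
reference = unfused_reference(x, w, pool=2, threshold=2.0)
print(fused, reference)
```

Because only the loop order changes, the two functions agree on every input, which is the sense in which reordering leaves the network's structure and accuracy untouched.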

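Experimental Results

Research Questions

RQ1: Can eBNN reduce the memory used by temporaries enough to fit a standard BNN on an extremely constrained embedded device?
RQ2: Does reordering inference preserve accuracy while achieving practical on-device runtimes?
RQ3: What are the trade-offs among memory, runtime, and accuracy across different eBNN architectures on MNIST and CIFAR10-Easy?
RQ4: How does eBNN performance scale with network depth and block-fusion strategy?
RQ5: Is there an accessible tool or service that can automatically train eBNN models and deploy them on a target device?

Key Findings

Dataset	Model	Accuracy (%)	Time (ms)	Memory (KB)	Energy (mWs)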
MNIST	MLP-1	91.54	17.35	14.73	5.37
MNIST	MLP-2	84.65	9.17	13.53	4.95
MNIST	Conv-1	94.56	53.72	11.48	19.96
MNIST	Conv-2	96.49	193.02	13.77	63.02
MNIST	ConvPool-1	97.44	739.34	12.79	213.63
MNIST	ConvPool-2	97.86	886.53	13.07	243.98
MNIST	Conv-1-LE-I	91.95	23.91	5.99	6.02
MNIST	Conv-1-LE-II	90.74	16.47	4.63	4.15
CIFAR10-Easy	MLP-1	52.30	21.29	13.84	4.37
CIFAR10-Easy	MLP-2	41.80	19.65	14.00	2.31
CIFAR10-Easy	Conv-1	74.20	79.21	12.72	13.54
CIFAR10-Easy	Conv-2	79.80	250.08	14.30	48.64
CIFAR10-Easy	ConvPool-1	84.30	847.72	12.84	186.31
CIFAR10-Easy	ConvPool-2	77.20	968.18	13.47	223.41
CIFAR10-Easy	Conv-1-LE-I	?	?	?	?
CIFAR10-Easy	Conv-1-LE-II	?	?	?	?

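Temporaries dominate the memory footprint of a conventional BNN; by storing intermediate results in binary form, eBNN shrinks this footprint so that very small devices (e.g., 15 KB of usable SRAM) can run inference.
On MNIST and CIFAR10-Easy, eBNN models reach accuracy comparable to their BNN counterparts under the same memory constraints (Table 1).
In the evaluated eBNN networks, temporaries account for at most 3% of total inference memory, with parameters taking up the rest.
On the test device, inference runtime per sample ranges from about 9 ms (MNIST MLP-2) to just under 1 second (CIFAR10-Easy ConvPool-2); the smaller configurations run in tens of milliseconds.
An automated cloud service trains eBNNs and generates deployable embedded C code, which supports rapid prototyping on constrained hardware.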
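
The memory argument behind these findings can be checked with simple arithmetic. The sketch below compares the space needed to hold one layer's output as 32-bit floats versus single bits. The layer shape (8 filters of size 3x3 over a 28x28 input, giving 26x26 outputs per filter) and the helper name `temp_bytes` are hypothetical and are not taken from the paper; only the 15 KB budget and the 32x ratio come from the text above.

```python
def temp_bytes(num_values, bits_per_value):
    """Bytes needed to store num_values entries, rounded up to whole bytes."""
    return (num_values * bits_per_value + 7) // 8


# Hypothetical layer: 8 filters, each producing a 26x26 output map.
num_outputs = 26 * 26 * 8

float_bytes = temp_bytes(num_outputs, 32)   # one 32-bit float per output
binary_bytes = temp_bytes(num_outputs, 1)   # one bit per output

budget_bytes = 15 * 1024  # the 15 KB of usable memory cited above

print(float_bytes, binary_bytes, float_bytes // binary_bytes)
print("float buffer fits:", float_bytes <= budget_bytes)
print("binary buffer fits:", binary_bytes <= budget_bytes)
```

Under these assumed dimensions, the floating-point buffer alone exceeds the 15 KB budget before any weights are stored, while the binary buffer takes well under 1 KB.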

This review was generated by AI and reviewed by a human editor.