QUICK REVIEW

[Paper Review] SpArSe: Sparse Architecture Search for CNNs on Resource-Constrained Microcontrollers

Igor Fedorov, Ryan P. Adams|arXiv (Cornell University)|May 28, 2019

Advanced Neural Network Applications | References: 44 | Cited by: 82

One-Sentence Summary

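SpArSe combines neural architecture search with pruning to automatically design CNNs that fit memory-constrained microcontrollers while maintaining high accuracy, producing models smaller than those of previous MCU approaches (up to 4.35x smaller, with higher accuracy).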

ABSTRACT

The vast majority of processors in the world are actually microcontroller units (MCUs), which find widespread use performing simple control tasks in applications ranging from automobiles to medical devices and office equipment. The Internet of Things (IoT) promises to inject machine learning into many of these every-day objects via tiny, cheap MCUs. However, these resource-impoverished hardware platforms severely limit the complexity of machine learning models that can be deployed. For example, although convolutional neural networks (CNNs) achieve state-of-the-art results on many visual recognition tasks, CNN inference on MCUs is challenging due to severe finite memory limitations. To circumvent the memory challenge associated with CNNs, various alternatives have been proposed that do fit within the memory budget of an MCU, albeit at the cost of prediction accuracy. This paper challenges the idea that CNNs are not suitable for deployment on MCUs. We demonstrate that it is possible to automatically design CNNs which generalize well, while also being small enough to fit onto memory-limited MCUs. Our Sparse Architecture Search method combines neural architecture search with pruning in a single, unified approach, which learns superior models on four popular IoT datasets. The CNNs we find are more accurate and up to $4.35\times$ smaller than previous approaches, while meeting the strict MCU working memory constraint.

Research Motivation and Goals

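Motivate the deployment of CNNs on memory-constrained MCUs and address their strict RAM/ROM constraints (C1, C2).
Develop a unified framework (SpArSe) that meets memory limits by searching over architectures and pruning weights.
Demonstrate multi-objective optimization that balances validation accuracy, model size, and working memory.
Show that joint pruning and architecture search yields smaller and more accurate MCU CNNs across multiple datasets.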

Proposed Method

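Define a multi-objective design space $\Omega = \{\alpha, \vartheta, \omega, \theta\}$, where $\alpha$ denotes the architecture, $\vartheta$ the operations, $\omega$ the weights, and $\theta$ the training/pruning hyperparameters.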
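Formulate the objectives as $f_1(\Omega) = 1 - \mathrm{ValidationAccuracy}(\Omega)$, $f_2(\Omega) = \mathrm{ModelSize}(\omega)$, and $f_3(\Omega) = \max_{l} \mathrm{WorkingMemory}_l(\Omega)$, i.e., the largest working memory over all layers $l$.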
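Use a Pareto-front approach to identify configurations $\Omega^*$ that trade off accuracy, size, and memory.
The search space covers variable depth/width, convolution types, downsampling, residual connections, multi-scale outputs, and training/pruning hyperparameters.
Prune with Sparse Variational Dropout (unstructured) and Bayesian Compression (structured) to reduce $\omega$ and the size of layer inputs.
Sample configurations with a three-stage multi-objective Bayesian optimizer (MOBO) using random scalarizations, and use network morphism to reuse trained weights across morphs.
Treat pruning as part of the optimization to uncover high-performing subgraphs and speed up convergence.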
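The three objectives and the Pareto-front selection can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the paper's implementation: the `Candidate` record and its fields are hypothetical, 8-bit storage (1 byte per weight and per activation) is assumed, and a layer's working memory is assumed to be the size of its input plus output feature maps held at the same time. The candidate numbers are made-up toy values.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

# Assumption: 8-bit quantized weights and activations (1 byte per value).
BYTES_PER_VALUE = 1


@dataclass(frozen=True)
class Candidate:
    """Hypothetical summary of one trained and pruned configuration Omega."""
    name: str
    val_accuracy: float                        # validation accuracy in [0, 1]
    nonzero_weights: int                       # parameters left after pruning
    layer_io_elems: Sequence[Tuple[int, int]]  # (input elems, output elems) per layer


def objectives(c: Candidate) -> Tuple[float, int, int]:
    """Return (f1, f2, f3), all to be minimized."""
    f1 = 1.0 - c.val_accuracy
    f2 = c.nonzero_weights * BYTES_PER_VALUE  # model size in bytes
    # Assumed working-memory model: a layer holds its input and output at once;
    # the network's working memory is the maximum over its layers.
    f3 = max((i + o) * BYTES_PER_VALUE for i, o in c.layer_io_elems)
    return f1, f2, f3


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """a dominates b if it is no worse in every objective and strictly better in one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(cands: List[Candidate]) -> List[Candidate]:
    """Keep the candidates that no other candidate dominates."""
    objs = [objectives(c) for c in cands]
    return [c for c, fc in zip(cands, objs)
            if not any(dominates(fo, fc) for fo in objs)]


# Toy candidates (illustrative numbers only).
cands = [
    Candidate("A", 0.97, 2000, [(784, 1568), (1568, 400)]),
    Candidate("B", 0.95, 5000, [(784, 3136)]),
    Candidate("C", 0.98, 8000, [(784, 6272)]),
]
print([c.name for c in pareto_front(cands)])  # ['A', 'C']
```

In this toy data, B is worse than A on all three objectives and drops out, while A and C trade accuracy against size and memory, so both remain on the front.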
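The unstructured pruning rule of Sparse Variational Dropout can be illustrated as follows. In that method each weight has a learned mean (written $\mu$ here) and noise variance $\sigma^2$, and its dropout parameter is $\log\alpha = \log\sigma^2 - \log\mu^2$ (this per-weight $\alpha$ is unrelated to the architecture $\alpha$ above). Weights with large $\log\alpha$ are mostly noise and can be removed. The sketch assumes the threshold $\log\alpha > 3$ from the original Sparse VD work, omits training entirely, and uses hypothetical function names and toy values.

```python
import math
from typing import List

# Assumed pruning threshold on log(alpha), as used in Sparse Variational Dropout.
LOG_ALPHA_THRESHOLD = 3.0


def log_alpha(mu: float, log_sigma2: float, eps: float = 1e-8) -> float:
    """Per-weight log dropout parameter: log(sigma^2) - log(mu^2)."""
    return log_sigma2 - math.log(mu * mu + eps)


def sparse_vd_mask(mus: List[float], log_sigma2s: List[float]) -> List[int]:
    """Return 1 to keep a weight and 0 to prune it (large log_alpha means mostly noise)."""
    return [0 if log_alpha(m, s) > LOG_ALPHA_THRESHOLD else 1
            for m, s in zip(mus, log_sigma2s)]


mus = [0.5, 0.01, -1.2, 0.001]          # toy posterior means
log_sigma2s = [-6.0, -4.0, -5.0, -8.0]  # toy log noise variances
mask = sparse_vd_mask(mus, log_sigma2s)
kept = sum(mask)
print(mask, f"compression {len(mask) / kept:.1f}x")  # [1, 0, 1, 0] compression 2.0x
```

Bayesian Compression, the structured counterpart, removes whole groups of weights such as filters rather than individual weights; dropping a filter also shrinks the next layer's input, which is how structured pruning lowers working memory as well as model size.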
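Random scalarization turns the three objectives into a single objective by drawing fresh weights each round, so repeated single-objective Bayesian optimization rounds spread out along the Pareto front. The sketch shows only the scalarization step, assuming a Chebyshev scalarization of min-max-normalized objectives with weights drawn uniformly from the simplex; a real MOBO loop would also fit a surrogate model (e.g., a Gaussian process) to the scalarized values and maximize an acquisition function, which is omitted here. Function names and the toy objective values are hypothetical.

```python
import random
from typing import List, Sequence


def sample_simplex_weights(k: int, rng: random.Random) -> List[float]:
    """Uniform sample from the probability simplex via normalized Exp(1) draws."""
    draws = [rng.expovariate(1.0) for _ in range(k)]
    total = sum(draws)
    return [d / total for d in draws]


def normalize(objs: Sequence[Sequence[float]]) -> List[List[float]]:
    """Min-max normalize each objective to [0, 1] so that the weights are comparable."""
    cols = list(zip(*objs))
    lo = [min(c) for c in cols]
    hi = [max(c) for c in cols]
    return [[(v - a) / (b - a) if b > a else 0.0 for v, a, b in zip(row, lo, hi)]
            for row in objs]


def chebyshev(f: Sequence[float], lam: Sequence[float]) -> float:
    """Chebyshev scalarization (to be minimized): max_i lam_i * f_i."""
    return max(w * x for w, x in zip(lam, f))


# Toy (1 - accuracy, model size in bytes, working memory in bytes) for three configurations.
objs = [(0.03, 2000, 2352), (0.05, 5000, 3920), (0.02, 8000, 7056)]
norm = normalize(objs)
rng = random.Random(0)
picks = []
for _ in range(5):
    lam = sample_simplex_weights(len(objs[0]), rng)
    picks.append(min(range(len(norm)), key=lambda i: chebyshev(norm[i], lam)))
print(picks)  # index of the best configuration under each random weighting
```

Configuration 1 in the toy data is dominated by configuration 0, so it is never picked: a dominated point can at best tie the point that dominates it under a Chebyshev scalarization, which is why minimizers with positive weights are (weakly) Pareto-optimal.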
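Network morphism reuses trained weights by transforming a network into a larger one that computes the same function, so the child network starts from the parent's weights instead of from scratch. As an illustrative assumption, the sketch uses a Net2WiderNet-style widening of a bias-free two-layer ReLU MLP; it does not reproduce the morph operations actually used in SpArSe, and all names are hypothetical. New hidden units copy randomly chosen existing units, and each copied unit's outgoing weights are divided by its replica count so the output is unchanged.

```python
import random
from typing import List, Tuple

Matrix = List[List[float]]


def forward(x: List[float], w1: Matrix, w2: Matrix) -> List[float]:
    """Bias-free MLP: y = w2 @ relu(w1 @ x); w1 is hidden x in, w2 is out x hidden."""
    h = [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in w1]
    return [sum(w * hj for w, hj in zip(row, h)) for row in w2]


def widen(w1: Matrix, w2: Matrix, new_units: int,
          rng: random.Random) -> Tuple[Matrix, Matrix]:
    """Function-preserving widening of the hidden layer (Net2WiderNet-style)."""
    n_hidden = len(w1)
    # mapping[j] = index of the original unit that new hidden unit j copies.
    mapping = list(range(n_hidden)) + [rng.randrange(n_hidden) for _ in range(new_units)]
    counts = [mapping.count(j) for j in range(n_hidden)]
    new_w1 = [w1[src][:] for src in mapping]  # copy incoming weights
    # Split each original unit's outgoing weight evenly among its replicas.
    new_w2 = [[row[src] / counts[src] for src in mapping] for row in w2]
    return new_w1, new_w2


w1 = [[0.5, -0.2], [0.1, 0.3], [-0.4, 0.8]]
w2 = [[1.0, 2.0, 0.5], [0.2, 0.3, -0.6]]
wide_w1, wide_w2 = widen(w1, w2, new_units=2, rng=random.Random(0))
x = [1.0, 2.0]
print(forward(x, w1, w2), forward(x, wide_w1, wide_w2))  # equal up to rounding
```

Training then continues from the widened weights, which is the kind of reuse that lets the search evaluate new morphs faster than training each from scratch.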

Experimental Results

Research Questions

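RQ1: Can CNNs be designed to meet strict MCU memory constraints (RAM/ROM) while maintaining competitive accuracy?
RQ2: Does SpArSe, by jointly optimizing architecture and weight sparsity, outperform pruning or NAS alone for MCU deployment?
RQ3: How do the memory-oriented objectives (ModelSize and WorkingMemory) affect the final architectures and their performance?
RQ4: How does pruning affect the relationship between network edges and nonzero weights in very small CNNs?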

Key Findings

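On MNIST, CIFAR10-binary, CUReT-binary, and Chars4k-binary, SpArSe finds CNNs that are both more accurate and smaller than those produced by previous MCU-specific methods.
When optimizing for memory footprint, SpArSe's models outperform Bonsai in accuracy while using less memory, with working memory well below 2 KB in some cases.
Pruning substantially reduces parameter counts (by up to roughly 80x) and makes it possible to find high-performing subgraphs, something the number of edges alone does not reveal.
Structured pruning and quantization make the memory measurements (model size, MS, and working memory, WM) more realistic for MCU deployment; on most datasets, SpArSe's models outperform Bonsai under the WM/MS constraints.
Network morphism and the staged search substantially speed up convergence, and pruning brings substantial gains in both search efficiency and final model quality.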


This review was generated by AI and reviewed by human editors.