QUICK REVIEW

[Paper Review] MicroNets: Neural Network Architectures for Deploying TinyML Applications on Commodity Microcontrollers

Colby Banbury, Chuteng Zhou|arXiv (Cornell University)|Oct 21, 2020

Advanced Neural Network Applications | References: 51 | Cited by: 148

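One-sentence summary

MicroNets uses differentiable neural architecture search (DNAS) to design MCU-optimized networks that satisfy TinyML constraints and, deployed with TensorFlow Lite Micro on commodity MCUs, achieve state-of-the-art results on visual wake words (VWW), keyword spotting (KWS), and anomaly detection (AD).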

ABSTRACT

Executing machine learning workloads locally on resource constrained microcontrollers (MCUs) promises to drastically expand the application space of IoT. However, so-called TinyML presents severe technical challenges, as deep neural network inference demands a large compute and memory budget. To address this challenge, neural architecture search (NAS) promises to help design accurate ML models that meet the tight MCU memory, latency and energy constraints. A key component of NAS algorithms is their latency/energy model, i.e., the mapping from a given neural network architecture to its inference latency/energy on an MCU. In this paper, we observe an intriguing property of NAS search spaces for MCU model design: on average, model latency varies linearly with model operation (op) count under a uniform prior over models in the search space. Exploiting this insight, we employ differentiable NAS (DNAS) to search for models with low memory usage and low op count, where op count is treated as a viable proxy to latency. Experimental results validate our methodology, yielding our MicroNet models, which we deploy on MCUs using Tensorflow Lite Micro, a standard open-source NN inference runtime widely used in the TinyML community. MicroNets demonstrate state-of-the-art results for all three TinyMLperf industry-standard benchmark tasks: visual wake words, audio keyword spotting, and anomaly detection. Models and training scripts can be found at github.com/ARM-software/ML-zoo.

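Motivation and objectives

Show that, under a uniform prior over models in the search space, operation (op) count is a viable proxy for model latency and energy on an MCU.
Show that differentiable NAS with MCU-specific constraints produces models that are efficient in both memory and latency.
Provide state-of-the-art MicroNets for Visual Wake Words, Keyword Spotting, and Anomaly Detection within the TinyMLperf framework.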
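
The first objective rests on latency growing roughly linearly with op count. As a minimal sketch of how such a proxy could be calibrated, the Python below fits a least-squares line to (op count, latency) pairs. The measurements are hypothetical placeholders, not data from the paper, and `fit_line` and `predict_latency_ms` are illustrative names.

```python
# Hypothetical (op count, measured latency in ms) pairs for one MCU.
# These values are made up for illustration; they are NOT from the paper.
samples = [(1.0e6, 12.0), (2.0e6, 22.0), (4.0e6, 42.0), (8.0e6, 82.0)]


def fit_line(points):
    """Ordinary least-squares fit of latency = slope * ops + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = fit_line(samples)


def predict_latency_ms(ops):
    """Estimate latency from op count using the fitted line."""
    return slope * ops + intercept


print(f"ms per op: {slope:.2e}, fixed overhead: {intercept:.2f} ms")
print(f"predicted latency at 3M ops: {predict_latency_ms(3.0e6):.1f} ms")
```

Once the line is fitted, a search procedure can constrain op count instead of measuring latency on hardware for every candidate. The review notes that this holds on average within a backbone, so a separate fit per backbone and per MCU is the safer assumption.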

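Proposed method

Characterize MCU inference performance to establish op count as a proxy for latency.
Formulate a differentiable NAS (DNAS) objective with memory (eFlash, SRAM) and latency constraints and with sub-byte quantization options.
Define MCU-specific backbones for VWW, KWS, and AD as the search spaces, and optimize them with DNAS under memory and latency regularization.
Add emulated 4-bit quantization to CMSIS-NN/TFLM to widen the search space under the hardware constraints.
Train the discovered architectures with quantization-aware training and, where available, knowledge distillation.
Deploy the final models through TensorFlow Lite Micro and evaluate them on the standard TinyMLperf tasks.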
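
The constrained DNAS objective described above can be sketched generically. In the Python below, each layer has a set of candidate choices with architecture logits; a softmax turns the logits into probabilities, so the expected op count and expected weight storage become smooth functions of the logits and can be penalized when they exceed a budget. This is a common DNAS-style formulation and an assumption on our part, not the paper's exact loss. All function names, the hinge-style penalty, and the weights `lam_ops` and `lam_mem` are hypothetical, and the SRAM (peak activation) constraint is not modeled here.

```python
import math


def softmax(logits):
    """Numerically stable softmax over one layer's architecture logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def expected_cost(layer_logits, layer_costs):
    """Sum over layers of the probability-weighted cost of each candidate."""
    total = 0.0
    for logits, costs in zip(layer_logits, layer_costs):
        probs = softmax(logits)
        total += sum(p * c for p, c in zip(probs, costs))
    return total


def regularized_loss(task_loss, layer_logits, ops, weight_bytes,
                     ops_budget, flash_budget, lam_ops=1.0, lam_mem=1.0):
    """Task loss plus hinge penalties for exceeding the op and flash budgets.

    Illustrative only: the penalty form and weights are assumptions.
    """
    exp_ops = expected_cost(layer_logits, ops)
    exp_bytes = expected_cost(layer_logits, weight_bytes)
    ops_penalty = max(0.0, exp_ops / ops_budget - 1.0)
    mem_penalty = max(0.0, exp_bytes / flash_budget - 1.0)
    return task_loss + lam_ops * ops_penalty + lam_mem * mem_penalty


# One layer with two candidates (e.g. narrow vs. wide), equally likely.
logits = [[0.0, 0.0]]
ops = [[100.0, 300.0]]          # ops per candidate (made-up numbers)
weight_bytes = [[10.0, 30.0]]   # weight storage per candidate (made-up)

print(expected_cost(logits, ops))
print(regularized_loss(0.5, logits, ops, weight_bytes,
                       ops_budget=100.0, flash_budget=40.0))
```

In a real search the logits would be trained by gradient descent together with the network weights in an autodiff framework; the plain-Python version only shows how the budget terms enter the loss. A 4-bit candidate could be represented here simply as a choice whose `weight_bytes` entry is half that of its 8-bit counterpart.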

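Experimental results

Research questions

RQ1: For a given backbone, can end-to-end model latency and energy be approximated well by the number of operations (ops)?
RQ2: Can DNAS be constrained to meet an MCU's SRAM, eFlash, and latency limits while maximizing accuracy?
RQ3: When deployed with TFLM, do MCU-optimized MicroNets achieve state-of-the-art accuracy and throughput on the TinyMLperf tasks VWW, KWS, and AD?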

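Key findings

Within a backbone, op count is a viable proxy for end-to-end model latency, despite per-layer variability.
MCU power draw is largely independent of model size, so energy per inference depends mainly on the size of the MCU and the model's op count.
DNAS with MCU-specific constraints can generate architectures that fit within eFlash and SRAM while maintaining high accuracy and acceptable latency.
MicroNets achieve Pareto-optimal trade-offs on the VWW and KWS tasks for both small and medium MCUs.
On VWW, the MicroNet for the medium MCU reaches 88.03% accuracy, close to MobileNetV2's 88.75%, while remaining deployable on the target MCU; on the small MCU, the MicroNet is 3.1% more accurate than the TFLM reference model and 21 ms faster.
On KWS, the medium MicroNet model is 2.7× faster than DS-CNN(L) and also more accurate.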
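
The energy finding follows from simple arithmetic: if power draw on a given MCU is roughly constant across models, energy per inference is approximately power times latency, and latency in turn tracks op count. A minimal sketch with hypothetical numbers (the power and latency values below are illustrative, not measurements from the paper):

```python
def energy_per_inference_mj(power_mw, latency_ms):
    """Energy in millijoules, assuming constant power during inference.

    mW * ms gives microjoules, so divide by 1000 to get millijoules.
    """
    return power_mw * latency_ms / 1000.0


# Hypothetical MCU drawing 100 mW, running two models of different latency.
for name, latency_ms in [("smaller model", 50.0), ("larger model", 200.0)]:
    energy = energy_per_inference_mj(100.0, latency_ms)
    print(f"{name}: {latency_ms:.0f} ms -> {energy:.1f} mJ")
```

Under this assumption, reducing op count lowers latency and energy together, which is why a single op-count constraint can serve both goals during the search.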

This review was generated by AI and reviewed by a human editor.