QUICK REVIEW

[Paper Review] MCUNetV2: Memory-Efficient Patch-based Inference for Tiny Deep Learning

Ji Lin, Wei-Ming Chen | arXiv (Cornell University) | Oct 28, 2021

Advanced Neural Network Applications | References: 57 | Cited by: 50

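One-Sentence Summary

MCUNetV2 combines patch-based inference, receptive field redistribution, and NAS to significantly reduce peak memory on MCUs, enabling higher-resolution inputs and state-of-the-art accuracy for image classification and object detection on tiny devices.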

ABSTRACT

Tiny deep learning on microcontroller units (MCUs) is challenging due to the limited memory size. We find that the memory bottleneck is due to the imbalanced memory distribution in convolutional neural network (CNN) designs: the first several blocks have an order of magnitude larger memory usage than the rest of the network. To alleviate this issue, we propose a generic patch-by-patch inference scheduling, which operates only on a small spatial region of the feature map and significantly cuts down the peak memory. However, naive implementation brings overlapping patches and computation overhead. We further propose network redistribution to shift the receptive field and FLOPs to the later stage and reduce the computation overhead. Manually redistributing the receptive field is difficult. We automate the process with neural architecture search to jointly optimize the neural architecture and inference scheduling, leading to MCUNetV2. Patch-based inference effectively reduces the peak memory usage of existing networks by 4-8x. Co-designed with neural networks, MCUNetV2 sets a record ImageNet accuracy on MCU (71.8%), and achieves >90% accuracy on the visual wake words dataset under only 32kB SRAM. MCUNetV2 also unblocks object detection on tiny devices, achieving 16.9% higher mAP on Pascal VOC compared to the state-of-the-art result. Our study largely addressed the memory bottleneck in tinyML and paved the way for various vision applications beyond image classification.
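The memory imbalance the abstract describes can be illustrated with a rough activation count. The sketch below is only an illustration: the layer list `LAYERS`, the helper `activation_profile`, the channel widths, and the int8 assumption are all hypothetical and not taken from the paper. It also approximates per-block memory as input plus output activation size, which ignores weights and temporary buffers.

```python
# Hypothetical toy backbone: (name, stride, out_channels). Not the paper's network.
LAYERS = [
    ("conv1", 2, 16),
    ("block1", 1, 16),
    ("block2", 2, 24),
    ("block3", 2, 40),
    ("block4", 2, 96),
    ("block5", 2, 160),
]


def activation_profile(resolution, in_channels=3, bytes_per_value=1):
    """Return [(name, bytes)], where bytes = input + output activation size.

    Assumes square feature maps and int8 activations (1 byte per value).
    """
    profile = []
    h, c = resolution, in_channels
    for name, stride, out_c in LAYERS:
        out_h = h // stride
        in_bytes = h * h * c * bytes_per_value
        out_bytes = out_h * out_h * out_c * bytes_per_value
        profile.append((name, in_bytes + out_bytes))
        h, c = out_h, out_c
    return profile


profile = activation_profile(224)
for name, nbytes in profile:
    print(f"{name:7s} {nbytes / 1024:8.1f} kB")

# Peak memory is set by the single most expensive block.
peak = max(nbytes for _, nbytes in profile)
```

In this toy configuration the early, high-resolution blocks need more than ten times the activation memory of the last block, so they alone determine the peak. This is the pattern that patch-by-patch execution of the initial stage targets.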

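Research Motivation and Goals

Identify the memory bottleneck of CNNs deployed on MCUs with extremely limited SRAM.
Propose a patch-based inference scheme that reduces peak memory without changing model accuracy.
Automate the joint design of network architecture and inference scheduling through neural architecture search under MCU constraints.
Demonstrate improvements on ImageNet, Visual Wake Words, Pascal VOC, and other small-scale vision tasks under strict memory budgets.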

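Proposed Method

Analyze the memory usage of efficient CNN backbones and observe that activation memory is unevenly distributed across blocks.
Execute the memory-intensive initial stage patch by patch to reduce peak memory.
Introduce receptive field redistribution, which shifts receptive field and computation to the later stage of the network to reduce the overhead from overlapping patches.
Jointly optimize the backbone architecture and inference scheduling through neural architecture search under hardware constraints.
Evaluate patch-based inference with and without receptive field redistribution on multiple datasets and MCU platforms.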
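The patch-by-patch idea can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses a single-channel map, two unpadded 3x3 convolutions as a stand-in for the initial stage, and hypothetical helper names (`conv3x3_valid`, `stage`, `patch_inference`). It shows that running the stage on overlapping input patches reproduces the full-map result while holding a smaller input region at a time.

```python
def conv3x3_valid(x, k):
    """Unpadded 3x3 convolution (cross-correlation) on a 2D list of numbers."""
    h, w = len(x), len(x[0])
    return [
        [
            sum(x[i + di][j + dj] * k[di][dj] for di in range(3) for dj in range(3))
            for j in range(w - 2)
        ]
        for i in range(h - 2)
    ]


def stage(x, kernels):
    """Apply the convolutions of the (toy) initial stage in sequence."""
    for k in kernels:
        x = conv3x3_valid(x, k)
    return x


def patch_inference(x, kernels, n_patches):
    """Run the stage on an n_patches x n_patches grid of overlapping input patches."""
    halo = 2 * len(kernels)  # each unpadded 3x3 conv shrinks the map by 2
    out_size = len(x) - halo
    if out_size % n_patches != 0:
        raise ValueError("output size must be divisible by n_patches")
    step = out_size // n_patches
    out = [[0] * out_size for _ in range(out_size)]
    peak_input_elems = 0
    for pi in range(n_patches):
        for pj in range(n_patches):
            r0, c0 = pi * step, pj * step
            # The input patch is the output tile plus the halo the convs consume.
            patch = [row[c0:c0 + step + halo] for row in x[r0:r0 + step + halo]]
            peak_input_elems = max(peak_input_elems, len(patch) * len(patch[0]))
            y = stage(patch, kernels)
            for i in range(step):
                out[r0 + i][c0:c0 + step] = y[i]
    return out, peak_input_elems


# Deterministic 20x20 test input and two arbitrary integer kernels.
x = [[(7 * i + 3 * j) % 11 for j in range(20)] for i in range(20)]
kernels = [
    [[1, 0, -1], [2, 0, -2], [1, 0, -1]],
    [[0, 1, 0], [1, -4, 1], [0, 1, 0]],
]

full = stage(x, kernels)
patched, peak_input_elems = patch_inference(x, kernels, n_patches=2)
```

Here each patch reads a 12x12 input region instead of the full 20x20 map, but the four patches together read more elements than the map contains. That redundancy is the overlap cost that receptive field redistribution is meant to shrink.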

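Experimental Results

Research Questions

RQ1: How does the imbalanced memory distribution in CNNs limit MCU-based inference?
RQ2: Can patch-based inference reduce peak memory without incurring prohibitive recomputation or accuracy loss?
RQ3: Does redistributing the receptive field further reduce computation overhead while maintaining performance?
RQ4: Under MCU constraints, can joint neural architecture search optimize both the model and the inference schedule to maximize accuracy?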

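Key Findings

Patch-based inference reduces peak memory by 4-8x on the networks studied.
Receptive field redistribution reduces the extra computation to about 3-4% while preserving accuracy.
On ImageNet, MCUNetV2 achieves a record 71.8% top-1 accuracy on an MCU with 512kB SRAM and 2MB Flash.
On Visual Wake Words, MCUNetV2 reaches >90% accuracy with less than 32kB SRAM.
For object detection on Pascal VOC, MCUNetV2-H7 reaches 68.3% mAP, 16.9% higher than the previous state of the art under similar constraints.
MCUNetV2 makes higher-resolution inputs and dense prediction tasks feasible on memory-constrained tiny devices, where they were previously impractical due to memory limits.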
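Why a smaller receptive field in the patch-based stage lowers the overlap cost can be seen from standard receptive field arithmetic. The sketch below is an assumption-laden illustration: both layer configurations are hypothetical, the helper names are invented here, and the ratio it computes is only a proxy (input area read per patch relative to a non-overlapping tile, ignoring image borders). It is not the FLOPs overhead reported in the paper.

```python
def receptive_field(layers):
    """layers: list of (kernel, stride). Return (receptive_field, total_stride)."""
    rf, jump = 1, 1
    for kernel, stride in layers:
        rf += (kernel - 1) * jump
        jump *= stride
    return rf, jump


def input_patch_size(layers, out_tile):
    """Side length of the input region needed to produce an out_tile-wide output tile."""
    rf, jump = receptive_field(layers)
    return (out_tile - 1) * jump + rf


def input_read_ratio(layers, out_tile):
    """Input area read per patch, relative to a non-overlapping tile (1.0 = no overlap)."""
    _, jump = receptive_field(layers)
    return (input_patch_size(layers, out_tile) / (out_tile * jump)) ** 2


# Hypothetical initial stages, both downsampling by 4x overall.
baseline = [(3, 2), (3, 1), (3, 2), (3, 1)]       # 3x3 kernels throughout
redistributed = [(3, 2), (1, 1), (3, 2), (1, 1)]  # some 3x3 replaced by 1x1

ratio_baseline = input_read_ratio(baseline, out_tile=14)
ratio_redistributed = input_read_ratio(redistributed, out_tile=14)
print(f"baseline: {ratio_baseline:.2f}x, redistributed: {ratio_redistributed:.2f}x")
```

In this toy setting, replacing some 3x3 kernels with 1x1 kernels in the patch-based stage shrinks the stage's receptive field from 19 to 7 pixels, so each patch needs a much thinner halo. A real design would compensate with larger receptive fields in the later, per-layer stage, as the redistribution idea in the summary suggests.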


This review was generated by AI and checked by a human editor.