QUICK REVIEW

[Paper Review] MCUNetV2: Memory-Efficient Patch-based Inference for Tiny Deep Learning

Ji Lin, Wei-Ming Chen | arXiv (Cornell University) | Oct 28, 2021

Advanced Neural Network Applications | References: 57 | Citations: 50

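One-Line Summary

MCUNetV2 combines patch-based inference, receptive field redistribution, and neural architecture search (NAS) to substantially reduce peak memory on MCUs, enabling high-resolution inputs and state-of-the-art accuracy for tiny image classification and object detection.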

ABSTRACT

Tiny deep learning on microcontroller units (MCUs) is challenging due to the limited memory size. We find that the memory bottleneck is due to the imbalanced memory distribution in convolutional neural network (CNN) designs: the first several blocks have an order of magnitude larger memory usage than the rest of the network. To alleviate this issue, we propose a generic patch-by-patch inference scheduling, which operates only on a small spatial region of the feature map and significantly cuts down the peak memory. However, naive implementation brings overlapping patches and computation overhead. We further propose network redistribution to shift the receptive field and FLOPs to the later stage and reduce the computation overhead. Manually redistributing the receptive field is difficult. We automate the process with neural architecture search to jointly optimize the neural architecture and inference scheduling, leading to MCUNetV2. Patch-based inference effectively reduces the peak memory usage of existing networks by 4-8x. Co-designed with neural networks, MCUNetV2 sets a record ImageNet accuracy on MCU (71.8%), and achieves >90% accuracy on the visual wake words dataset under only 32kB SRAM. MCUNetV2 also unblocks object detection on tiny devices, achieving 16.9% higher mAP on Pascal VOC compared to the state-of-the-art result. Our study largely addressed the memory bottleneck in tinyML and paved the way for various vision applications beyond image classification.
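The imbalance described in the abstract can be illustrated with a rough estimate of per-layer activation memory. The sketch below is a minimal illustration, not the paper's measurement: the layer table `LAYERS`, the 224-pixel input, and the "input tensor plus output tensor, int8" footprint model are all assumptions chosen for this example.

```python
# Toy backbone, hypothetical: (name, output channels, stride).
# These values are illustrative and are not taken from the paper.
LAYERS = [
    ("conv1", 16, 2),
    ("block1", 16, 1),
    ("block2", 24, 2),
    ("block3", 40, 2),
    ("block4", 80, 2),
    ("block5", 160, 2),
]


def activation_bytes(resolution, in_channels=3, layers=LAYERS):
    """Per-layer activation footprint in bytes, assuming int8 tensors.

    A layer's footprint is approximated as input tensor + output tensor.
    """
    rows = []
    size, channels = resolution, in_channels
    for name, out_channels, stride in layers:
        in_bytes = size * size * channels
        size //= stride  # spatial size shrinks with the stride
        out_bytes = size * size * out_channels
        rows.append((name, in_bytes + out_bytes))
        channels = out_channels
    return rows


rows = activation_bytes(224)
peak_name, peak_bytes = max(rows, key=lambda r: r[1])
for name, nbytes in rows:
    print(f"{name:8s} {nbytes / 1024:8.1f} kB")
print("peak layer:", peak_name)
```

In this toy setting the early, high-resolution layers dominate the footprint, while the last block needs more than ten times less memory. That gap is what makes it attractive to treat only the initial stage specially.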

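Motivation and Objectives

Identify the memory bottleneck of CNNs deployed on MCUs with extremely limited SRAM.
Propose a patch-based inference scheme that reduces peak memory without changing model accuracy.
Use neural architecture search to automatically co-design the architecture and inference scheduling under MCU constraints.
Demonstrate gains on ImageNet, Visual Wake Words, Pascal VOC, and other tiny vision tasks under tight memory budgets.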

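Proposed Method

Analyze memory usage in efficient CNN backbones and observe an imbalanced activation memory distribution.
Propose executing the memory-intensive initial stage patch by patch to lower peak memory.
Introduce receptive field redistribution, which shifts computation to the later network stages and reduces the overhead caused by patch overlap.
Jointly optimize the backbone architecture and inference scheduling with neural architecture search under hardware constraints.
Evaluate patch-based inference with and without receptive field redistribution across multiple datasets and MCU platforms.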
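The patch-by-patch idea can be sketched in a few lines of plain Python. This is a simplified illustration under stated assumptions, not the authors' implementation: the initial stage is modeled as two stacked single-channel 3x3 "valid" convolutions, and the names `initial_stage`, `patch_based_stage`, and `HALO` are hypothetical. Each patch is cut with a 4-pixel halo so that its output tile matches the full-image result exactly.

```python
import math


def conv3x3_valid(x, k):
    """3x3 cross-correlation without padding on a 2D list of numbers."""
    h, w = len(x), len(x[0])
    return [
        [sum(x[i + di][j + dj] * k[di][dj] for di in range(3) for dj in range(3))
         for j in range(w - 2)]
        for i in range(h - 2)
    ]


def initial_stage(x, k1, k2):
    """Hypothetical memory-heavy initial stage: two stacked 3x3 convolutions."""
    return conv3x3_valid(conv3x3_valid(x, k1), k2)


HALO = 4  # two valid 3x3 convolutions consume 4 rows and 4 columns in total


def patch_based_stage(x, k1, k2, patches_per_side):
    """Run the initial stage tile by tile; also report patch sizes."""
    out_h, out_w = len(x) - HALO, len(x[0]) - HALO
    step_h = math.ceil(out_h / patches_per_side)
    step_w = math.ceil(out_w / patches_per_side)
    out = [[0] * out_w for _ in range(out_h)]
    peak_patch = 0    # largest input patch held in memory at once
    total_loaded = 0  # input elements read across all patches (with overlap)
    for r0 in range(0, out_h, step_h):
        r1 = min(r0 + step_h, out_h)
        for c0 in range(0, out_w, step_w):
            c1 = min(c0 + step_w, out_w)
            # Input region needed for this output tile, including the halo.
            patch = [row[c0:c1 + HALO] for row in x[r0:r1 + HALO]]
            n = len(patch) * len(patch[0])
            peak_patch = max(peak_patch, n)
            total_loaded += n
            tile = initial_stage(patch, k1, k2)
            for i, row in enumerate(tile):
                out[r0 + i][c0:c1] = row
    return out, peak_patch, total_loaded


image = [[(7 * i + 3 * j) % 11 for j in range(12)] for i in range(12)]
k1 = [[1, 0, -1], [2, 0, -2], [1, 0, -1]]
k2 = [[0, 1, 0], [1, -4, 1], [0, 1, 0]]

full = initial_stage(image, k1, k2)
patched, peak_patch, total_loaded = patch_based_stage(image, k1, k2, 2)
print("identical output:", patched == full)
print("peak patch elements:", peak_patch, "vs full input:", 12 * 12)
print("elements loaded with overlap:", total_loaded)
```

The output is unchanged while each patch holds fewer input elements than the full image. The cost is that neighboring patches re-read and recompute the overlapping halo, which is the overhead that receptive field redistribution is meant to shrink.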

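Experimental Results

Research Questions

RQ1: How does the imbalanced memory distribution in CNNs constrain MCU-based inference?
RQ2: Can patch-based inference reduce peak memory without incurring excessive recomputation or accuracy loss?
RQ3: Does receptive field redistribution further reduce the computation overhead while maintaining performance?
RQ4: Can joint neural architecture search, which optimizes both the model and the inference schedule, maximize accuracy under MCU constraints?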

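Key Findings

Patch-based inference reduces peak memory by 4-8x across the networks studied.
Receptive field redistribution lowers the extra computation to about 3-4% while maintaining accuracy.
On ImageNet, MCUNetV2 achieves a record 71.8% Top-1 accuracy on an MCU with 512kB SRAM and 2MB Flash.
On Visual Wake Words, it achieves >90% accuracy with less than 32kB of SRAM.
For object detection on Pascal VOC, MCUNetV2-H7 reaches 68.3% VOC mAP, 16.9% higher than the previous state of the art under similar constraints.
MCUNetV2 enables high-resolution inputs and dense prediction tasks on tiny devices that were previously infeasible due to memory limits.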
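Why a smaller receptive field in the patch-based stage lowers the overlap cost can be checked with the standard receptive field recurrence. The sketch below is an assumption-laden illustration: both layer lists are made up, and `overlap_ratio` counts input pixels read across patches as a rough proxy for recomputation, not the FLOPs overhead reported in the paper.

```python
def receptive_field(layers):
    """layers: list of (kernel, stride). Returns (receptive field, total stride)."""
    rf, jump = 1, 1
    for kernel, stride in layers:
        rf += (kernel - 1) * jump  # each layer widens the field by (k - 1) * jump
        jump *= stride
    return rf, jump


def input_patch_size(out_patch, layers):
    """Input side length needed to produce out_patch outputs per side."""
    rf, jump = receptive_field(layers)
    return (out_patch - 1) * jump + rf


def overlap_ratio(out_size, patches_per_side, layers):
    """Input pixels read by all patches divided by pixels read in one pass."""
    out_patch = out_size // patches_per_side
    side = input_patch_size(out_patch, layers)
    full = input_patch_size(out_size, layers)
    return (patches_per_side * side) ** 2 / full ** 2


# Hypothetical initial stages with the same total stride (4).
heavy = [(3, 2), (3, 1), (3, 2), (3, 1)]  # all 3x3 kernels
light = [(3, 2), (1, 1), (3, 2), (1, 1)]  # some kernels reduced to 1x1

heavy_ratio = overlap_ratio(36, 3, heavy)
light_ratio = overlap_ratio(36, 3, light)
print("heavy stage:", receptive_field(heavy), round(heavy_ratio, 3))
print("light stage:", receptive_field(light), round(light_ratio, 3))
```

With the same stride, the stage with the smaller receptive field reads fewer overlapping pixels per patch. In the redistribution scheme the capacity removed here would be moved to later, per-layer stages, where no patch overlap exists.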


This review was generated by AI and checked by a human editor.