QUICK REVIEW

[Paper Review] MicroNets: Neural Network Architectures for Deploying TinyML Applications on Commodity Microcontrollers

Colby Banbury, Chuteng Zhou | arXiv (Cornell University) | Oct 21, 2020

Advanced Neural Network Applications | References: 51 | Citations: 148

One-Line Summary

MicroNets uses differentiable neural architecture search (DNAS) to design MCU-optimized networks that fit TinyML constraints, and achieves state-of-the-art results on VWW, KWS, and AD on commodity MCUs running TensorFlow Lite Micro.

ABSTRACT

Executing machine learning workloads locally on resource constrained microcontrollers (MCUs) promises to drastically expand the application space of IoT. However, so-called TinyML presents severe technical challenges, as deep neural network inference demands a large compute and memory budget. To address this challenge, neural architecture search (NAS) promises to help design accurate ML models that meet the tight MCU memory, latency and energy constraints. A key component of NAS algorithms is their latency/energy model, i.e., the mapping from a given neural network architecture to its inference latency/energy on an MCU. In this paper, we observe an intriguing property of NAS search spaces for MCU model design: on average, model latency varies linearly with model operation (op) count under a uniform prior over models in the search space. Exploiting this insight, we employ differentiable NAS (DNAS) to search for models with low memory usage and low op count, where op count is treated as a viable proxy to latency. Experimental results validate our methodology, yielding our MicroNet models, which we deploy on MCUs using Tensorflow Lite Micro, a standard open-source NN inference runtime widely used in the TinyML community. MicroNets demonstrate state-of-the-art results for all three TinyMLperf industry-standard benchmark tasks: visual wake words, audio keyword spotting, and anomaly detection. Models and training scripts can be found at github.com/ARM-software/ML-zoo.

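Motivation and Objectives

Demonstrate that op count is a valid proxy for MCU model latency and energy under a uniform prior over the model space.
Show that differentiable NAS with MCU-aware constraints can produce memory- and latency-efficient models.
Provide state-of-the-art MicroNets for VWW, KWS, and AD within the TinyMLperf framework.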

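Proposed Method

Characterize MCU inference performance and establish op count as a proxy for latency.
Formulate a differentiable NAS (DNAS) objective with memory (eFlash, SRAM) and latency constraints, plus sub-byte quantization options.
Define MCU-specific backbones for VWW, KWS, and AD as the search space, and optimize them with DNAS under memory and latency regularization.
Incorporate 4-bit quantization emulation within CMSIS-NN/TFLM to extend the search space under hardware constraints.
Train the discovered architectures with quantization-aware training and, where applicable, knowledge distillation.
Deploy the final models via TensorFlow Lite Micro and evaluate them on the standard TinyMLperf tasks.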
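
The constrained DNAS objective described in the steps above can be illustrated with a minimal sketch. This is not the authors' implementation: the candidate list, the cost numbers, the hinge-style penalty, and the names `CANDIDATES`, `expected_cost`, and `regularized_loss` are all assumptions made for illustration. The sketch only shows the general idea of relaxing a discrete layer choice into a softmax over architecture parameters, so that the expected op count and expected weight size become differentiable quantities that can be penalized when they exceed a budget.

```python
import math

# Hypothetical candidates for one searchable layer:
# (name, op count in millions, weight size in bytes). Illustrative values only.
CANDIDATES = [
    ("conv3x3_c16", 1.2, 2304),
    ("conv3x3_c32", 2.4, 4608),
    ("conv3x3_c64", 4.8, 9216),
]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def expected_cost(alpha):
    """Expected op count and weight size under the distribution softmax(alpha)."""
    probs = softmax(alpha)
    ops = sum(p * c[1] for p, c in zip(probs, CANDIDATES))
    size = sum(p * c[2] for p, c in zip(probs, CANDIDATES))
    return ops, size


def regularized_loss(task_loss, alpha, ops_budget, flash_budget,
                     lam_ops=1.0, lam_mem=1.0):
    """Task loss plus hinge penalties (an assumed form) for exceeding budgets."""
    ops, size = expected_cost(alpha)
    pen_ops = max(0.0, ops / ops_budget - 1.0)    # zero when within the op budget
    pen_mem = max(0.0, size / flash_budget - 1.0)  # zero when weights fit in flash
    return task_loss + lam_ops * pen_ops + lam_mem * pen_mem


if __name__ == "__main__":
    uniform = [0.0, 0.0, 0.0]       # no preference among candidates
    prefers_small = [10.0, 0.0, 0.0]  # strongly prefers the cheapest candidate
    print(regularized_loss(0.5, uniform, ops_budget=2.0, flash_budget=8000.0))
    print(regularized_loss(0.5, prefers_small, ops_budget=2.0, flash_budget=8000.0))
```

In a real search, the architecture parameters would be updated by gradient descent alongside the network weights in an autodiff framework. Here the penalty is only evaluated, which is enough to see that shifting probability toward cheaper candidates removes the budget penalty.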

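Experimental Results

Research Questions

RQ1: Within a given backbone, can op count effectively approximate the MCU latency and energy of an end-to-end model?
RQ2: Can DNAS be steered by MCU SRAM/eFlash and latency constraints so that it maximizes accuracy while satisfying those constraints?
RQ3: Do MCU-optimized MicroNets, deployed with TFLM, achieve state-of-the-art accuracy and throughput on the TinyMLperf tasks VWW, KWS, and AD?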

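Key Findings

Within a given backbone, op count is a valid proxy for end-to-end model latency on an MCU, despite layer-to-layer variation.
MCU power is largely unaffected by model size; energy per inference is primarily a function of the MCU size and the model's op count.
DNAS with MCU-aware constraints can produce architectures that fit in eFlash and SRAM while maintaining high accuracy and acceptable latency.
MicroNets achieve Pareto-optimal trade-offs on the VWW and KWS tasks on small and medium MCUs.
On VWW, the MicroNet for the medium MCU reaches 88.03% accuracy, close to MobileNetV2's 88.75%, while remaining deployable on the target MCU. On the small MCU, the MicroNet is 3.1% more accurate and 21 ms faster than the TFLM reference model.
On KWS, the medium model is 2.7× faster than DS-CNN(L) and more accurate.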
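
The first finding, that latency scales roughly linearly with op count within a backbone, suggests how such a proxy could be calibrated in practice. The sketch below fits a line by ordinary least squares and reports the coefficient of determination. The data points are synthetic placeholders generated from an assumed line plus small offsets, not measurements from the paper, and the function names `fit_line`, `r_squared`, and `predict_latency_ms` are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def r_squared(xs, ys, slope, intercept):
    """Fraction of variance in ys explained by the fitted line."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


def predict_latency_ms(ops_millions, slope, intercept):
    """Estimate latency for an unmeasured model from its op count."""
    return slope * ops_millions + intercept


# Synthetic placeholder data (NOT from the paper): an assumed line of
# 2 ms per million ops plus 3 ms overhead, with small fixed offsets.
OPS_MILLIONS = [5.0, 10.0, 20.0, 40.0, 80.0]
OFFSETS_MS = [0.5, -0.5, 0.3, -0.3, 0.0]
LATENCY_MS = [2.0 * x + 3.0 + d for x, d in zip(OPS_MILLIONS, OFFSETS_MS)]

if __name__ == "__main__":
    slope, intercept = fit_line(OPS_MILLIONS, LATENCY_MS)
    print("slope (ms per million ops):", slope)
    print("intercept (ms):", intercept)
    print("R^2:", r_squared(OPS_MILLIONS, LATENCY_MS, slope, intercept))
    print("estimate for 30M ops:", predict_latency_ms(30.0, slope, intercept))
```

With real measurements, a separate fit per backbone and per MCU would be the natural reading of the finding, since the proxy is stated to hold within a backbone rather than across arbitrary architectures.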


This review was generated by AI and checked by a human editor.