QUICK REVIEW

[Paper Review] MicroNets: Neural Network Architectures for Deploying TinyML Applications on Commodity Microcontrollers

Colby Banbury, Chuteng Zhou | arXiv (Cornell University) | October 21, 2020

Advanced Neural Network Applications | References: 51 | Citations: 148

One-line summary

MicroNets uses differentiable neural architecture search (DNAS) to design networks optimized for MCUs under TinyML constraints, and achieves state-of-the-art results on VWW, KWS, and AD when deployed on commodity MCUs with TensorFlow Lite Micro.

ABSTRACT

Executing machine learning workloads locally on resource constrained microcontrollers (MCUs) promises to drastically expand the application space of IoT. However, so-called TinyML presents severe technical challenges, as deep neural network inference demands a large compute and memory budget. To address this challenge, neural architecture search (NAS) promises to help design accurate ML models that meet the tight MCU memory, latency and energy constraints. A key component of NAS algorithms is their latency/energy model, i.e., the mapping from a given neural network architecture to its inference latency/energy on an MCU. In this paper, we observe an intriguing property of NAS search spaces for MCU model design: on average, model latency varies linearly with model operation (op) count under a uniform prior over models in the search space. Exploiting this insight, we employ differentiable NAS (DNAS) to search for models with low memory usage and low op count, where op count is treated as a viable proxy to latency. Experimental results validate our methodology, yielding our MicroNet models, which we deploy on MCUs using Tensorflow Lite Micro, a standard open-source NN inference runtime widely used in the TinyML community. MicroNets demonstrate state-of-the-art results for all three TinyMLperf industry-standard benchmark tasks: visual wake words, audio keyword spotting, and anomaly detection. Models and training scripts can be found at github.com/ARM-software/ML-zoo.

Motivation and objectives

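Show that, under a uniform prior over models in the search space, operation count is a valid proxy for MCU model latency and energy.
Show that differentiable NAS with MCU-aware constraints can produce models that are efficient in both memory and latency.
Provide state-of-the-art MicroNets for Visual Wake Words, Keyword Spotting, and Anomaly Detection within the TinyMLperf framework.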

Proposed method

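Characterize MCU inference performance to establish op count as a latency proxy.
Formulate a differentiable NAS (DNAS) objective that includes memory (eFlash, SRAM) and latency constraints along with sub-byte quantization options.
Define MCU-specific backbones as the search space for VWW, KWS, and AD, and optimize them with DNAS using memory and latency regularization.
Include 4-bit quantization emulation within CMSIS-NN/TFLM to expand the search space under hardware constraints.
Train the discovered architectures with quantization-aware training and knowledge distillation where applicable.
Deploy the final models with TensorFlow Lite Micro and evaluate them on the standard TinyMLperf tasks.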
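
The resource-regularized objective described in the steps above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the hinge-style penalty, the per-layer candidate costs, the weighting factors, and every name (`expected_cost`, `resource_penalty`, `dnas_loss`, `lam`) are assumptions made for this example. In a real search the quantities would be tensors in an autodiff framework so that gradients reach the architecture logits; plain Python is used here only to make the arithmetic visible.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of architecture logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def expected_cost(logits, costs):
    """Expected cost of one searchable layer under its softmax choice weights."""
    return sum(p * c for p, c in zip(softmax(logits), costs))


def resource_penalty(expected, budget):
    """Hypothetical hinge penalty: zero within budget, linear beyond it."""
    return max(0.0, expected / budget - 1.0)


def dnas_loss(task_loss, layers, ops_budget, flash_budget, sram_budget, lam=1.0):
    """Task loss plus penalties on expected ops, eFlash, and SRAM use.

    Assumed cost model (not taken from the source): ops and weight bytes
    add up across layers, while SRAM is bounded by the largest per-layer
    activation footprint.
    """
    ops = sum(expected_cost(l["logits"], l["ops"]) for l in layers)
    flash = sum(expected_cost(l["logits"], l["weight_bytes"]) for l in layers)
    sram = max(expected_cost(l["logits"], l["act_bytes"]) for l in layers)
    penalty = (
        resource_penalty(ops, ops_budget)
        + resource_penalty(flash, flash_budget)
        + resource_penalty(sram, sram_budget)
    )
    return task_loss + lam * penalty
```

Summing ops and weight bytes but taking a maximum for SRAM reflects the intuition that weights all live in eFlash at once, whereas activation buffers can be reused from layer to layer. The source does not spell out the exact memory model, so treat that choice as an assumption.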

Experimental results

Research questions

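RQ1: Within a given backbone, can the MCU latency and energy of an end-to-end model be effectively approximated by its operation count (ops)?
RQ2: Can DNAS be constrained to satisfy an MCU's SRAM/eFlash and latency limits while maximizing accuracy?
RQ3: When deployed with TFLM, do MCU-optimized MicroNets achieve state-of-the-art accuracy and throughput on the TinyMLperf tasks VWW, KWS, and AD?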

Key results

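Within a backbone, op count is a viable proxy for end-to-end model latency on MCUs, and it holds despite per-layer variation.
MCU power is largely independent of model size; energy per inference is mainly a function of MCU size and model ops.
DNAS with MCU-aware constraints can produce architectures that fit in eFlash and SRAM while maintaining high accuracy and acceptable latency.
MicroNets achieve Pareto-optimal trade-offs on small and medium MCUs for the VWW and KWS tasks.
On VWW, the MicroNet for medium MCUs reaches 88.03% accuracy, close to MobileNetV2's 88.75%, while remaining deployable on the target MCU; on small MCUs, the MicroNet is 3.1% more accurate and 21 ms faster than the TFLM reference.
On KWS, the medium model is 2.7x faster and more accurate than DS-CNN(L).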
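
The first two results suggest a simple way to estimate latency and energy from op count alone. The sketch below fits a least-squares line to (ops, latency) pairs and then multiplies predicted latency by a constant MCU power. It is only an illustration: the function names are invented for this example, and the sample numbers are made-up placeholders, not measurements from the paper.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def r_squared(xs, ys, slope, intercept):
    """Coefficient of determination for the fitted line."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


def energy_mj(power_mw, latency_ms):
    """Energy per inference in millijoules, assuming constant power draw.

    mW * ms gives microjoules, so divide by 1000 to get millijoules.
    """
    return power_mw * latency_ms / 1000.0


# Hypothetical placeholder data: op counts in millions, latency in ms.
ops_m = [5.0, 10.0, 20.0, 40.0]
latency_ms = [60.0, 115.0, 230.0, 450.0]

slope, intercept = fit_line(ops_m, latency_ms)
predicted_ms = slope * 30.0 + intercept  # latency estimate for a 30M-op model
print(r_squared(ops_m, latency_ms, slope, intercept))
print(energy_mj(power_mw=100.0, latency_ms=predicted_ms))
```

Because the reported linear relationship holds within a backbone, a fit like this would need to be repeated for each backbone and each target MCU rather than shared across them.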


This review was generated by AI and reviewed by a human editor.