QUICK REVIEW

[Paper Review] TensorFlow Lite Micro: Embedded Machine Learning on TinyML Systems

Robert David, Jared Duke | arXiv (Cornell University) | October 17, 2020

Advanced Neural Network Applications | References: 17 | Citations: 167

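One-Line Summary

The paper introduces TensorFlow Lite Micro (TFLM), a portable, interpreter-based ML inference framework designed for embedded TinyML devices, enabling cross-platform deployment with minimal runtime overhead and memory footprint.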

ABSTRACT

Deep learning inference on embedded devices is a burgeoning field with myriad applications because tiny embedded devices are omnipresent. But we must overcome major challenges before we can benefit from this opportunity. Embedded processors are severely resource constrained. Their nearest mobile counterparts exhibit at least a 100-1,000x difference in compute capability, memory availability, and power consumption. As a result, the machine-learning (ML) models and associated ML inference framework must not only execute efficiently but also operate in a few kilobytes of memory. Also, the embedded devices' ecosystem is heavily fragmented. To maximize efficiency, system vendors often omit many features that commonly appear in mainstream systems, including dynamic memory allocation and virtual memory, that allow for cross-platform interoperability. The hardware comes in many flavors (e.g., instruction-set architecture and FPU support, or lack thereof). We introduce TensorFlow Lite Micro (TF Micro), an open-source ML inference framework for running deep-learning models on embedded systems. TF Micro tackles the efficiency requirements imposed by embedded-system resource constraints and the fragmentation challenges that make cross-platform interoperability nearly impossible. The framework adopts a unique interpreter-based approach that provides flexibility while overcoming these challenges. This paper explains the design decisions behind TF Micro and describes its implementation details. Also, we present an evaluation to demonstrate its low resource requirement and minimal run-time performance overhead.

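Motivation and Goals

Identify the challenges of deploying ML on fragmented embedded hardware under tight resource constraints.
Propose a portable, interpreter-based ML inference framework for microcontrollers and similar devices.
Demonstrate the design decisions that enable low memory usage, portability, and vendor kernel optimizations.
Show how TensorFlow Lite tooling is leveraged to export models and run them on embedded targets.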

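Proposed Method

Adopt interpreter-based inference to maximize portability and reduce the need to re-export models for each device.
Reuse the TensorFlow Lite model format and FlatBuffer serialization so that models can be loaded without an unpacking step.
Implement a two-stack memory arena and a memory planner to minimize runtime and persistent memory.
Support multitenancy by sharing a single arena across multiple interpreters.
Enable platform specialization by swapping in vendor-optimized kernels (such as CMSIS-NN) without changing build scripts.
Provide a platform-independent build system that spans heterogeneous embedded toolchains.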
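
To make the two-stack arena and the shared-arena multitenancy items more concrete, here is a minimal sketch in Python. It is a toy model of offsets inside one fixed buffer, not TFLM's actual C++ allocator. The class name `TwoStackArena`, its method names, the 16-byte alignment, and the choice of which stack holds persistent data are all assumptions made for illustration.

```python
class TwoStackArena:
    """Toy model of one fixed-size arena with two stacks growing toward each other.

    Assumption for this sketch: short-lived (per-inference) data grows up
    from offset 0, and persistent (interpreter-lifetime) data grows down
    from the end of the arena.
    """

    def __init__(self, size: int):
        self.size = size
        self.head = 0      # next free offset for short-lived data (grows up)
        self.tail = size   # lowest offset used by persistent data (grows down)

    @staticmethod
    def _align_up(offset: int, alignment: int) -> int:
        return (offset + alignment - 1) // alignment * alignment

    @staticmethod
    def _align_down(offset: int, alignment: int) -> int:
        return offset // alignment * alignment

    def alloc_persistent(self, nbytes: int, alignment: int = 16) -> int:
        """Reserve nbytes at the tail; returns the start offset."""
        start = self._align_down(self.tail - nbytes, alignment)
        if start < self.head:
            raise MemoryError("arena exhausted: the two stacks would cross")
        self.tail = start
        return start

    def alloc_temp(self, nbytes: int, alignment: int = 16) -> int:
        """Reserve nbytes at the head; returns the start offset."""
        start = self._align_up(self.head, alignment)
        if start + nbytes > self.tail:
            raise MemoryError("arena exhausted: the two stacks would cross")
        self.head = start + nbytes
        return start

    def reset_temp(self) -> None:
        """Drop all short-lived data so the head region can be reused."""
        self.head = 0

    def free_bytes(self) -> int:
        return self.tail - self.head


# Two hypothetical models keep their persistent state side by side in the tail
# and take turns using the same head region for scratch data.
arena = TwoStackArena(4096)
model_a_state = arena.alloc_persistent(300)
model_b_state = arena.alloc_persistent(500)
scratch_a = arena.alloc_temp(1024)   # model A runs
arena.reset_temp()
scratch_b = arena.alloc_temp(2048)   # model B reuses the same region
```

In this model nothing is ever freed individually, so there is no heap and no fragmentation: running out of memory is simply the moment the two offsets would cross. Sharing the arena means the scratch region only has to be as large as the most demanding model, under the assumption that the models run one at a time.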

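Experimental Results

Research Questions

RQ1: Can an interpreter-based ML inference framework meet the resource constraints of embedded TinyML devices while remaining portable across hardware platforms?
RQ2: How can memory management and memory planning be designed to minimize the arena footprint for repeated inference on microcontrollers?
RQ3: To what extent can vendor-optimized kernels be integrated without sacrificing portability and maintainability?
RQ4: How effectively can existing TensorFlow Lite tooling be reused for exporting and deploying models to embedded targets?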

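Key Results

TFLM demonstrates low resource requirements and minimal runtime overhead for embedded inference.
Because interpretation cost is offset by the complexity of the kernels, an interpreter-based approach can be suitable for embedded ML.
Reusing TensorFlow Lite tooling makes it easier to export models to embedded targets.
The two-stack memory allocation strategy and the memory planner reduce arena size and enable memory reuse.
Platform specialization through kernel swapping (such as CMSIS-NN) achieves performance gains without changing the build system.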
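
The claim that a memory planner shrinks the arena through memory reuse can be illustrated with a small sketch. It assumes a greedy heuristic: place the largest buffers first, each at the lowest offset that does not collide with a buffer alive at the same time. The review does not say which algorithm the paper uses, so this heuristic is an assumption, as are the `Buffer` type, the `plan_offsets` function, and the example tensor sizes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Buffer:
    name: str
    size: int        # bytes
    first_use: int   # index of the first op that touches the buffer
    last_use: int    # index of the last op that touches it (inclusive)


def lifetimes_overlap(a: Buffer, b: Buffer) -> bool:
    return a.first_use <= b.last_use and b.first_use <= a.last_use


def plan_offsets(buffers):
    """Return ({name: offset}, arena_bytes_needed) using a greedy heuristic."""
    placed = []   # list of (offset, Buffer) already assigned
    offsets = {}
    # Largest buffers first: they are the hardest to fit into gaps later.
    for buf in sorted(buffers, key=lambda b: b.size, reverse=True):
        # Only buffers that are alive at the same time can conflict.
        conflicts = sorted(
            (off, other.size)
            for off, other in placed
            if lifetimes_overlap(buf, other)
        )
        offset = 0
        for off, size in conflicts:
            if offset + buf.size <= off:
                break  # the gap below this conflicting buffer is big enough
            offset = max(offset, off + size)
        placed.append((offset, buf))
        offsets[buf.name] = offset
    total = max((off + b.size for off, b in placed), default=0)
    return offsets, total


# Hypothetical activations of a 4-op chain: each tensor is produced by one op
# and consumed by the next, so only neighbours are alive at the same time.
chain = [
    Buffer("a", 100, 0, 1),
    Buffer("b", 200, 1, 2),
    Buffer("c", 100, 2, 3),
    Buffer("d", 50, 3, 4),
]
offsets, total = plan_offsets(chain)
naive_total = sum(b.size for b in chain)
```

For this hypothetical chain the plan needs 300 bytes instead of the 450 bytes that giving every tensor its own slot would take, because `a` and `c` share one region and `d` reuses the space left by `b`. Because the lifetimes are known before the first inference, a plan like this can be computed once and reused on every run.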


This review was generated by AI and reviewed by a human editor.