QUICK REVIEW

[Paper Review] TensorFlow Lite Micro: Embedded Machine Learning on TinyML Systems

Robert David, Jared Duke|arXiv (Cornell University)|Oct 17, 2020

Advanced Neural Network Applications|References: 17|Cited by: 167

One-Line Summary

This paper gives an overview of TensorFlow Lite Micro (TFLM), a portable, interpreter-based ML inference framework for embedded TinyML devices. It enables cross-platform deployment with minimal run-time overhead and a small memory footprint.

ABSTRACT

Deep learning inference on embedded devices is a burgeoning field with myriad applications because tiny embedded devices are omnipresent. But we must overcome major challenges before we can benefit from this opportunity. Embedded processors are severely resource constrained. Their nearest mobile counterparts exhibit at least a 100-1,000x difference in compute capability, memory availability, and power consumption. As a result, the machine-learning (ML) models and associated ML inference framework must not only execute efficiently but also operate in a few kilobytes of memory. Also, the embedded devices' ecosystem is heavily fragmented. To maximize efficiency, system vendors often omit many features that commonly appear in mainstream systems, including dynamic memory allocation and virtual memory, that allow for cross-platform interoperability. The hardware comes in many flavors (e.g., instruction-set architecture and FPU support, or lack thereof). We introduce TensorFlow Lite Micro (TF Micro), an open-source ML inference framework for running deep-learning models on embedded systems. TF Micro tackles the efficiency requirements imposed by embedded-system resource constraints and the fragmentation challenges that make cross-platform interoperability nearly impossible. The framework adopts a unique interpreter-based approach that provides flexibility while overcoming these challenges. This paper explains the design decisions behind TF Micro and describes its implementation details. Also, we present an evaluation to demonstrate its low resource requirement and minimal run-time performance overhead.

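Motivation and Objectives

Identify the challenges of deploying ML on fragmented embedded hardware under tight resource constraints.
Propose a portable, interpreter-based ML inference framework for microcontrollers and similar devices.
Present the design decisions that enable low memory use, portability, and vendor kernel optimizations.
Show how TensorFlow Lite tooling can be used to export models and run them on embedded targets.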

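Proposed Method

Adopt an interpreter-based inference approach to maximize portability and reduce the need to re-export models for each device.
Reuse the TensorFlow Lite model format and its FlatBuffer serialization, with no unpacking step when the model is loaded.
Implement a two-stack memory arena and a memory planner to minimize run-time and persistent memory use.
Support multitenancy by sharing a single arena among multiple interpreters.
Enable platform specialization by swapping in vendor-optimized kernels (e.g., CMSIS-NN) without changing build scripts.
Provide a platform-agnostic build system that works across heterogeneous embedded toolchains.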
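
The two-stack arena idea can be illustrated with a minimal Python sketch. This is not TFLM's implementation (TFLM is written in C++); the class name `TwoStackArena`, its method names, and the choice of which end holds persistent data are assumptions made for illustration, and alignment is ignored. The point it shows is that one fixed buffer serves both long-lived and temporary allocations without any dynamic memory allocation.

```python
class TwoStackArena:
    """Fixed-size arena with two stacks that grow toward each other.

    Assumption for this sketch: temporary allocations grow upward from
    offset 0, persistent allocations grow downward from the end.
    """

    def __init__(self, size: int) -> None:
        self.buffer = bytearray(size)  # the only memory ever reserved
        self.head = 0                  # next free offset for temporary data
        self.tail = size               # lowest offset used by persistent data

    def alloc_persistent(self, nbytes: int) -> int:
        """Reserve bytes that live as long as the arena; return the offset."""
        new_tail = self.tail - nbytes
        if new_tail < self.head:
            raise MemoryError("arena exhausted")
        self.tail = new_tail
        return new_tail

    def alloc_temp(self, nbytes: int) -> int:
        """Reserve short-lived bytes; return the offset."""
        if self.head + nbytes > self.tail:
            raise MemoryError("arena exhausted")
        offset = self.head
        self.head += nbytes
        return offset

    def reset_temp(self) -> None:
        """Drop all temporary allocations so the space can be reused."""
        self.head = 0

    def used(self) -> int:
        """Total bytes currently reserved from both ends."""
        return self.head + (len(self.buffer) - self.tail)
```

Under the same assumptions, several interpreters could share one such arena: each keeps its own persistent allocations at the tail, while the temporary region at the head is reset and reused between their invocations.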

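Experimental Results

Research Questions

RQ1: Can an interpreter-based ML inference framework be portable across hardware platforms while still meeting the resource constraints of embedded TinyML devices?
RQ2: How should memory management and memory planning be designed to minimize the arena footprint for repeated inference on microcontrollers?
RQ3: To what extent can vendor-optimized kernels be integrated without sacrificing portability and maintainability?
RQ4: To what extent can existing TensorFlow Lite tooling be reused to export and deploy models to embedded targets?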

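Key Findings

TFLM demonstrates low resource requirements and minimal run-time overhead for embedded inference.
The interpreter-based approach suits embedded ML because interpretation overhead is amortized over the substantial work done inside each kernel.
Reusing TensorFlow Lite tooling makes it easy to export models to embedded targets.
The two-stack memory allocation strategy and the memory planner reduce arena size and enable memory reuse.
Platform specialization through kernel swapping (e.g., CMSIS-NN) improves performance without changes to the build system.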
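
To make the memory-reuse finding concrete, here is a small sketch of how a planner might overlap buffers whose lifetimes do not intersect. It is a simplified greedy first-fit heuristic written for this review, not the planner shipped with TFLM; the names `Buffer`, `plan_offsets`, and `arena_size`, and the example sizes, are hypothetical.

```python
from typing import Dict, List, NamedTuple


class Buffer(NamedTuple):
    name: str
    size: int       # bytes
    first_use: int  # index of the first operator that touches the buffer
    last_use: int   # index of the last operator that touches it (inclusive)


def plan_offsets(buffers: List[Buffer]) -> Dict[str, int]:
    """Assign arena offsets, largest buffer first, reusing space when possible."""
    placed = []  # (offset, Buffer) pairs already assigned
    offsets = {}
    for buf in sorted(buffers, key=lambda b: b.size, reverse=True):
        # Only buffers that are alive at the same time can conflict.
        conflicts = sorted(
            (off, off + other.size)
            for off, other in placed
            if buf.first_use <= other.last_use and other.first_use <= buf.last_use
        )
        offset = 0
        for start, end in conflicts:
            if offset + buf.size <= start:
                break  # the gap before this conflicting buffer is big enough
            offset = max(offset, end)
        placed.append((offset, buf))
        offsets[buf.name] = offset
    return offsets


def arena_size(buffers: List[Buffer], offsets: Dict[str, int]) -> int:
    """Smallest arena that holds every buffer at its planned offset."""
    return max(offsets[b.name] + b.size for b in buffers)


# Hypothetical three-tensor chain: "a" feeds op 1, "b" feeds op 2, "c" is the output.
example = [
    Buffer("a", 100, 0, 1),
    Buffer("b", 200, 1, 2),
    Buffer("c", 100, 2, 3),
]
example_offsets = plan_offsets(example)
example_arena = arena_size(example, example_offsets)
```

In this example "a" and "c" are never alive at the same time, so they share offset 200 and the arena needs 300 bytes instead of the 400 bytes a one-slot-per-buffer layout would use.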

This review was generated by AI and checked by a human editor.