QUICK REVIEW

[Paper Review] TensorFlow Lite Micro: Embedded Machine Learning on TinyML Systems

Robert David, Jared Duke|arXiv (Cornell University)|Oct 17, 2020

Advanced Neural Network Applications | References: 17 | Cited by: 167

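One-Sentence Summary

This paper introduces TensorFlow Lite Micro (TFLM), a portable, interpreter-based ML inference framework designed for embedded TinyML devices. It enables cross-platform deployment with minimal run-time overhead and memory footprint.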

ABSTRACT

Deep learning inference on embedded devices is a burgeoning field with myriad applications because tiny embedded devices are omnipresent. But we must overcome major challenges before we can benefit from this opportunity. Embedded processors are severely resource constrained. Their nearest mobile counterparts exhibit at least a 100x to 1,000x difference in compute capability, memory availability, and power consumption. As a result, the machine-learning (ML) models and associated ML inference framework must not only execute efficiently but also operate in a few kilobytes of memory. Also, the embedded devices' ecosystem is heavily fragmented. To maximize efficiency, system vendors often omit many features that commonly appear in mainstream systems, including dynamic memory allocation and virtual memory, that allow for cross-platform interoperability. The hardware comes in many flavors (e.g., instruction-set architecture and FPU support, or lack thereof). We introduce TensorFlow Lite Micro (TF Micro), an open-source ML inference framework for running deep-learning models on embedded systems. TF Micro tackles the efficiency requirements imposed by embedded-system resource constraints and the fragmentation challenges that make cross-platform interoperability nearly impossible. The framework adopts a unique interpreter-based approach that provides flexibility while overcoming these challenges. This paper explains the design decisions behind TF Micro and describes its implementation details. Also, we present an evaluation to demonstrate its low resource requirement and minimal run-time performance overhead.

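Research Motivation and Goals

Identify the challenges of deploying ML on fragmented embedded hardware in resource-constrained environments.
Propose a portable, interpreter-based ML inference framework for microcontrollers and similar devices.
Present the design decisions that deliver low memory usage, portability, and vendor-optimized kernels.
Show how the TensorFlow Lite toolchain can be used to export models and run them on embedded targets.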

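Proposed Method

Adopt interpreter-based inference to maximize portability and reduce the need to re-export models for each device.
Reuse the TensorFlow Lite model format and FlatBuffer serialization so that models can be loaded without unpacking.
Implement a two-stack memory arena and a memory planner to minimize both temporary (run-time) and persistent memory.
Support multitenancy by sharing a single arena across multiple interpreters.
Achieve platform specialization by swapping in vendor-optimized kernels (such as CMSIS-NN) without changing the build scripts.
Provide a platform-agnostic build system that works across heterogeneous embedded toolchains.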
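
The two-stack arena mentioned above can be illustrated with a small model. This is a minimal sketch, not TFLM's actual allocator: it assumes that persistent allocations grow downward from the top of a fixed-size arena while temporary allocations grow upward from the bottom and can be reset. The class name `TwoStackArena`, its method names, and the 4-byte default alignment are all hypothetical choices made for illustration.

```python
class TwoStackArena:
    """Toy model of a fixed-size arena with two stacks growing toward each other.

    Assumption (not taken from the TFLM source): persistent data is stacked
    down from the top, temporary data is stacked up from the bottom.
    """

    def __init__(self, size: int) -> None:
        self.size = size
        self.head = 0      # next free offset for temporary data (grows up)
        self.tail = size   # lowest offset used by persistent data (grows down)

    def allocate_temp(self, nbytes: int, alignment: int = 4) -> int:
        """Reserve temporary space and return its offset in the arena."""
        start = (self.head + alignment - 1) // alignment * alignment  # align up
        if start + nbytes > self.tail:
            raise MemoryError("temporary stack would collide with persistent stack")
        self.head = start + nbytes
        return start

    def allocate_persistent(self, nbytes: int, alignment: int = 4) -> int:
        """Reserve space that lives as long as the arena; return its offset."""
        start = (self.tail - nbytes) // alignment * alignment  # align down
        if start < self.head:
            raise MemoryError("persistent stack would collide with temporary stack")
        self.tail = start
        return start

    def reset_temp(self) -> None:
        """Drop all temporary allocations; persistent ones are untouched."""
        self.head = 0

    def used(self) -> int:
        """Bytes currently claimed by both stacks, including alignment padding."""
        return self.head + (self.size - self.tail)
```

Because neither stack ever frees from the middle, the sketch needs no free list and cannot fragment, which is the property that makes this style of allocation attractive on devices without dynamic memory allocation.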

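Experimental Results

Research Questions

RQ1: Can an interpreter-based ML inference framework remain portable across hardware platforms under the resource constraints of embedded TinyML devices?
RQ2: How should memory management and memory planning be designed to minimize arena usage across repeated inferences on a microcontroller?
RQ3: To what extent can vendor-optimized kernels be integrated without sacrificing portability and maintainability?
RQ4: To what extent can existing TensorFlow Lite tooling be reused for model export and deployment on embedded targets?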

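Key Findings

TFLM demonstrates low resource requirements and minimal run-time overhead for embedded inference.
An interpreter-based approach appears well suited to embedded ML because the interpreter's overhead is amortized over compute-heavy kernels.
Reusing the TensorFlow Lite tooling makes it easy to export models to embedded targets.
The two-stack memory allocation strategy and the memory planner reduce arena size and enable memory reuse.
Platform specialization through kernel swapping (such as CMSIS-NN) yields performance gains without changing the build system.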
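
How a memory planner can shrink the arena through reuse is easiest to see on a small example. The sketch below is an illustration under stated assumptions, not TFLM's planner: it assumes each buffer has a known size and a lifetime expressed as the first and last operator index that touches it, and it uses a greedy largest-first, first-fit placement. The names `Buffer` and `plan_offsets` are hypothetical.

```python
from typing import Dict, List, NamedTuple, Tuple


class Buffer(NamedTuple):
    name: str
    size: int       # bytes
    first_use: int  # index of the first operator that touches the buffer
    last_use: int   # index of the last operator that touches the buffer


def plan_offsets(buffers: List[Buffer]) -> Tuple[Dict[str, int], int]:
    """Greedy largest-first placement; returns (offset per buffer, arena size)."""
    placed: List[Tuple[int, Buffer]] = []
    offsets: Dict[str, int] = {}
    for buf in sorted(buffers, key=lambda b: b.size, reverse=True):
        # Only buffers that are alive at the same time can conflict in memory.
        conflicts = sorted(
            (off, off + other.size)
            for off, other in placed
            if buf.first_use <= other.last_use and other.first_use <= buf.last_use
        )
        offset = 0
        for start, end in conflicts:
            if offset + buf.size <= start:
                break  # the gap before this conflicting buffer is large enough
            offset = max(offset, end)
        placed.append((offset, buf))
        offsets[buf.name] = offset
    arena_size = max((off + b.size for off, b in placed), default=0)
    return offsets, arena_size


# A three-operator chain: input -> act1 -> act2 -> output.
chain = [
    Buffer("input", 100, 0, 0),
    Buffer("act1", 200, 0, 1),
    Buffer("act2", 150, 1, 2),
    Buffer("output", 50, 2, 2),
]
offsets, arena_size = plan_offsets(chain)
print(offsets, arena_size)
```

For this chain the buffers total 500 bytes, but the plan fits in 350 bytes because `input` shares space with `act2` and `output` shares space with `act1`, whose lifetimes do not overlap.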


This review was generated by AI and reviewed by a human editor.