QUICK REVIEW

[Paper Review] TinyTracker: Ultra-Fast and Ultra-Low-Power Edge Vision In-Sensor for Gaze Estimation

Pietro Bonazzi, Thomas Ruegg|arXiv (Cornell University)|Jan 1, 2023

Gaze Tracking and Assistive Technology | Cited by 1

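One-Sentence Summary

This paper proposes TinyTracker, a highly compressed, fully quantized 2D gaze estimation model designed for Sony's IMX500 AI-in-sensor platform and aimed at ultra-low-power, ultra-fast edge inference. Fully quantized, TinyTracker is 41x smaller than iTracker (about 600 KB) while losing at most 0.16 cm of accuracy. On the IMX500 it delivers end-to-end gaze estimation in about 19 ms for a total of 4.9 mJ, which is faster and more energy-efficient than the Coral Micro and Spresense.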

ABSTRACT

Intelligent edge vision tasks encounter the critical challenge of ensuring power and latency efficiency due to the typically heavy computational load they impose on edge platforms. This work leverages one of the first "AI in sensor" vision platforms, IMX500 by Sony, to achieve ultra-fast and ultra-low-power end-to-end edge vision applications. We evaluate the IMX500 and compare it to other edge platforms, such as the Google Coral Dev Micro and Sony Spresense, by exploring gaze estimation as a case study. We propose TinyTracker, a highly efficient, fully quantized model for 2D gaze estimation designed to maximize the performance of the edge vision systems considered in this study. TinyTracker achieves a 41x size reduction (600Kb) compared to iTracker [1] without significant loss in gaze estimation accuracy (maximum of 0.16 cm when fully quantized). TinyTracker's deployment on the Sony IMX500 vision sensor results in end-to-end latency of around 19 ms. The camera takes around 17.9 ms to read, process and transmit the pixels to the accelerator. The inference time of the network is 0.86 ms with an additional 0.24 ms for retrieving the results from the sensor. The overall energy consumption of the end-to-end system is 4.9 mJ, including 0.06 mJ for inference. The end-to-end study shows that IMX500 is 1.7x faster than CoralMicro (19 ms vs 34.4 ms) and 7x more power efficient (4.9 mJ vs 34.2 mJ).
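
The latency and energy figures in the abstract can be read as a simple budget. The short Python sketch below only re-adds the numbers quoted above to show how small the inference share is. The variable names are illustrative and nothing here is measured.

```python
# Latency figures quoted in the abstract, in milliseconds.
sensor_ms = 17.9     # camera reads, processes and transmits pixels to the accelerator
inference_ms = 0.86  # network inference on the in-sensor accelerator
readout_ms = 0.24    # retrieving the results from the sensor

total_ms = sensor_ms + inference_ms + readout_ms
inference_latency_share = inference_ms / total_ms

# Energy figures quoted in the abstract, in millijoules.
total_energy_mj = 4.9
inference_energy_mj = 0.06
inference_energy_share = inference_energy_mj / total_energy_mj

print(f"end-to-end latency: {total_ms:.2f} ms")
print(f"inference share of latency: {inference_latency_share:.1%}")
print(f"inference share of energy: {inference_energy_share:.1%}")
```

On these figures, image acquisition rather than the network dominates both latency and energy.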

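Motivation and Objectives

Address the key challenge of power and latency efficiency in edge vision AI, particularly for battery-powered, real-time applications.
Evaluate and compare state-of-the-art commercial edge platforms (Sony IMX500, Sony Spresense, and Google Coral Dev Micro) on end-to-end vision workloads.
Design an efficient 2D gaze estimation model under 1 MB that maintains high accuracy under extreme resource constraints.
Demonstrate the feasibility of millisecond-scale end-to-end inference on AI-in-sensor platforms such as the IMX500.
Optimize model compression and hardware utilization to achieve unprecedented power and speed efficiency in edge AI vision.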

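Proposed Method

Developed TinyTracker, a compact convolutional neural network based on MobileNetV3 that reduces the parameter count and the number of multiply-accumulate (MAC) operations; the multiple inputs (face, eyes, grid) are replaced by a single face image plus a grid embedding to improve edge compatibility.
Integrated a face-coordinate embedding into the input to preserve spatial localization information without a separate face grid input.
Applied full INT8 quantization to minimize model size and energy consumption while maintaining high accuracy.
Deployed TinyTracker on the Sony IMX500, using its in-sensor AI accelerator to process images directly on the sensor and so minimize data movement and latency.
Ran a full end-to-end performance analysis on three platforms, measuring inference time, energy consumption, and power efficiency from image capture to prediction.
Adopted standardized evaluation metrics, namely gaze prediction error (cm), inference latency (ms), and energy per inference (mJ), following the iTracker benchmark protocol.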
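
To make the INT8 quantization step concrete, here is a minimal Python sketch of generic affine (asymmetric) quantization. It is an assumption for illustration only: the review does not describe the authors' quantization toolchain, and the function names `quantize_int8` and `dequantize` are hypothetical.

```python
def quantize_int8(values):
    """Affine quantization of floats to INT8 (illustrative, not the authors' pipeline).

    Returns the quantized integers plus the scale and zero point needed to
    map them back to approximate float values.
    """
    # The represented range must include 0 so that zero is exactly representable.
    lo = min(min(values), 0.0)
    hi = max(max(values), 0.0)
    scale = (hi - lo) / 255.0 or 1.0          # fall back to 1.0 if all values are 0
    zero_point = round(-128 - lo / scale)     # integer that represents float 0.0
    q = [max(-128, min(127, round(v / scale) + zero_point)) for v in values]
    return q, scale, zero_point


def dequantize(q, scale, zero_point):
    """Map INT8 values back to approximate floats."""
    return [(x - zero_point) * scale for x in q]


# Hypothetical weights, chosen only to exercise the functions.
weights = [-0.42, 0.0, 0.13, 0.9]
q, scale, zero_point = quantize_int8(weights)
restored = dequantize(q, scale, zero_point)
max_error = max(abs(a - b) for a, b in zip(weights, restored))
print(q, scale, zero_point, max_error)
```

Each value is stored in one byte instead of the four bytes of a 32-bit float, and in this scheme the round-trip error stays within half a quantization step (`scale / 2`).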

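Experimental Results

Research Questions

RQ1: Can a highly compressed, fully quantized vision model achieve high accuracy under the strict power and latency constraints of edge AI platforms?
RQ2: How does the in-sensor AI processing of the IMX500 compare with conventional edge platforms such as Coral Micro and Spresense in end-to-end latency and energy efficiency?
RQ3: For gaze estimation, how far can model compression and quantization reduce model size and energy consumption without a significant drop in accuracy?
RQ4: Compared with external TPU or MCU-based systems, what is the performance impact of offloading inference to a sensor with an integrated AI accelerator?
RQ5: How does integrating a spatial grid embedding into the input improve the gaze estimation accuracy of a compact model?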

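Key Findings

TinyTracker is 41x smaller than iTracker (from about 24 MB down to about 600 KB) and adds only 0.16 cm of gaze estimation error when fully quantized.
The end-to-end system on the Sony IMX500 achieves a latency of about 19 ms, of which sensor readout, processing, and transmission take 17.9 ms, inference takes 0.86 ms, and retrieving the results takes 0.24 ms.
The end-to-end system consumes 4.9 mJ in total, of which only 0.06 mJ goes to inference.
In the end-to-end evaluation, the IMX500 is 1.7x faster than the Coral Dev Micro (19 ms vs 34.4 ms) and 7x more energy-efficient (4.9 mJ vs 34.2 mJ).
The IMX500 reaches 73.23 MAC/cycle, well above Spresense (0.20 MAC/cycle) and Coral Micro (8.69 MAC/cycle), indicating better hardware utilization.
Adding the grid embedding to the input improves accuracy by 0.5 cm, confirming that spatial localization information improves gaze estimation in compact models.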
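
The ratios in these findings can be cross-checked from the quoted figures. The Python sketch below is only arithmetic on numbers stated in this review; the dictionary and variable names are illustrative, and megabytes are taken as 1000 KB.

```python
# Figures quoted in this review (not re-measured here).
ENERGY_MJ = {"IMX500": 4.9, "Coral Micro": 34.2}
MAC_PER_CYCLE = {"IMX500": 73.23, "Coral Micro": 8.69, "Spresense": 0.20}
TINYTRACKER_KB = 600
COMPRESSION_FACTOR = 41

# Energy per end-to-end inference: Coral Micro relative to IMX500.
energy_ratio = ENERGY_MJ["Coral Micro"] / ENERGY_MJ["IMX500"]

# Baseline model size implied by the compression factor, in MB (1 MB = 1000 KB).
implied_baseline_mb = TINYTRACKER_KB * COMPRESSION_FACTOR / 1000

# MAC/cycle advantage of the IMX500 over each other platform.
mac_ratio = {
    name: MAC_PER_CYCLE["IMX500"] / value
    for name, value in MAC_PER_CYCLE.items()
    if name != "IMX500"
}

print(f"energy ratio (Coral Micro / IMX500): {energy_ratio:.2f}")
print(f"implied baseline size: {implied_baseline_mb:.1f} MB")
for name, ratio in mac_ratio.items():
    print(f"IMX500 MAC/cycle vs {name}: {ratio:.1f}x")
```

The energy ratio comes out at roughly 7, and the implied baseline size agrees with the "about 24 MB" quoted for iTracker.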

This review was generated by AI and reviewed by a human editor.