QUICK REVIEW

[Paper Review] Visual Wake Words Dataset

Aakanksha Chowdhery, Pete Warden | arXiv (Cornell University) | Jun 12, 2019

IoT and Edge/Fog Computing | References: 26 | Cited by: 84

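One-Sentence Summary

This paper introduces Visual Wake Words, a COCO-derived binary person/not-person dataset for benchmarking lightweight vision models under microcontroller memory constraints; state-of-the-art mobile models reach 85–90% accuracy within 250 KB of memory and 60M multiply-accumulate operations (MACs) per inference. It also analyzes memory-latency trade-offs and benchmarks MobileNet variants for edge AI deployment.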

ABSTRACT

The emergence of Internet of Things (IoT) applications requires intelligence on the edge. Microcontrollers provide a low-cost compute platform to deploy intelligent IoT applications using machine learning at scale, but have extremely limited on-chip memory and compute capability. To deploy computer vision on such devices, we need tiny vision models that fit within a few hundred kilobytes of memory footprint in terms of peak usage and model size on device storage. To facilitate the development of microcontroller friendly models, we present a new dataset, Visual Wake Words, that represents a common microcontroller vision use-case of identifying whether a person is present in the image or not, and provides a realistic benchmark for tiny vision models. Within a limited memory footprint of 250 KB, several state-of-the-art mobile models achieve accuracy of 85-90% on the Visual Wake Words dataset. We anticipate the proposed dataset will advance the research on tiny vision models that can push the pareto-optimal boundary in terms of accuracy versus memory usage for microcontroller applications.

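Motivation and Objectives

Motivate the need for on-device vision processing on microcontrollers with extremely limited memory.
Propose Visual Wake Words as a realistic binary classification benchmark derived from COCO.
Characterize the trade-offs among memory, latency, and model size for small CNNs on edge devices.
Benchmark state-of-the-art mobile models under a limit of ≤250 KB of flash/SRAM and 60M MACs per inference.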

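Proposed Method

Define design constraints for tiny vision models: peak memory ≤250 KB and ≤60M MACs per inference.
Create the Visual Wake Words dataset by relabeling COCO images as person/not-person based on bounding-box area (a person box larger than 0.5% of the image area).
Train and quantize MobileNet V1/V2, MNasNet, and ShuffleNet with 8-bit weights and activations.
Evaluate accuracy against peak memory, parameter count, and MACs on both ImageNet and Visual Wake Words.
Study memory-management techniques that let MobileNet V2 and MNasNet fit within the SRAM limit.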
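
The relabeling rule above can be sketched in a few lines. This is a minimal illustration, not the authors' script: it assumes COCO-style annotation dicts with `category_id` and `bbox` ([x, y, width, height]) fields, takes COCO's category id 1 to mean "person", and uses a hypothetical function name `vww_label`. The toy annotations are made-up values, not real COCO data.

```python
PERSON_CATEGORY_ID = 1      # COCO's category id for "person"
MIN_AREA_FRACTION = 0.005   # 0.5% of the image area, per the rule above


def vww_label(image_width, image_height, annotations):
    """Return 1 (person) if any person box exceeds 0.5% of the image area, else 0."""
    image_area = image_width * image_height
    for ann in annotations:
        if ann["category_id"] != PERSON_CATEGORY_ID:
            continue  # only person boxes count toward the label
        _, _, box_w, box_h = ann["bbox"]  # COCO boxes are [x, y, width, height]
        if box_w * box_h > MIN_AREA_FRACTION * image_area:
            return 1
    return 0


# Toy annotations for a 640x480 image (illustrative values only).
small_person = {"category_id": 1, "bbox": [10, 10, 20, 30]}    # 600 px^2, below threshold
large_dog = {"category_id": 18, "bbox": [0, 0, 300, 300]}      # not a person, ignored
large_person = {"category_id": 1, "bbox": [0, 0, 100, 100]}    # 10,000 px^2, above threshold

print(vww_label(640, 480, [small_person, large_dog]))                # 0
print(vww_label(640, 480, [small_person, large_dog, large_person]))  # 1
```

For a 640x480 image the threshold is 0.5% of 307,200 pixels, or 1,536 pixels, so a distant 20x30 person does not trigger the "person" label while a 100x100 one does.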

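Experimental Results

Research Questions

RQ1: What accuracy can tiny vision models reach on Visual Wake Words under the 250 KB memory and 60M MACs limits?
RQ2: How do model size, peak memory, and compute scale with image resolution and depth multiplier under edge-device constraints?
RQ3: What memory-latency trade-offs do residual and parallel paths in mobile architectures introduce on constrained microcontroller hardware?
RQ4: Can 8-bit quantization deliver competitive performance for person/not-person classification on microcontrollers?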
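
To make the scaling question in RQ2 concrete, the sketch below counts MACs for a single depthwise-separable convolution, the building block of MobileNet-style models, and shows how the count changes with input resolution and depth multiplier. The layer sizes (32 input and 64 output channels) and the function names are hypothetical and chosen only for illustration; the numbers are not taken from the paper.

```python
def separable_conv_macs(out_h, out_w, in_ch, out_ch, kernel=3):
    """MACs for one depthwise-separable conv: k x k depthwise plus 1 x 1 pointwise."""
    depthwise = out_h * out_w * in_ch * kernel * kernel
    pointwise = out_h * out_w * in_ch * out_ch
    return depthwise + pointwise


def scaled_macs(resolution, depth_multiplier, base_in=32, base_out=64):
    """MACs of one hypothetical stride-1 layer after scaling channels and resolution."""
    in_ch = int(base_in * depth_multiplier)
    out_ch = int(base_out * depth_multiplier)
    return separable_conv_macs(resolution, resolution, in_ch, out_ch)


full = scaled_macs(112, 1.0)
half_resolution = scaled_macs(56, 1.0)
half_width = scaled_macs(112, 0.5)

print(full, half_resolution, half_width)
print(full / half_resolution)  # 4.0: MACs scale with the square of the resolution
```

Halving the resolution cuts this layer's MACs by exactly 4x, while halving the depth multiplier cuts them by a little under 4x, because the depthwise term scales linearly with channel count and only the pointwise term scales quadratically.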

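Key Findings

Within the 250 KB memory limit, state-of-the-art mobile models achieve 85–90% accuracy on Visual Wake Words.
MobileNet V1/V2, MNasNet, and ShuffleNet reach high accuracy on Visual Wake Words while fitting within 250 KB of flash storage.
Peak memory is often dominated by the activation maps of the first few layers, so parallel paths need memory-saving strategies.
Lowering image resolution reduces both peak memory and MACs but can limit accuracy; the trade-off depends on the architecture and depth multiplier.
8-bit quantization with quantization-aware training achieves competitive accuracy on the binary classification task.
On ImageNet, the same models have lower top-1 accuracy, indicating that Visual Wake Words provides a distinct Pareto frontier for tiny vision models.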
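
The finding that early activation maps dominate peak memory can be illustrated with a rough estimate. The sketch below assumes a purely sequential network with 8-bit activations, where running a layer needs its input and output tensors in memory at the same time; it ignores weights, scratch buffers, and the residual or parallel paths discussed above. The tensor shapes are hypothetical, not taken from any model in the paper.

```python
def activation_bytes(height, width, channels, bytes_per_value=1):
    """Size of one activation tensor; 1 byte per value models 8-bit activations."""
    return height * width * channels * bytes_per_value


def peak_activation_memory(shapes):
    """Return (peak_bytes, layer_index) for a sequential chain of (H, W, C) tensors.

    shapes[0] is the network input; layer i maps shapes[i] to shapes[i + 1].
    """
    sizes = [activation_bytes(*shape) for shape in shapes]
    per_layer = [inp + out for inp, out in zip(sizes, sizes[1:])]
    peak = max(per_layer)
    return peak, per_layer.index(peak)


# Hypothetical tensor shapes for a small 96x96 RGB model.
shapes = [(96, 96, 3), (48, 48, 8), (48, 48, 16), (24, 24, 24), (12, 12, 48), (6, 6, 96)]

peak, layer = peak_activation_memory(shapes)
print(peak, layer)           # 55296 bytes, reached at layer index 1
print(peak <= 250 * 1024)    # True: this toy chain fits a 250 KB budget
```

In this toy chain the peak occurs at the second layer, where spatial resolution is still high; the later layers have more channels but far smaller feature maps, which is the pattern the finding describes.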

This review was generated by AI and reviewed by a human editor.