QUICK REVIEW

[Paper Review] Hello Edge: Keyword Spotting on Microcontrollers

Yundong Zhang, Naveen Suda | arXiv (Cornell University) | Nov 20, 2017

Speech and Audio Processing | References: 25 | Cited by: 340

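ONE-SENTENCE SUMMARY

This paper evaluates several neural network architectures for keyword spotting on microcontrollers (MCUs), finds that the depthwise separable convolutional neural network (DS-CNN) gives the best accuracy within MCU memory and compute constraints, and demonstrates effective deployment with 8-bit quantization.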

ABSTRACT

Keyword spotting (KWS) is a critical component for enabling speech based user interactions on smart devices. It requires real-time response and high accuracy for good user experience. Recently, neural networks have become an attractive choice for KWS architecture because of their superior accuracy compared to traditional speech processing algorithms. Due to its always-on nature, KWS application has highly constrained power budget and typically runs on tiny microcontrollers with limited memory and compute capability. The design of neural network architecture for KWS must consider these constraints. In this work, we perform neural network architecture evaluation and exploration for running KWS on resource-constrained microcontrollers. We train various neural network architectures for keyword spotting published in literature to compare their accuracy and memory/compute requirements. We show that it is possible to optimize these neural network architectures to fit within the memory and compute constraints of microcontrollers without sacrificing accuracy. We further explore the depthwise separable convolutional neural network (DS-CNN) and compare it against other neural network architectures. DS-CNN achieves an accuracy of 95.4%, which is ~10% higher than the DNN model with similar number of parameters.

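MOTIVATION AND OBJECTIVES

Evaluate a range of neural network architectures for on-device keyword spotting under MCU memory and compute constraints.
Compare the architectures in terms of accuracy, memory footprint, and operations per inference.
Carry out a resource-constrained neural network architecture search to identify high-accuracy models that fit within MCU limits.
Propose and evaluate a MobileNet-inspired depthwise separable convolutional architecture for KWS on MCUs.
Demonstrate practical deployment and the effect of quantization on real MCU hardware.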

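PROPOSED METHOD

Train and compare KWS models from the literature (DNN, CNN, LSTM, CRNN) on the Google Speech Commands dataset, with memory footprints computed assuming 8-bit weights and activations.
Introduce and evaluate DS-CNN, a model built on depthwise separable convolutions and inspired by MobileNet.
Perform resource-constrained architecture exploration by fitting models to three MCU memory/compute budgets (small, medium, large).
Quantize representative models to 8-bit fixed-point weights and activations and measure the resulting accuracy loss.
Deploy the 8-bit quantized DNN model on a Cortex-M7 MCU using CMSIS-NN to verify real-time performance.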
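The appeal of depthwise separable convolutions on a memory-limited MCU comes down to simple counting. The sketch below compares the weight and multiply-accumulate (MAC) counts of a standard convolution with those of a depthwise convolution followed by a 1x1 pointwise convolution. The layer shape (3x3 kernel, 64 input and output channels, 25x5 output feature map) is a hypothetical example chosen for illustration, not a layer taken from the paper, and biases are ignored.

```python
def conv_cost(h_out, w_out, k_h, k_w, c_in, c_out):
    """Weights and MACs of a standard 2-D convolution (bias ignored)."""
    params = k_h * k_w * c_in * c_out
    macs = params * h_out * w_out  # every weight is used once per output position
    return params, macs


def ds_conv_cost(h_out, w_out, k_h, k_w, c_in, c_out):
    """Weights and MACs of a depthwise conv followed by a 1x1 pointwise conv."""
    depthwise = k_h * k_w * c_in   # one k_h x k_w filter per input channel
    pointwise = c_in * c_out       # 1x1 conv that mixes channels
    params = depthwise + pointwise
    macs = params * h_out * w_out
    return params, macs


# Hypothetical layer: 3x3 kernel, 64 -> 64 channels, 25x5 output feature map.
std_params, std_macs = conv_cost(25, 5, 3, 3, 64, 64)
ds_params, ds_macs = ds_conv_cost(25, 5, 3, 3, 64, 64)
reduction = std_params / ds_params

print(f"standard:  {std_params} weights, {std_macs} MACs")
print(f"separable: {ds_params} weights, {ds_macs} MACs")
print(f"reduction: {reduction:.2f}x")
```

For this assumed shape the separable layer needs roughly 7.9 times fewer weights and MACs, which is the kind of saving that lets a deeper network fit in the same memory and compute budget.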

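EXPERIMENTAL RESULTS

Research Questions

RQ1: What are the accuracy, memory footprint, and compute requirements of popular KWS models under MCU resource constraints?
RQ2: Within a fixed MCU budget, can a depthwise separable convolutional network (DS-CNN) outperform earlier architectures?
RQ3: How does 8-bit quantization affect the accuracy of KWS models and their deployability on microcontrollers?
RQ4: How does the DS-CNN model scale, and what trade-offs does it make, under increasingly tight memory/compute budgets?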

Key Findings

NN Architecture	Accuracy	Memory	Operations
DNN	84.3%	288 KB	0.57 MOps
CNN-1	90.7%	556 KB	76.02 MOps
CNN-2	84.6%	149 KB	1.46 MOps
LSTM	88.8%	26 KB	2.06 MOps
CRNN	87.8%	298 KB	5.85 MOps
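One way to read the table is to ask which of these published baselines would fit a given MCU budget as they stand. The sketch below filters the table rows by a memory and operations limit and picks the most accurate survivor. The three budget values are assumptions for illustration (they are not stated in this review and should be checked against the paper), and `best_fit` is a hypothetical helper, not code from the authors.

```python
from typing import NamedTuple, Optional


class Model(NamedTuple):
    name: str
    accuracy: float   # percent
    memory_kb: float  # model memory footprint
    mops: float       # million operations per inference


# Rows copied from the table above.
MODELS = [
    Model("DNN", 84.3, 288, 0.57),
    Model("CNN-1", 90.7, 556, 76.02),
    Model("CNN-2", 84.6, 149, 1.46),
    Model("LSTM", 88.8, 26, 2.06),
    Model("CRNN", 87.8, 298, 5.85),
]

# Assumed budgets as (max memory in KB, max MOps per inference).
BUDGETS = {"small": (80, 6), "medium": (200, 20), "large": (500, 80)}


def best_fit(models, max_kb, max_mops) -> Optional[Model]:
    """Return the most accurate model within both limits, or None if none fits."""
    fitting = [m for m in models if m.memory_kb <= max_kb and m.mops <= max_mops]
    return max(fitting, key=lambda m: m.accuracy, default=None)


best = {name: best_fit(MODELS, kb, mops) for name, (kb, mops) in BUDGETS.items()}
for name, model in best.items():
    print(name, model.name if model else "none")
```

Under these assumed budgets the most accurate baseline, CNN-1, fits none of the three classes because of its 556 KB footprint, which illustrates why the architectures have to be re-tuned for each budget rather than used as published.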

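DS-CNN achieves the best accuracy within MCU constraints: 94.4%, 94.9%, and 95.4% under the small, medium, and large budgets, respectively.
Models quantized to 8 bits match or slightly exceed the accuracy of their full-precision counterparts, which simplifies MCU deployment.
The 8-bit DNN deployed on a Cortex-M7 runs 10 inferences per second at about 12 ms per inference, and the complete KWS application occupies about 70 KB of memory, confirming real-time on-device performance.
DS-CNN scales well, outperforming the other architectures (DNN, CNN, LSTM, CRNN) across memory and compute budgets.
DS-CNN models scaled down to fit in as little as 8 KB of memory still outperform DNNs with a similar number of operations, which suggests they suit ultra-resource-constrained MCUs.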
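To make the 8-bit fixed-point finding more concrete, here is a minimal sketch of quantizing a list of weights to signed 8-bit integers with a shared power-of-two scale. The range-based choice of fractional bits, the function names, and the sample weights are all assumptions for illustration; the paper's own procedure for choosing the fixed-point format may differ.

```python
import math


def quantize_q7(values):
    """Quantize floats to signed 8-bit fixed point with one power-of-two scale.

    Returns (ints, frac_bits); each value is approximated by ints[i] / 2**frac_bits.
    """
    max_abs = max(abs(v) for v in values)
    # Integer bits needed to cover the range; the remaining magnitude bits are fractional.
    int_bits = math.ceil(math.log2(max_abs)) if max_abs > 0 else 0
    frac_bits = 7 - int_bits
    scale = 2.0 ** frac_bits
    # Round to the nearest step and saturate to the int8 range.
    ints = [max(-128, min(127, round(v * scale))) for v in values]
    return ints, frac_bits


def dequantize_q7(ints, frac_bits):
    """Map the 8-bit integers back to floats."""
    return [q / 2.0 ** frac_bits for q in ints]


weights = [0.42, -0.91, 0.07, 0.5]  # hypothetical layer weights
q, frac_bits = quantize_q7(weights)
restored = dequantize_q7(q, frac_bits)
max_error = max(abs(a - b) for a, b in zip(weights, restored))

print(q, frac_bits)
print(f"max rounding error: {max_error:.5f}")
```

With 7 fractional bits the rounding error per weight is at most half a step, 1/256, which gives some intuition for why accuracy can survive the move from 32-bit floats to 8-bit integers while the weight storage shrinks by a factor of four.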

This review was generated by AI and reviewed by human editors.