QUICK REVIEW

[Paper Digest] DeepIoT: Compressing Deep Neural Network Structures for Sensing Systems with a Compressor-Critic Framework

Shuochao Yao, Yiran Zhao|arXiv (Cornell University)|Jun 5, 2017

Model Reduction and Neural Networks|References: 48|Cited by: 40

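One-Sentence Summary

DeepIoT proposes a unified, structure-aware compression framework that uses a compressor-critic training paradigm to automatically determine the optimal dropout probabilities for redundant hidden elements in fully-connected, convolutional, and recurrent neural networks. On Intel Edison embedded devices, it reduces model size by 90% to 98.9%, shortens execution time by 71.4% to 94.5%, and decreases energy consumption by 72.2% to 95.7%, all without loss of accuracy.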

ABSTRACT

Recent advances in deep learning motivate the use of deep neural networks in sensing applications, but their excessive resource needs on constrained embedded devices remain an important impediment. A recently explored solution space lies in compressing (approximating or simplifying) deep neural networks in some manner before use on the device. We propose a new compression solution, called DeepIoT, that makes two key contributions in that space. First, unlike current solutions geared for compressing specific types of neural networks, DeepIoT presents a unified approach that compresses all commonly used deep learning structures for sensing applications, including fully-connected, convolutional, and recurrent neural networks, as well as their combinations. Second, unlike solutions that either sparsify weight matrices or assume linear structure within weight matrices, DeepIoT compresses neural network structures into smaller dense matrices by finding the minimum number of non-redundant hidden elements, such as filters and dimensions required by each layer, while keeping the performance of sensing applications the same. Importantly, it does so using an approach that obtains a global view of parameter redundancies, which is shown to produce superior compression. We conduct experiments with five different sensing-related tasks on Intel Edison devices. DeepIoT outperforms all compared baseline algorithms with respect to execution time and energy consumption by a significant margin. It reduces the size of deep neural networks by 90% to 98.9%. It is thus able to shorten execution time by 71.4% to 94.5%, and decrease energy consumption by 72.2% to 95.7%. These improvements are achieved without loss of accuracy. The results underscore the potential of DeepIoT for advancing the exploitation of deep neural networks on resource-constrained embedded devices.

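Research Motivation and Goals

Address the high memory, energy, and latency demands of deploying deep neural networks on resource-constrained embedded devices in IoT sensing applications.
Develop a unified compression method that applies to multiple deep learning architectures, including fully-connected, convolutional, and recurrent networks.
Minimize network parameters by pruning redundant hidden elements rather than relying on sparsity or linear-structure assumptions.
Jointly optimize a compressor network and a critic network to learn the optimal dropout probability for each hidden element.
Allow compressed models to be deployed directly on embedded systems without modifying the inference library.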

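Proposed Method

A compressor-critic framework is used: the compressor predicts an optimal dropout probability for each hidden element, and the critic evaluates the performance of the pruned network.
The compressor network is trained jointly with the original network, end to end, using a reinforcement-learning-style objective that minimizes redundancy while preserving accuracy.
Pruning drops hidden elements according to the learned dropout probabilities, which yields smaller dense matrices rather than sparse ones.
The method is architecture-agnostic and applies uniformly to fully-connected, convolutional, and recurrent layers.
Fine-tuning is performed on a workstation before deployment, so the compressed model can be used directly on edge devices.
The method avoids matrix factorization and sparsity assumptions, which may perform poorly on 1D filters and recurrent layers.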
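
The point about producing smaller dense matrices instead of sparse ones can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the tiny two-layer network, the hand-written `keep_prob` values standing in for a compressor's output, the `prune_hidden_units` helper, and the fixed 0.5 threshold. None of it is taken from the DeepIoT implementation. The sketch only shows that removing a hidden unit deletes one row of the first weight matrix and one column of the second, and that the result matches a masked full-size network.

```python
# Illustrative sketch only: names, sizes, and the 0.5 threshold are assumptions,
# not values or code from the DeepIoT paper.

def matvec(W, x):
    """Multiply a dense matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def relu(v):
    return [max(0.0, a) for a in v]

def forward(W1, W2, x, mask=None):
    """Two-layer fully-connected net; an optional 0/1 mask zeroes hidden units."""
    h = relu(matvec(W1, x))
    if mask is not None:
        h = [a * m for a, m in zip(h, mask)]
    return matvec(W2, h)

def prune_hidden_units(W1, W2, keep_prob, threshold=0.5):
    """Keep hidden units whose keep probability exceeds the threshold.

    Removing hidden unit j deletes row j of W1 and column j of W2, so both
    matrices stay dense but become smaller.
    """
    keep = [j for j, p in enumerate(keep_prob) if p > threshold]
    W1_small = [W1[j] for j in keep]
    W2_small = [[row[j] for j in keep] for row in W2]
    mask = [1.0 if p > threshold else 0.0 for p in keep_prob]
    return W1_small, W2_small, mask

# 3 inputs -> 4 hidden units -> 2 outputs
W1 = [[0.5, -0.2, 0.1],
      [0.3, 0.8, -0.5],
      [-0.7, 0.4, 0.2],
      [0.1, 0.1, 0.9]]
W2 = [[1.0, -1.0, 0.5, 0.2],
      [0.3, 0.6, -0.4, 0.7]]
keep_prob = [0.9, 0.1, 0.2, 0.8]  # hypothetical per-unit keep probabilities
x = [1.0, 2.0, 3.0]

W1_small, W2_small, mask = prune_hidden_units(W1, W2, keep_prob)
masked_out = forward(W1, W2, x, mask)        # full-size network, units zeroed
pruned_out = forward(W1_small, W2_small, x)  # physically smaller dense network

params_before = sum(len(r) for r in W1) + sum(len(r) for r in W2)
params_after = sum(len(r) for r in W1_small) + sum(len(r) for r in W2_small)
print(params_before, params_after)
```

Because the pruned matrices are ordinary dense arrays, a standard matrix-multiply routine can run them unchanged, which is the property that lets a structurally pruned model run on an unmodified inference library. The sketch omits any rescaling of activations and any fine-tuning step.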

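Experimental Results

Research Questions

RQ1: Can a unified compression framework effectively reduce model size across multiple deep learning architectures in IoT sensing applications?
RQ2: Does learning optimal dropout probabilities through the compressor-critic framework deliver better compression efficiency and performance than fixed or heuristic dropout strategies?
RQ3: Compared with sparsity-based or factorization-based compression, does structured pruning through hidden-element removal achieve lower energy consumption and latency on embedded devices?
RQ4: In real sensing tasks on low-power platforms, how far can model size be reduced without losing accuracy?
RQ5: Are the compressed models compatible with existing deep learning inference libraries on mobile and embedded systems?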

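Key Findings

DeepIoT reduces model size by 90% to 98.9% across five sensing tasks on Intel Edison devices.
Execution time is reduced by 71.4% to 94.5% compared with baseline methods.
Energy consumption is reduced by 72.2% to 95.7%, with the largest reduction, 95.7%, on the HHAR dataset.
The method outperforms all baseline methods in both speed and energy efficiency, including SparseSep and other sparsity-based methods.
The compressed models retain the original accuracy, indicating that reducing redundancy does not degrade performance.
The method is compatible with existing deep learning libraries and can be deployed directly on embedded systems without runtime modifications.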
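
As a reading aid, the percentage reductions above can be converted into "times smaller" or "times faster" factors with the identity factor = 100 / (100 - reduction). The sketch below is plain arithmetic on the reported ranges. The helper name `reduction_to_factor` is made up for this example, and the resulting factors are derived here rather than quoted from the paper.

```python
def reduction_to_factor(reduction_pct):
    """Convert a percentage reduction into a multiplicative improvement factor."""
    if not 0 <= reduction_pct < 100:
        raise ValueError("reduction must be in [0, 100)")
    return 100.0 / (100.0 - reduction_pct)

# Ranges as reported in the abstract (low end, high end), in percent.
reported = {
    "model size": (90.0, 98.9),
    "execution time": (71.4, 94.5),
    "energy": (72.2, 95.7),
}

factors = {
    name: (reduction_to_factor(low), reduction_to_factor(high))
    for name, (low, high) in reported.items()
}

for name, (low, high) in factors.items():
    print(f"{name}: {low:.1f}x to {high:.1f}x")
```

Read this way, a 90% to 98.9% size reduction corresponds to a model roughly 10 to 91 times smaller, and a 71.4% to 94.5% cut in execution time corresponds to roughly a 3.5 to 18 times speedup.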

This digest was generated by AI and reviewed by a human editor.