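[Paper Digest] SquishedNets: Squishing SqueezeNet further for edge device scenarios via deep evolutionary synthesis
This paper proposes SquishedNets, a family of ultra-compact deep neural networks produced by combining architectural modifications for low-class-count scenarios with deep evolutionary synthesis. Starting from SqueezeNet v1.1 adapted to the 10-class ImageNet-10 task and evolving it for 15 generations, the method yields models as small as 0.95MB (5.17X smaller than SqueezeNet v1.1) that retain 77% top-1 accuracy and reach up to 256 images/sec on an embedded GPU.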
While deep neural networks have been shown in recent years to outperform other machine learning methods in a wide range of applications, one of the biggest challenges with enabling deep neural networks for widespread deployment on edge devices such as mobile and other consumer devices is high computational and memory requirements. Recently, there has been greater exploration into small deep neural network architectures that are more suitable for edge devices, with one of the most popular architectures being SqueezeNet, with an incredibly small model size of 4.8MB. Taking further advantage of the notion that many applications of machine learning on edge devices are often characterized by a low number of target classes, this study explores the utility of combining architectural modifications and an evolutionary synthesis strategy for synthesizing even smaller deep neural architectures based on the more recent SqueezeNet v1.1 macroarchitecture for applications with fewer target classes. In particular, architectural modifications are first made to SqueezeNet v1.1 to accommodate for a 10-class ImageNet-10 dataset, and then an evolutionary synthesis strategy is leveraged to synthesize more efficient deep neural networks based on this modified macroarchitecture. The resulting SquishedNets possess model sizes ranging from 2.4MB to 0.95MB (~5.17X smaller than SqueezeNet v1.1, or 253X smaller than AlexNet). Furthermore, the SquishedNets are still able to achieve accuracies ranging from 81.2% to 77%, and able to process at speeds of 156 images/sec to as much as 256 images/sec on a Nvidia Jetson TX1 embedded chip. These preliminary results show that a combination of architectural modifications and an evolutionary synthesis strategy can be a useful tool for producing very small deep neural network architectures that are well-suited for edge device scenarios.
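Research Motivation and Objectives
- Address the difficulty of deploying deep neural networks on resource-constrained edge devices, where compute and memory requirements are high relative to what the hardware offers.
- Reduce model size and inference latency without relying on post-training quantization or compression.
- Explore whether architectural modifications tailored to low-class-count scenarios can shrink models beyond existing efficient architectures such as SqueezeNet v1.1.
- Evaluate how effective deep evolutionary synthesis is at producing small, efficient deep neural networks suited to edge deployment.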
Proposed Method
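- The architecture is modified by reducing the final 1x1 convolutional layer of SqueezeNet v1.1 (conv10) to 10 filters, one per target class. This cuts the parameter count substantially, because conv10 accounts for roughly 40% of all parameters.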
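- An evolutionary synthesis strategy is used in which each generation of networks is produced by a stochastic process guided by the synthesis probability model P(H_g) ≈ P(H_g | H_{g-1}) · R, where R < 1 imposes a resource-constrained environment.
- The evolutionary process runs for 15 generations, with the modified SqueezeNet v1.1 architecture as the original ancestor.
- Environmental constraints are encoded through the model R so that successive generations favor smaller, faster, more parameter-efficient architectures.
- Each offspring network is trained and evaluated on the ImageNet-10 dataset to measure accuracy and inference speed.
- The final SquishedNets are selected for their balance of model size, inference speed, and top-1 accuracy on the 10-class benchmark.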
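The generation-to-generation sampling described above can be illustrated with a toy sketch. This is not the authors' implementation: the network encoding (a list of per-filter "strengths" per layer), the function names, and the rule that a filter survives with probability strength × R are all assumptions made for illustration. The sketch also skips the training step that, in the real method, happens for each offspring before the next generation is synthesized.

```python
import random

def synthesize_offspring(parent, R, rng):
    """Sample one offspring network from its parent.

    parent: list of layers, each a list of per-filter strengths in (0, 1].
            The strengths stand in for P(H_g | H_{g-1}) (hypothetical encoding).
    R:      environmental factor, R < 1, scaling every survival probability.
    """
    offspring = []
    for layer in parent:
        # A filter survives with probability strength * R.
        kept = [s for s in layer if rng.random() < s * R]
        if not kept:
            # Keep the strongest filter so the layer does not vanish.
            kept = [max(layer)]
        offspring.append(kept)
    return offspring

def evolve(ancestor, generations, R, seed=0):
    """Return the lineage [ancestor, gen 1, ..., gen N]."""
    rng = random.Random(seed)
    lineage = [ancestor]
    for _ in range(generations):
        lineage.append(synthesize_offspring(lineage[-1], R, rng))
    return lineage

def filter_count(network):
    """Total number of filters across all layers."""
    return sum(len(layer) for layer in network)

# Toy ancestor: 4 layers of 64 filters with random strengths (made-up values).
_rng = random.Random(42)
toy_ancestor = [[_rng.uniform(0.5, 1.0) for _ in range(64)] for _ in range(4)]
toy_lineage = evolve(toy_ancestor, generations=15, R=0.9)
for generation, network in enumerate(toy_lineage):
    print(generation, filter_count(network))
```

Because every survival probability is multiplied by R < 1, the filter count can only stay the same or shrink from one generation to the next, which is the sense in which R encodes a resource-scarce environment.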
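Experimental Results
Research Questions
- RQ1: Can architectural modifications tailored to low-class-count classification tasks significantly reduce the model size of an already efficient deep neural network?
- RQ2: Can deep evolutionary synthesis further compress state-of-the-art efficient architectures such as SqueezeNet v1.1 without sacrificing accuracy or speed?
- RQ3: How much smaller than SqueezeNet v1.1 can a model become while keeping high inference speed and accuracy on edge devices?
- RQ4: Is it possible to obtain ultra-compact models (for example, under 1MB) for edge deployment without post-hoc compression or quantization?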
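To give RQ1 a concrete scale, the following sketch counts the parameters of a 1x1 convolutional classification head with 1000 outputs versus 10 outputs. The input width of 512 channels is an assumption taken from the publicly available SqueezeNet v1.1 definition (the output of fire9); the helper name `conv_params` is hypothetical.

```python
def conv_params(in_channels, out_channels, kernel=1, bias=True):
    """Number of learnable parameters in a 2D convolution layer."""
    weights = in_channels * out_channels * kernel * kernel
    return weights + (out_channels if bias else 0)

IN_CHANNELS = 512  # assumption: channels feeding conv10 in SqueezeNet v1.1

head_1000 = conv_params(IN_CHANNELS, 1000)  # 1000-class ImageNet head
head_10 = conv_params(IN_CHANNELS, 10)      # 10-class ImageNet-10 head

print("1000-class head:", head_1000)
print("10-class head:  ", head_10)
print("parameters saved:", head_1000 - head_10)
```

Under these assumptions the head shrinks by a factor of 100, which is why changing the class count alone already removes a large share of the network's parameters before any evolutionary synthesis is applied.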
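Key Findings
- The smallest SquishedNet is 0.95MB, which is 5.17X smaller than SqueezeNet v1.1 and 253X smaller than AlexNet.
- SquishedNets run at 156 to 256 images/sec on an Nvidia Jetson TX1, showing strong real-time performance on embedded hardware.
- On the 10-class ImageNet-10 dataset, top-1 accuracy ranges from 81.2% down to 77.0%, so accuracy holds up well even at the smallest model sizes.
- Combining architectural modifications for low-class-count tasks with evolutionary synthesis produces models that are both highly compact and efficient, with no quantization or compression step.
- By imposing a resource-scarce environment through the environmental factor model R < 1, the evolutionary synthesis process steers the search toward smaller, faster architectures.
- The results indicate that architectural changes and evolutionary search can deliver ultra-lightweight models for edge deployment independently of model compression techniques.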
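The ratios and throughputs above can be turned into more familiar quantities with a little arithmetic. The sketch below derives the baseline sizes implied by the reported ratios and the average time per image implied by the reported throughputs. It assumes the ratios are relative to the 0.95MB model, and that the reciprocal of throughput is a fair per-image figure, which only equals single-image latency if images are processed one at a time.

```python
SMALLEST_MB = 0.95            # smallest SquishedNet, as reported
RATIO_VS_SQUEEZENET = 5.17    # reported size ratio vs. SqueezeNet v1.1
RATIO_VS_ALEXNET = 253        # reported size ratio vs. AlexNet

def implied_baseline_mb(model_mb, ratio):
    """Baseline size implied by 'ratio X smaller than baseline'."""
    return model_mb * ratio

def ms_per_image(images_per_sec):
    """Average processing time per image in milliseconds."""
    return 1000.0 / images_per_sec

print("implied SqueezeNet v1.1 size (MB):",
      round(implied_baseline_mb(SMALLEST_MB, RATIO_VS_SQUEEZENET), 2))
print("implied AlexNet size (MB):",
      round(implied_baseline_mb(SMALLEST_MB, RATIO_VS_ALEXNET), 2))
print("ms/image at 156 images/sec:", round(ms_per_image(156), 2))
print("ms/image at 256 images/sec:", round(ms_per_image(256), 2))
```

Both throughput figures correspond to well under 10 ms per image on average, which is the basis for describing the models as real-time capable on the Jetson TX1.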
This digest was generated by AI and reviewed by a human editor.