QUICK REVIEW

[Paper Review] ChamNet: Towards Efficient Network Design through Platform-Aware Model Adaptation

Xiaoliang Dai, Peizhao Zhang | arXiv (Cornell University) | Dec 21, 2018

Advanced Neural Network Applications | References: 31 | Cited by: 17

One-Sentence Summary

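ChamNet proposes a platform-aware framework for adapting neural architectures: fast, accurate predictors guide the optimization of existing efficient building blocks under target latency and energy constraints. By combining Gaussian-process-based Bayesian optimization with hardware-specific latency lookup tables, it reaches state-of-the-art accuracy (73.8% and 75.3% ImageNet top-1 at 20 ms latency on a mobile CPU and a DSP, respectively) while cutting search time to minutes once the predictors are built, instead of the far costlier searches of earlier methods.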

ABSTRACT

This paper proposes an efficient neural network (NN) architecture design methodology called Chameleon that honors given resource constraints. Instead of developing new building blocks or using computationally-intensive reinforcement learning algorithms, our approach leverages existing efficient network building blocks and focuses on exploiting hardware traits and adapting computation resources to fit target latency and/or energy constraints. We formulate platform-aware NN architecture search in an optimization framework and propose a novel algorithm to search for optimal architectures aided by efficient accuracy and resource (latency and/or energy) predictors. At the core of our algorithm lies an accuracy predictor built atop Gaussian Process with Bayesian optimization for iterative sampling. With a one-time building cost for the predictors, our algorithm produces state-of-the-art model architectures on different platforms under given constraints in just minutes. Our results show that adapting computation resources to building blocks is critical to model performance. Without the addition of any bells and whistles, our models achieve significant accuracy improvements against state-of-the-art hand-crafted and automatically designed architectures. We achieve 73.8% and 75.3% top-1 accuracy on ImageNet at 20ms latency on a mobile CPU and DSP. At reduced latency, our models achieve up to 8.5% (4.8%) and 6.6% (9.3%) absolute top-1 accuracy improvements compared to MobileNetV2 and MnasNet, respectively, on a mobile CPU (DSP), and 2.7% (4.6%) and 5.6% (2.6%) accuracy gains over ResNet-101 and ResNet-152, respectively, on an Nvidia GPU (Intel CPU).
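The optimization framing mentioned in the abstract can be written compactly. The notation below is an assumption made for illustration and is not taken from the paper: a is a candidate architecture from a search space A, p is the target platform, Acc, Lat, and Eng stand for the accuracy, latency, and energy predictors, and T_lat and T_eng are the resource budgets. Since the abstract speaks of latency and/or energy constraints, either constraint may be dropped.

```latex
\[
\begin{aligned}
\max_{a \in \mathcal{A}} \quad & \mathrm{Acc}(a) \\
\text{s.t.} \quad & \mathrm{Lat}(a, p) \le T_{\mathrm{lat}}, \\
                  & \mathrm{Eng}(a, p) \le T_{\mathrm{eng}}
\end{aligned}
\]
```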

Motivation and Objectives

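Address the challenge of deploying efficient neural networks across diverse hardware platforms with differing resource constraints.
Reduce the time and compute cost of neural architecture search by replacing expensive training and measurement with predictive models.
Improve model accuracy without increasing latency or energy consumption through smarter allocation of computation.
Enable scalable, large-scale deployment of compact models on heterogeneous devices by minimizing the per-platform search overhead.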

Proposed Method

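Formulates platform-aware neural architecture search as an optimization problem driven by accuracy and resource (latency/energy) predictors.
Uses a Gaussian-process-based Bayesian optimization framework to iteratively sample high-accuracy architectures while keeping the number of costly evaluations low.
Uses operator-level latency lookup tables (LUTs) for fast, accurate latency estimation on a specific hardware platform.
Introduces unbalanced quasi-Monte Carlo sampling to improve the efficiency and robustness of the accuracy and resource predictors.
Optimizes the computation assigned to each network stage by redistributing FLOPs according to hardware characteristics and feature-map sizes.
Builds the predictors (accuracy, latency, energy) once and amortizes that cost across platforms and constraints, reducing the total cost from O(m·n·k) to O(m+n).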
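The interplay between the latency LUT and the Gaussian-process accuracy predictor can be sketched in a few lines of standard-library Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the LUT entries, the three-stage search space, the accuracies in Y_SEEN, the RBF kernel with its length scale, and the upper-confidence-bound (UCB) acquisition rule are all hypothetical choices made for this example.

```python
import math
from itertools import product

# Hypothetical latency LUT in ms, keyed by (stage index, channel width).
# All values are made up for illustration.
LATENCY_LUT_MS = {(0, 16): 1.2, (0, 32): 2.1,
                  (1, 24): 1.5, (1, 48): 2.8,
                  (2, 64): 1.9, (2, 128): 3.6}

def predict_latency_ms(arch):
    """Latency estimate: sum of per-stage LUT entries (arch = one width per stage)."""
    return sum(LATENCY_LUT_MS[(stage, width)] for stage, width in enumerate(arch))

def rbf(x, y, length_scale=64.0):
    """Squared-exponential kernel between two architecture vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * length_scale ** 2))

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for j in range(col, n + 1):
                M[r][j] -= factor * M[col][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def gp_posterior(X, y, x_new, noise=1e-6):
    """Posterior mean and variance of a constant-mean GP at x_new."""
    K = [[rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(X)]
         for i, a in enumerate(X)]
    k_star = [rbf(a, x_new) for a in X]
    y_mean = sum(y) / len(y)
    alpha = solve(K, [v - y_mean for v in y])
    weights = solve(K, k_star)
    mean = y_mean + sum(k * a for k, a in zip(k_star, alpha))
    var = max(rbf(x_new, x_new) - sum(k * w for k, w in zip(k_star, weights)), 0.0)
    return mean, var

def next_to_train(X, y, candidates, budget_ms, kappa=2.0):
    """One Bayesian-optimization step: pick the untrained, latency-feasible
    candidate with the highest UCB score (mean + kappa * std)."""
    pool = [c for c in candidates
            if c not in X and predict_latency_ms(c) <= budget_ms]

    def ucb(c):
        mean, var = gp_posterior(X, y, c)
        return mean + kappa * math.sqrt(var)

    return max(pool, key=ucb) if pool else None

# Toy search space: two width options for each of three stages.
CANDIDATES = list(product((16, 32), (24, 48), (64, 128)))
X_SEEN = [(16, 24, 64), (32, 48, 64), (16, 24, 128)]  # already-trained architectures
Y_SEEN = [0.60, 0.64, 0.68]                            # made-up top-1 accuracies
```

Calling next_to_train(X_SEEN, Y_SEEN, CANDIDATES, budget_ms=7.0) returns the next architecture worth training under a 7 ms budget. Feasibility is decided purely from the LUT, so no on-device measurement is needed inside the loop, and each newly trained architecture would be appended to X_SEEN and Y_SEEN before the next step.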

Experimental Results

Research Questions

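RQ1: Can state-of-the-art accuracy be achieved on resource-constrained platforms without reinforcement learning or new building blocks?
RQ2: How does platform-aware reallocation of computation affect model accuracy and efficiency on different hardware platforms?
RQ3: Can predictive models substantially cut the time and cost of neural architecture search while maintaining high accuracy?
RQ4: How does the distribution of FLOPs across network stages affect inference speed and accuracy on mobile CPUs and DSPs?
RQ5: How does the method compare with existing NAS and compression techniques in accuracy, latency, and search efficiency?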

Key Findings

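On a mobile CPU, ChamNet reaches 73.8% ImageNet top-1 accuracy at 20 ms latency; at reduced latency, it improves absolute top-1 accuracy by up to 8.5% over MobileNetV2 and up to 6.6% over MnasNet.
On a mobile DSP, ChamNet reaches 75.3% top-1 accuracy at 20 ms latency; at reduced latency, it improves absolute top-1 accuracy by up to 4.8% over MobileNetV2 and up to 9.3% over MnasNet.
With predictors that are built once, the framework produces architectures in minutes, avoiding the computationally intensive reinforcement-learning search used by approaches such as MnasNet.
By shifting FLOPs from early stages to later ones, ChamNet improves CPU utilization, giving 2.1% higher accuracy than MobileNetV2 at comparable (about 5% lower) latency.
On a Samsung Galaxy S8 with a Snapdragon 835 CPU, ChamNet at 20 ms latency is 1.7% more accurate than MnasNet and 1.75x faster.
The framework reduces the total search cost from O(m·n·k) to O(m+n), making it highly scalable for large-scale heterogeneous deployment.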
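To get a feel for the gap between O(m·n·k) and O(m+n), the two expressions can be evaluated for some example counts. This summary does not define m, n, and k, so the sketch below treats them as abstract counts with made-up values, and it ignores the constant factors that big-O notation hides.

```python
def per_case_cost(m, n, k):
    """Units of work when every combination is handled from scratch: m * n * k."""
    return m * n * k

def amortized_cost(m, n):
    """Units of work when one-time predictors are reused: m + n."""
    return m + n

# Illustrative counts only; not taken from the paper.
m, n, k = 10, 5, 4
print(per_case_cost(m, n, k), amortized_cost(m, n))  # prints: 200 15
```

The multiplicative cost grows with every added factor, while the additive cost grows only by the size of the added term, which is why the one-time predictor cost pays off as the number of deployment targets increases.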

This review was generated by AI and reviewed by human editors.