QUICK REVIEW

[Paper Review] On Neural Architecture Search for Resource-Constrained Hardware Platforms

Qing Lu, Weiwen Jiang|arXiv (Cornell University)|Oct 31, 2019
Neural Networks and Applications | References: 25 | Cited by: 59
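One-Sentence Summary

This paper proposes a hardware-software co-design NAS framework that jointly searches the neural architecture, quantization, and FPGA hardware mapping to satisfy strict resource constraints, and it achieves higher accuracy under hardware limits than searching each of them separately.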

ABSTRACT

In the recent past, the success of Neural Architecture Search (NAS) has enabled researchers to broadly explore the design space using learning-based methods. Apart from finding better neural network architectures, the idea of automation has also inspired efforts to improve their implementations on hardware. While some practices of hardware machine-learning automation have achieved remarkable performance, the traditional design concept is still followed: a network architecture is first structured with excellent test accuracy, and then compressed and optimized to fit into a target platform. Such a design flow will easily lead to inferior local-optimal solutions. To address this problem, we propose a new framework to jointly explore the space of neural architecture, hardware implementation, and quantization. Our objective is to find a quantized architecture with the highest accuracy that is implementable on given hardware specifications. We employ FPGAs to implement and test our designs with limited look-up tables (LUTs) and required throughput. Compared to the separate design/searching methods, our framework has demonstrated much better performance under strict specifications and generated designs of higher accuracy by 18% to 68% in the task of classifying CIFAR10 images. With 30,000 LUTs, a light-weight design is found to achieve 82.98% accuracy and 1293 images/second throughput, compared to which, under the same constraints, the traditional method even fails to find a valid solution.

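Motivation and Objectives

  • Motivate the need for NAS that jointly optimizes architecture and hardware under resource constraints.
  • Propose a framework that jointly searches the neural architecture, quantization scheme, and FPGA hardware mapping.
  • Show that joint search achieves higher accuracy under hardware limits than the traditional separate approach.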

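Proposed Method

  • Use a reinforcement learning controller to explore the architecture and quantization spaces.
  • Introduce a hardware-space search with dynamic-programming frontier pruning to satisfy LUT and throughput constraints.
  • Model quantization as unsigned fixed-point for activations and signed fixed-point for weights, with trainable bit widths.
  • Adopt a two-stage evaluation: a fast hardware feasibility check, followed by training/validation only if the design is feasible.
  • Implement the end-to-end CNN accelerator design on an FPGA (Altera Cyclone IV) clocked at 100 MHz.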
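
To make the fixed-point quantization item in the list above concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the paper's implementation: the function names, the round-half-up rule, the saturation behavior, and the two's-complement range for signed values are all choices made for this example. Bit widths are passed in as plain parameters, and how they are chosen during the search is outside the sketch.

```python
import math


def total_bits(int_bits, frac_bits, signed):
    """Storage width of one value; signed values spend one extra bit on the sign."""
    return int_bits + frac_bits + (1 if signed else 0)


def quantize_fixed_point(x, int_bits, frac_bits, signed):
    """Round x to the nearest representable fixed-point value and saturate.

    int_bits:  bits left of the binary point (sign bit not included)
    frac_bits: bits right of the binary point
    signed:    True for weights, False for activations
               (activations are assumed non-negative, e.g. after ReLU)
    """
    step = 2.0 ** -frac_bits                 # smallest representable increment
    levels = 2 ** (int_bits + frac_bits)     # number of magnitude codes
    max_code = levels - 1
    min_code = -levels if signed else 0      # two's-complement range (assumption)
    code = math.floor(x / step + 0.5)        # round half up (assumption)
    code = max(min_code, min(max_code, code))  # saturate instead of wrapping
    return code * step


# Activation with 1 integer bit and 2 fraction bits: range [0, 1.75], step 0.25.
act = quantize_fixed_point(0.30, int_bits=1, frac_bits=2, signed=False)

# Weight with 0 integer bits and 3 fraction bits: range [-1.0, 0.875], step 0.125.
wgt = quantize_fixed_point(-0.6, int_bits=0, frac_bits=3, signed=True)

print(act, wgt)
```

Fewer bits shrink the multipliers and storage the FPGA must provide but coarsen the representable values, which is why bit width interacts with both accuracy and LUT usage.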

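Experimental Results

Research Questions

  • RQ1: Can jointly exploring architecture, quantization, and hardware mapping produce feasible designs under fixed hardware constraints that outperform separate NAS and quantization search?
  • RQ2: How do quantization and hardware constraints interact to affect the achievable accuracy on CIFAR-10?
  • RQ3: What framework-level benefits (such as a Pareto front between accuracy and hardware metrics) does co-design NAS provide?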

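Key Findings

  • In the CIFAR-10 experiments, joint architecture-quantization-hardware search achieves higher accuracy under resource constraints than separate search methods.
  • Under LUT and throughput constraints, a 30,000-LUT design achieves 82.98% accuracy and 1293 images/second.
  • Several designs under 100,000 LUTs reach close to 90% accuracy (e.g., 89.71% without quantization, and up to 90.30% with quantization in some cases).
  • Quantization-only search loses significant accuracy when the throughput requirement is strict, whereas joint search recovers robust performance.
  • The framework uses a dynamic-programming-based frontier method to prune hardware-space exploration, making the search scalable across layers.
  • Compared with certain baseline architectures, the best joint designs show competitive accuracy with significantly lower hardware resource requirements.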
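
The frontier-based pruning of the hardware space can be illustrated with a short Python sketch. Everything specific here is an assumption made for the example rather than the paper's cost model: each layer is given a hypothetical list of `(luts, cycles)` implementation options, LUT usage adds up across layers, and the design is treated as layer-pipelined so that the slowest layer sets the throughput. The dynamic program walks the layers in order and keeps only partial designs that are not dominated in both LUTs and bottleneck cycles.

```python
def prune_dominated(points):
    """Keep only (luts, cycles) points that no other point beats on both axes."""
    frontier = []
    best_cycles = float("inf")
    for luts, cycles in sorted(set(points)):   # ascending LUTs, then cycles
        if cycles < best_cycles:               # strictly faster than any cheaper point
            frontier.append((luts, cycles))
            best_cycles = cycles
    return frontier


def hardware_frontier(layer_options, lut_budget):
    """Layer-by-layer DP over partial designs, pruned to the Pareto frontier."""
    frontier = [(0, 0)]                        # empty design: no LUTs, no latency
    for options in layer_options:
        candidates = [
            (luts + o_luts, max(cycles, o_cycles))   # sum LUTs, track the bottleneck
            for luts, cycles in frontier
            for o_luts, o_cycles in options
            if luts + o_luts <= lut_budget           # drop over-budget designs early
        ]
        frontier = prune_dominated(candidates)
    return frontier


def is_feasible(layer_options, lut_budget, clock_hz, min_throughput):
    """Fast check: does any mapping meet both the LUT budget and the throughput?"""
    frontier = hardware_frontier(layer_options, lut_budget)
    # throughput = clock_hz / cycles, rewritten to avoid dividing by zero
    return any(cycles * min_throughput <= clock_hz for _, cycles in frontier)


# Hypothetical per-layer options as (LUTs, cycles per image); not from the paper.
layer_options = [
    [(4000, 200_000), (8000, 100_000), (16000, 50_000)],
    [(6000, 150_000), (12000, 75_000)],
    [(2000, 60_000), (5000, 30_000)],
]

print(hardware_frontier(layer_options, lut_budget=30_000))
print(is_feasible(layer_options, 30_000, clock_hz=100_000_000, min_throughput=1000))
```

Pruning is safe in this model because both combining steps (adding LUTs, taking the maximum of cycles) are monotone: extending a dominated partial design can never beat the same extension of the design that dominates it. A check of this kind is cheap compared with training, which is what makes it usable as the first stage of a two-stage evaluation.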


This review was generated by AI and reviewed by human editors.