QUICK REVIEW

[Paper Review] On Neural Architecture Search for Resource-Constrained Hardware Platforms

Qing Lu, Weiwen Jiang | arXiv (Cornell University) | Oct 31, 2019

Neural Networks and Applications | References: 25 | Cited by: 59

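One-Sentence Summary

This paper proposes a hardware-software co-design NAS framework that jointly searches the neural architecture, the quantization scheme, and the FPGA hardware mapping so that designs meet strict resource constraints, and it reaches higher accuracy under those hardware limits than searching each space separately.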

ABSTRACT

In the recent past, the success of Neural Architecture Search (NAS) has enabled researchers to broadly explore the design space using learning-based methods. Apart from finding better neural network architectures, the idea of automation has also inspired efforts to improve their implementations on hardware. While some practices of hardware machine-learning automation have achieved remarkable performance, the traditional design concept is still followed: a network architecture is first structured with excellent test accuracy, and then compressed and optimized to fit into a target platform. Such a design flow will easily lead to inferior local-optimal solutions. To address this problem, we propose a new framework to jointly explore the space of neural architecture, hardware implementation, and quantization. Our objective is to find a quantized architecture with the highest accuracy that is implementable on given hardware specifications. We employ FPGAs to implement and test our designs with limited look-up tables (LUTs) and required throughput. Compared to the separate design/searching methods, our framework has demonstrated much better performance under strict specifications and generated designs of higher accuracy by 18% to 68% in the task of classifying CIFAR10 images. With 30,000 LUTs, a light-weight design is found to achieve 82.98% accuracy and 1293 images/second throughput, whereas under the same constraints the traditional method fails to find a valid solution at all.

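Motivation and Objectives

Motivate the need for NAS that jointly optimizes architecture and hardware under resource constraints.
Propose a framework that jointly searches the neural architecture, the quantization scheme, and the FPGA hardware mapping.
Show that joint search reaches higher accuracy under hardware limits than the traditional separate approach.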

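Proposed Method

Uses a reinforcement-learning controller to explore the architecture and quantization spaces.
Introduces a hardware-space search with dynamic-programming frontier pruning to satisfy the LUT and throughput constraints.
Models quantization as unsigned fixed point for activations and signed fixed point for weights, with trainable bit-widths.
Adopts a two-stage evaluation: a fast hardware feasibility check, followed by training and validation only if the design is feasible.
Implements end-to-end CNN accelerator designs on an FPGA (Altera Cyclone IV) clocked at 100 MHz.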
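The fixed-point quantization described above can be illustrated with a minimal sketch. The function name `quantize_fixed_point` and the split into `int_bits` and `frac_bits` are assumptions made for illustration, not the paper's notation or implementation; the sketch only shows how an unsigned format (activations) and a signed two's-complement format (weights) round and saturate a real value.

```python
def quantize_fixed_point(x, int_bits, frac_bits, signed):
    """Round x to the nearest representable fixed-point value and saturate.

    Hypothetical parameterization (an assumption, not taken from the paper):
      int_bits  -- bits left of the binary point, excluding any sign bit
      frac_bits -- bits right of the binary point
      signed    -- True for two's complement (weights),
                   False for unsigned (activations)
    """
    scale = 1 << frac_bits                 # one LSB equals 1 / scale
    levels = 1 << (int_bits + frac_bits)   # number of magnitude codes
    if signed:
        lo, hi = -levels, levels - 1       # two's-complement code range
    else:
        lo, hi = 0, levels - 1             # unsigned code range
    code = round(x * scale)                # Python rounds exact ties to even
    code = max(lo, min(hi, code))          # saturate instead of wrapping
    return code / scale


# A weight stored as signed fixed point with 1 integer and 2 fractional bits.
weight = quantize_fixed_point(-1.7, int_bits=1, frac_bits=2, signed=True)

# An activation stored as unsigned fixed point: negatives clip to zero and
# large values saturate at the top code.
act_low = quantize_fixed_point(-0.2, int_bits=1, frac_bits=3, signed=False)
act_high = quantize_fixed_point(5.0, int_bits=1, frac_bits=3, signed=False)
```

Fewer bits shrink the arithmetic units that must be mapped onto LUTs but coarsen the representable values, which is the trade-off a joint search over bit-widths has to balance.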

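Experimental Results

Research Questions

RQ1: Can jointly exploring architecture, quantization, and hardware mapping produce feasible designs under fixed hardware constraints, and do those designs outperform separate NAS and quantization search?
RQ2: How do quantization and hardware constraints interact to affect the achievable accuracy on CIFAR-10?
RQ3: What framework-level benefits (for example, a Pareto frontier of accuracy versus hardware metrics) does co-design NAS provide?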

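Key Findings

In the CIFAR-10 experiments, joint architecture-quantization-hardware search reaches higher accuracy under resource constraints than separate search methods.
Under the LUT and throughput constraints, a 30,000-LUT design achieves 82.98% accuracy at 1293 images/second.
Several designs below 100,000 LUTs reach close to 90% accuracy (for example, 89.71% without quantization, and up to 90.30% with quantization in some cases).
Quantization-only search loses substantial accuracy when the throughput requirement is strict, whereas joint search recovers robust performance.
The framework prunes the hardware-space exploration with a dynamic-programming frontier method, which makes the search scalable across layers.
Compared with some baseline architectures, the best joint designs show competitive accuracy with significantly lower hardware resource requirements.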
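The frontier-based pruning mentioned in the findings can be sketched as follows. This is a simplified illustration, not the paper's cost model: it assumes each layer offers a few candidate implementations with a LUT cost and a cycle count, that layers run one after another so cycle counts add up, and that throughput is the clock rate divided by total cycles. The function names and the per-layer numbers are hypothetical; only the 100 MHz clock and the 1293 images/second target come from the text.

```python
def prune_to_frontier(points):
    """Keep only the (luts, cycles) pairs that no other pair dominates."""
    frontier = []
    best_cycles = float("inf")
    for luts, cycles in sorted(set(points)):   # ascending LUT cost
        if cycles < best_cycles:               # strictly faster than all cheaper ones
            frontier.append((luts, cycles))
            best_cycles = cycles
    return frontier


def search_hardware(layers, lut_budget, clock_hz, min_throughput):
    """Layer-by-layer dynamic programming over partial-design frontiers.

    layers: one list per layer of hypothetical (luts, cycles) options.
    Returns the cheapest feasible (total_luts, total_cycles), or None.
    """
    frontier = [(0, 0)]                        # empty design costs nothing
    for options in layers:
        combined = [
            (luts + l, cycles + c)
            for luts, cycles in frontier
            for l, c in options
            if luts + l <= lut_budget          # discard over-budget partial designs
        ]
        frontier = prune_to_frontier(combined)
        if not frontier:
            return None                        # no mapping fits the LUT budget
    max_cycles = clock_hz / min_throughput     # cycles allowed per image
    feasible = [p for p in frontier if p[1] <= max_cycles]
    return min(feasible) if feasible else None


# Illustrative (made-up) per-layer options: small-and-slow vs. large-and-fast.
layers = [
    [(1000, 40000), (4000, 12000)],
    [(2000, 60000), (8000, 20000)],
]
best = search_hardware(layers, lut_budget=10000,
                       clock_hz=100_000_000, min_throughput=1293)
too_small = search_hardware(layers, lut_budget=5000,
                            clock_hz=100_000_000, min_throughput=1293)
```

Because dominated partial designs are dropped after every layer, the number of candidates carried forward stays bounded by the frontier size rather than growing with the product of all per-layer choices.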


This review was generated by AI and checked by a human editor.