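[Paper Summary] ZynqNet: An FPGA-Accelerated Embedded Convolutional Neural Network
This work presents ZynqNet, an FPGA-accelerated CNN for embedded applications. It is implemented on a Zynq-based platform and was completed as a master's thesis at ETH Zürich (2016).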
Image understanding is becoming a vital feature in ever more applications, ranging from medical diagnostics to autonomous vehicles. Many applications demand embedded solutions that integrate into existing systems with tight real-time and power constraints. Convolutional Neural Networks (CNNs) presently achieve record-breaking accuracies in all image understanding benchmarks, but have a very high computational complexity. Embedded CNNs thus call for small and efficient, yet very powerful computing platforms. This master thesis explores the potential of FPGA-based CNN acceleration and demonstrates a fully functional proof-of-concept CNN implementation on a Zynq System-on-Chip. The ZynqNet Embedded CNN is designed for image classification on ImageNet. It consists of ZynqNet CNN, an optimized and customized CNN topology, and the ZynqNet FPGA Accelerator, an FPGA-based architecture for its evaluation.
ZynqNet CNN is a highly efficient CNN topology. Detailed analysis and optimization of prior topologies using the custom-designed Netscope CNN Analyzer have enabled a CNN with 84.5% top-5 accuracy at a computational complexity of only 530 million multiply-accumulate operations. The topology is highly regular and consists exclusively of convolutional layers, ReLU nonlinearities and one global pooling layer. The CNN fits ideally onto the FPGA accelerator.
The ZynqNet FPGA Accelerator allows an efficient evaluation of ZynqNet CNN. It accelerates the full network based on a nested-loop algorithm which minimizes the number of arithmetic operations and memory accesses. The FPGA accelerator has been synthesized using High-Level Synthesis for the Xilinx Zynq XC-7Z045. It reaches a clock frequency of 200 MHz with a device utilization of 80% to 90%.
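The abstract's "nested-loop algorithm" and its 530 million MAC figure both rest on the direct form of convolution. The plain-Python sketch below shows that form for one convolutional layer followed by ReLU, plus a global pooling step. It counts every multiply-accumulate so that the total can be checked against the closed-form count C_in·C_out·K²·H_out·W_out. This is an illustrative assumption and not the thesis's accelerator code. The loop order is ours, and so are the helper names (`conv2d_relu`, `conv_macs`, `global_avg_pool`). Treating the global pooling layer as average pooling is also an assumption. A hardware design would reorder, tile and parallelize these loops.

```python
from typing import List, Tuple

# Feature maps are nested lists indexed as [channel][row][col];
# weights are indexed as [out_channel][in_channel][ky][kx].
FeatureMap = List[List[List[float]]]
Weights = List[List[List[List[float]]]]


def conv_macs(c_in: int, c_out: int, k: int, h_out: int, w_out: int) -> int:
    """Closed-form MAC count of a K x K convolutional layer."""
    return c_in * c_out * k * k * h_out * w_out


def conv2d_relu(x: FeatureMap, w: Weights, b: List[float],
                stride: int = 1, pad: int = 0) -> Tuple[FeatureMap, int]:
    """Direct nested-loop convolution with zero padding, followed by ReLU.

    Returns the output feature map and the number of MACs performed.
    """
    c_in, h, wd = len(x), len(x[0]), len(x[0][0])
    c_out, k = len(w), len(w[0][0])
    h_out = (h + 2 * pad - k) // stride + 1
    w_out = (wd + 2 * pad - k) // stride + 1
    y = [[[0.0] * w_out for _ in range(h_out)] for _ in range(c_out)]
    macs = 0
    for co in range(c_out):                  # output channels
        for oy in range(h_out):              # output rows
            for ox in range(w_out):          # output columns
                acc = b[co]
                for ci in range(c_in):       # input channels
                    for ky in range(k):      # kernel rows
                        for kx in range(k):  # kernel columns
                            iy = oy * stride + ky - pad
                            ix = ox * stride + kx - pad
                            if 0 <= iy < h and 0 <= ix < wd:  # outside = zero padding
                                acc += w[co][ci][ky][kx] * x[ci][iy][ix]
                            macs += 1  # count every tap so the total matches conv_macs
                y[co][oy][ox] = max(acc, 0.0)  # ReLU
    return y, macs


def global_avg_pool(x: FeatureMap) -> List[float]:
    """Average each channel over all spatial positions (pooling type assumed)."""
    return [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in x]


if __name__ == "__main__":
    x = [[[1.0] * 4 for _ in range(4)] for _ in range(2)]                      # 2 x 4 x 4 input
    w = [[[[1.0] * 3 for _ in range(3)] for _ in range(2)] for _ in range(3)]  # 3 x 2 x 3 x 3 kernels
    y, macs = conv2d_relu(x, w, [0.0] * 3, stride=1, pad=1)
    print(macs, conv_macs(2, 3, 3, 4, 4), global_avg_pool(y))
```

Running the example prints `864 864 [12.5, 12.5, 12.5]`. In this sketch the counter includes taps that land on zero padding, so it matches the formula exactly. Summing `conv_macs` over every layer is one way to arrive at a whole-network figure such as the 530 million MACs quoted above.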
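Motivation and Objectives
- Motivate the use of FPGA acceleration for embedded CNN workloads.
- Develop a CNN architecture (ZynqNet) that fits the FPGA resources of a Zynq device.
- Evaluate the feasibility, implementation considerations, and potential benefits of FPGA-based CNN inference on embedded systems.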
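Proposed Approach
- Propose and implement an FPGA-accelerated CNN architecture tailored to Zynq devices. It pairs ZynqNet CNN, a topology built only from convolutional layers, ReLU nonlinearities and one global pooling layer, with the ZynqNet FPGA Accelerator that evaluates it.
- Map the CNN computation onto FPGA resources with a nested-loop algorithm. The algorithm exploits parallelism for low latency while minimizing arithmetic operations and memory accesses.
- Evaluate design considerations for embedded deployment on single-board computers or SoC platforms.
- Discuss the design trade-offs among resource usage, energy efficiency, and performance in embedded settings.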
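The parallelism trade-off in these bullets can be made concrete with back-of-the-envelope arithmetic. A compute-bound accelerator that completes `mac_units` multiply-accumulates per clock cycle needs about 530 × 10⁶ / (mac_units × efficiency) cycles per image. The Python sketch below uses the 530 million MAC and 200 MHz figures from the abstract. The number of parallel MAC units and the efficiency factor are hypothetical parameters, not values reported in the thesis. The model also ignores memory bandwidth, so its results are upper bounds.

```python
import math

MACS_PER_IMAGE = 530e6   # ZynqNet CNN complexity quoted in the abstract
CLOCK_HZ = 200e6         # accelerator clock frequency quoted in the abstract


def estimate_fps(mac_units, efficiency=1.0,
                 macs_per_image=MACS_PER_IMAGE, clock_hz=CLOCK_HZ):
    """Upper-bound frames/s when `mac_units` MACs complete per cycle.

    `efficiency` (0 < e <= 1) is a hypothetical factor for idle cycles;
    memory stalls are not modelled.
    """
    cycles_per_image = macs_per_image / (mac_units * efficiency)
    return clock_hz / cycles_per_image


def required_mac_units(target_fps, efficiency=1.0,
                       macs_per_image=MACS_PER_IMAGE, clock_hz=CLOCK_HZ):
    """Smallest number of parallel MAC units that can reach `target_fps`."""
    return math.ceil(macs_per_image * target_fps / (clock_hz * efficiency))


if __name__ == "__main__":
    for units in (64, 256):  # hypothetical accelerator widths
        print(f"{units} MAC units: {estimate_fps(units):.1f} fps (upper bound)")
    print("MAC units needed for 30 fps:", required_mac_units(30))
```

With these inputs, 256 MAC units give at most about 96.6 frames per second. Reaching 30 frames per second needs at least 80 units at full efficiency. In practice, off-chip memory traffic and pipeline stalls lower the real figure, which is why the abstract stresses minimizing memory accesses.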
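Experimental Results
Research Questions
- RQ1: Can CNN inference be accelerated effectively on FPGA hardware in embedded settings?
- RQ2: What are the resource, performance, and energy trade-offs when implementing a CNN on a Zynq-based FPGA platform?
- RQ3: Which design choices enable real-time or near-real-time inference on embedded systems?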
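Key Findings
- Demonstrates that CNNs can be deployed on FPGA hardware for embedded applications. A fully functional proof-of-concept accelerates the complete ZynqNet CNN (84.5% top-5 accuracy on ImageNet, 530 million MACs) on a Zynq SoC.
- Highlights design and implementation considerations for FPGA-based CNNs on the Zynq platform. The accelerator, synthesized with High-Level Synthesis for the XC-7Z045, reaches 200 MHz at 80% to 90% device utilization.
- Discusses resource utilization in embedded settings and the potential benefits for acceleration and latency.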
This summary was generated by AI and reviewed by a human editor.