QUICK REVIEW

[Paper Review] LeFlow: Enabling Flexible FPGA High-Level Synthesis of Tensorflow Deep Neural Networks

Daniel Holanda Noronha, Bahar Salehpour | arXiv (Cornell University) | Jul 14, 2018

Parallel Computing and Optimization Techniques | References: 7 | Cited by: 28

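One-Sentence Summary

LeFlow uses Google's XLA compiler to lower TensorFlow-based deep neural networks (DNNs) to LLVM code, which a high-level synthesis (HLS) tool then synthesizes into hardware automatically and flexibly. The approach reduces manual effort, supports a variety of DNN architectures with only a few lines of Python code, and achieves high performance and resource efficiency.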

ABSTRACT

Recent work has shown that Field-Programmable Gate Arrays (FPGAs) play an important role in the acceleration of Machine Learning applications. Initial specification of machine learning applications is often done using a high-level Python-oriented framework such as Tensorflow, followed by a manual translation to either C or RTL for synthesis using vendor tools. This manual translation step is time-consuming and requires expertise that limits the applicability of FPGAs in this important domain. In this paper, we present an open-source tool-flow that maps numerical computation models written in Tensorflow to synthesizable hardware. Unlike other tools, which are often constrained by a small number of inflexible templates, our flow uses Google's XLA compiler which emits LLVM code directly from a Tensorflow specification. This LLVM code can then be used with a high-level synthesis tool to automatically generate hardware. We show that our flow allows users to generate Deep Neural Networks with very few lines of Python code.

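Research Motivation and Objectives

Address the manual, time-consuming translation of TensorFlow models into FPGA hardware.
Remove the dependence of existing FPGA HLS flows on rigid, hard-coded templates.
Generate synthesizable hardware automatically from high-level TensorFlow models.
Support a variety of DNN architectures with minimal user intervention.
Simplify the deployment of machine learning workloads on FPGAs.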

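Proposed Method

Use Google's XLA compiler to lower the TensorFlow computation graph to LLVM IR.
Transform the LLVM IR into a form suitable for a high-level synthesis (HLS) tool.
Generate RTL from the LLVM code with an HLS tool that accepts LLVM IR as input (the paper uses LegUp).
Rely on the HLS optimization passes for automatic pipelining and resource sharing.
Preserve the data flow and computational semantics of the original TensorFlow model.
Provide an end-to-end flow from a Python-based DNN definition to a synthesized FPGA bitstream.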
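
To make the lowering step more concrete, the sketch below writes a fully connected layer with ReLU as the kind of explicit scalar loop nest that a tensor operation becomes once it has been lowered toward LLVM IR. It is plain Python for illustration only, not actual XLA output and not code from LeFlow; the function name `dense_relu` and the list-of-lists weight layout are assumptions made here.

```python
from typing import List


def dense_relu(x: List[float],
               weights: List[List[float]],
               bias: List[float]) -> List[float]:
    """Fully connected layer followed by ReLU, as explicit scalar loops.

    weights[i][j] connects input i to output j (hypothetical layout).
    """
    n_in = len(x)
    n_out = len(bias)
    out = [0.0] * n_out
    for j in range(n_out):            # one iteration per output neuron
        acc = bias[j]
        for i in range(n_in):         # multiply-accumulate over the inputs
            acc += x[i] * weights[i][j]
        out[j] = acc if acc > 0.0 else 0.0   # ReLU activation
    return out


if __name__ == "__main__":
    x = [1.0, 2.0]
    w = [[1.0, -1.0],
         [0.5, 0.5]]
    b = [0.0, -1.0]
    print(dense_relu(x, w, b))
```

Loop nests of this shape are what an HLS tool can pipeline or share resources across, which is why the flow can leave those decisions to the HLS optimization passes instead of to hand-written templates.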

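Experimental Results

Research Questions

RQ1: Can a flexible, automated flow map TensorFlow DNNs to FPGA hardware without hard-coded templates?
RQ2: To what extent does the XLA-to-LLVM pipeline enable portable, reusable hardware generation from high-level models?
RQ3: How much manual effort does the flow save compared with traditional FPGA design flows?
RQ4: How does the generated hardware compare with hand-optimized implementations in performance and resource utilization?
RQ5: Can the flow support a wide range of DNN architectures with very few code changes?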

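Key Findings

The LeFlow flow automatically generates fully functional, synthesizable hardware from very short TensorFlow Python code.
The approach supports multiple DNN architectures, including convolutional and fully connected layers, without architecture-specific hard-coding.
Using XLA and LLVM IR makes code generation portable and extensible across different FPGA platforms.
Compared with manual RTL design, the flow significantly reduces development time and the expertise required.
The generated hardware shows competitive performance and resource efficiency on standard DNN benchmarks.
The tool-flow is open source and integrates with the existing TensorFlow and HLS ecosystems.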
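
As a rough illustration of why convolutional layers need no dedicated template in such a flow, the sketch below expresses a 2D convolution as an ordinary loop nest of multiply-accumulates. It is a simplified stand-in written for this review, not code produced by XLA or LeFlow: it assumes a single channel, stride 1, and no padding, and the name `conv2d_valid` is hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def conv2d_valid(image: Matrix, kernel: Matrix) -> Matrix:
    """Single-channel 2D convolution as used in DNN frameworks
    (cross-correlation: the kernel is not flipped), stride 1, no padding."""
    h, w = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = h - kh + 1, w - kw + 1
    out = [[0.0] * out_w for _ in range(out_h)]
    for r in range(out_h):                 # output rows
        for c in range(out_w):             # output columns
            acc = 0.0
            for i in range(kh):            # kernel rows
                for j in range(kw):        # kernel columns
                    acc += image[r + i][c + j] * kernel[i][j]
            out[r][c] = acc
    return out


if __name__ == "__main__":
    img = [[1.0, 2.0, 3.0],
           [4.0, 5.0, 6.0],
           [7.0, 8.0, 9.0]]
    k = [[1.0, 0.0],
         [0.0, 1.0]]
    print(conv2d_valid(img, k))
```

The inner body is the same multiply-accumulate pattern as in a fully connected layer, only with different loop bounds and indexing, so a compiler-driven flow can treat both layer types uniformly.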

This review was generated by AI and checked by a human editor.