QUICK REVIEW

[Paper Review] FermiNets: Learning generative machines to generate efficient neural networks via generative synthesis

Alexander Wong, Mohammad Javad Shafiee|arXiv (Cornell University)|Sep 17, 2018

Ferroelectric and Negative Capacitance Devices | References: 18 | Cited by: 57

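One-Sentence Summary

GenSynth trains a generator-inquisitor pair to automatically create highly efficient neural networks (FermiNets) for edge scenarios, yielding substantial gains in model efficiency, MAC count, and energy efficiency across classification, segmentation, and detection tasks.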

ABSTRACT

The tremendous potential exhibited by deep learning is often offset by architectural and computational complexity, making widespread deployment a challenge for edge scenarios such as mobile and other consumer devices. To tackle this challenge, we explore the following idea: Can we learn generative machines to automatically generate deep neural networks with efficient network architectures? In this study, we introduce the idea of generative synthesis, which is premised on the intricate interplay between a generator-inquisitor pair that work in tandem to garner insights and learn to generate highly efficient deep neural networks that best satisfy operational requirements. What is most interesting is that, once a generator has been learned through generative synthesis, it can be used to generate not just one but a large variety of different, unique highly efficient deep neural networks that satisfy operational requirements. Experimental results for image classification, semantic segmentation, and object detection tasks illustrate the efficacy of generative synthesis in producing generators that automatically generate highly efficient deep neural networks (which we nickname FermiNets) with higher model efficiency and lower computational costs (reaching >10x more efficient and fewer multiply-accumulate operations than several tested state-of-the-art networks), as well as higher energy efficiency (reaching >4x improvements in image inferences per joule consumed on an Nvidia Tegra X2 mobile processor). As such, generative synthesis can be a powerful, generalized approach for accelerating and improving the building of deep neural networks for on-device edge scenarios.

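Motivation and Objectives

Explain the need for efficient on-device neural networks on edge devices.
Introduce generative synthesis as a generator-inquisitor framework for automatically generating efficient networks.
Show that a learned generator can produce diverse, efficient networks (FermiNets) under operational constraints.
Demonstrate empirical gains on classification, segmentation, and detection tasks.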

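Proposed Method

Define a generator G(s; θ_G) that produces a network N_s from a seed s.
Introduce an inquisitor I(·; θ_I) that outputs parameter updates Δθ_G to guide G.
Formulate a constrained optimization: G = argmax_G U(G(s)) subject to 1_r(G(s)) = 1 for all seeds s ∈ S, where 1_r is the indicator that a generated network meets the operational requirements.
Iteratively generate networks, probe them with input stimuli X, observe the responses Y, and update I and G to raise U while still meeting the requirements.
Show that the learned G can generate a variety of high-quality networks N_s for different seeds s.
Evaluate with information density, MACs, and NetScore to compare against state-of-the-art efficient networks.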
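The generate, probe, and update loop listed above can be illustrated with a deliberately small toy. This is a sketch under heavy assumptions, not the paper's procedure: a "network" is just a list of layer widths, `BASE_WIDTHS`, `proxy_accuracy`, `cost`, and the thresholds are all hypothetical, and the inquisitor is replaced by a simple random-perturbation search that proposes an update Δθ_G only when it raises the mean utility U while every seed's network still satisfies the requirement indicator. The stimulus/response probing (X, Y) is collapsed into evaluating the toy proxies.

```python
import random

BASE_WIDTHS = [64, 128, 256]  # hypothetical reference architecture (assumption)

def generate(seed, theta):
    """G(s; theta): map a seed to a toy 'network' (a list of layer widths)."""
    rng = random.Random(seed)
    return [max(1, round(w * t * rng.uniform(0.9, 1.1)))
            for w, t in zip(BASE_WIDTHS, theta)]

def cost(net):
    """Toy stand-in for MACs: sum of products of adjacent layer widths."""
    return sum(a * b for a, b in zip(net, net[1:]))

def proxy_accuracy(net):
    """Toy stand-in for measured accuracy; saturates as widths grow."""
    return 1.0 - 1.0 / (1.0 + sum(net) / 100.0)

def meets_requirements(net, min_acc=0.6):
    """Indicator 1_r(N): 1 if the (toy) operational requirement holds, else 0."""
    return 1 if proxy_accuracy(net) >= min_acc else 0

def utility(net):
    """Toy U(N): reward accuracy, penalise cost."""
    return proxy_accuracy(net) - 1e-5 * cost(net)

def mean_utility(theta, seeds):
    """Mean U over all seeds, or None if any seed violates the requirement."""
    nets = [generate(s, theta) for s in seeds]
    if not all(meets_requirements(n) for n in nets):
        return None
    return sum(utility(n) for n in nets) / len(nets)

def inquisitor(theta, seeds, rng, step=0.05):
    """Toy I(.): evaluate G's outputs and propose an update delta_theta."""
    current = mean_utility(theta, seeds)
    # Random perturbation, clipped so each scale factor stays in [0.05, 1.0].
    delta = [min(1.0, max(0.05, t + rng.uniform(-step, step))) - t for t in theta]
    candidate = [t + d for t, d in zip(theta, delta)]
    proposed = mean_utility(candidate, seeds)
    if proposed is not None and (current is None or proposed > current):
        return delta
    return [0.0] * len(theta)  # reject: leave the generator unchanged

def generative_synthesis(seeds, iterations=300, rng_seed=0):
    """Alternate inquisitor proposals and generator updates for a fixed budget."""
    rng = random.Random(rng_seed)
    theta = [1.0] * len(BASE_WIDTHS)
    for _ in range(iterations):
        delta = inquisitor(theta, seeds, rng)
        theta = [t + d for t, d in zip(theta, delta)]
    return theta

if __name__ == "__main__":
    seeds = [1, 2, 3, 4]
    theta = generative_synthesis(seeds)
    for s in seeds:
        net = generate(s, theta)
        print(s, net, cost(net), round(proxy_accuracy(net), 3))
```

Because an update is accepted only when the constraint holds for every seed, the learned parameters can be reused with any of those seeds to emit a different network that still meets the requirement, which mirrors the "one generator, many networks" claim at toy scale.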

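Experimental Results

Research Questions

RQ1: Can a generator learned through generative synthesis produce diverse, unique neural networks that satisfy predefined operational requirements?
RQ2: Do FermiNets produced by the learned generator achieve higher efficiency and NetScore than contemporary edge-friendly architectures?
RQ3: Are the gains in information density, MAC reduction, and energy efficiency consistent across classification, segmentation, and detection tasks?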

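Key Findings

On CIFAR-10, FermiNets (A–D) reach top-1 accuracy comparable to or slightly above NASNet-L2C(S) (A, B, and C are higher by about 1.4%, 0.4%, and 0.01%, respectively).
The information density of FermiNets is >12× higher than that of MobileNet, ShuffleNet, and NASNet-L2C(S).
FermiNets A, B, C, and D require >2.7×, >3.6×, >4.5×, and about 5× fewer MAC operations than NASNet-L2C(S), respectively.
The NetScore of FermiNets A, B, C, and D exceeds those of MobileNet, ShuffleNet, and NASNet-L2C(S) by >11.8, >15, >15.6, and >17.5 points, respectively.
In semantic segmentation, FermiNet-SS reaches 90.4% accuracy (RefineNet: 90.3%) with >12× higher information density, about 2.6× fewer MACs, and a NetScore about 15 points higher.
In object detection, FermiNet-OD achieves 61.0% mAP (DetectNet: 61.8%) with >10× higher information density, >11× fewer MACs, and a NetScore >21 points higher.
On an Nvidia Tegra X2, FermiNet-OD delivers >4× more inferences per joule than DetectNet, underscoring its advantage on edge devices.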
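To make the three reported metrics concrete, here is a small sketch of how they are commonly computed. The definitions are not given in this summary and are stated here as assumptions: information density as accuracy per million parameters, and NetScore as 20·log10(a^α / (p^β · m^γ)) with α = 2 and β = γ = 0.5, following the NetScore metric as usually described. The units chosen for parameters and MACs only shift NetScore by a constant, so differences between networks (which is what the findings report) are unaffected. Inferences per joule follows directly from throughput divided by power, since one watt is one joule per second. All input numbers in the usage lines are hypothetical.

```python
import math

def information_density(accuracy_pct, params_millions):
    """Accuracy (percent) delivered per million parameters (assumed definition)."""
    return accuracy_pct / params_millions

def netscore(accuracy_pct, params_millions, macs_millions,
             alpha=2.0, beta=0.5, gamma=0.5):
    """NetScore = 20 * log10(a^alpha / (p^beta * m^gamma)) (assumed definition)."""
    return 20.0 * math.log10(
        accuracy_pct ** alpha
        / (params_millions ** beta * macs_millions ** gamma)
    )

def inferences_per_joule(inferences_per_second, power_watts):
    """Energy efficiency: (inferences / s) / (J / s) = inferences / J."""
    return inferences_per_second / power_watts

if __name__ == "__main__":
    # Hypothetical networks with equal accuracy and size but 10x different MACs.
    heavy = netscore(90.0, 2.0, 1000.0)
    light = netscore(90.0, 2.0, 100.0)
    print(round(light - heavy, 6))          # NetScore gain from 10x fewer MACs
    print(information_density(90.0, 2.0))   # percent per million parameters
    print(inferences_per_joule(30.0, 7.5))  # hypothetical throughput and power
```

Under this definition, a 10× reduction in MACs at equal accuracy and parameter count is worth 20 · 0.5 · log10(10) = 10 NetScore points, which gives a sense of scale for the point gains listed above.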


This review was generated by AI and reviewed by human editors.