QUICK REVIEW

[Paper Review] ZiCo: Zero-shot NAS via Inverse Coefficient of Variation on Gradients

Guihong Li, Yuedong Yang | arXiv (Cornell University) | Jan 26, 2023

Advanced Neural Network Applications | Cited by 19

One-Sentence Summary

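ZiCo proposes a training-free zero-shot NAS proxy based on the mean and variance of gradients across samples. On multiple NAS benchmarks, its correlation with test accuracy is consistently higher than that of #Params, and it achieves competitive results with greatly reduced search time.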

ABSTRACT

Neural Architecture Search (NAS) is widely used to automatically obtain the neural network with the best performance among a large number of candidate architectures. To reduce the search time, zero-shot NAS aims at designing training-free proxies that can predict the test performance of a given architecture. However, as shown recently, none of the zero-shot proxies proposed to date can actually work consistently better than a naive proxy, namely, the number of network parameters (#Params). To improve this state of affairs, as the main theoretical contribution, we first reveal how some specific gradient properties across different samples impact the convergence rate and generalization capacity of neural networks. Based on this theoretical analysis, we propose a new zero-shot proxy, ZiCo, the first proxy that works consistently better than #Params. We demonstrate that ZiCo works better than State-Of-The-Art (SOTA) proxies on several popular NAS-Benchmarks (NASBench101, NATSBench-SSS/TSS, TransNASBench-101) for multiple applications (e.g., image classification/reconstruction and pixel-level prediction). Finally, we demonstrate that the optimal architectures found via ZiCo are as competitive as the ones found by one-shot and multi-shot NAS methods, but with much less search time. For example, ZiCo-based NAS can find optimal architectures with 78.1%, 79.4%, and 80.4% test accuracy under inference budgets of 450M, 600M, and 1000M FLOPs, respectively, on ImageNet within 0.4 GPU days. Our code is available at https://github.com/SLDGroup/ZiCo.
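
The "gradient properties across different samples" mentioned in the abstract can be made concrete with a linear model. The minimal sketch below assumes a squared-error loss 0.5 * (w . x - y)^2 and uses hypothetical function names. It only computes the per-coordinate mean and standard deviation of the per-sample gradients. It does not reproduce the paper's convergence or generalization bounds.

```python
import math


def per_sample_gradients(w, xs, ys):
    """Gradient of the squared loss 0.5 * (w . x - y)**2 w.r.t. w, one vector per sample."""
    grads = []
    for x, y in zip(xs, ys):
        residual = sum(wi * xi for wi, xi in zip(w, x)) - y  # prediction error
        grads.append([residual * xi for xi in x])           # d/dw = residual * x
    return grads


def gradient_stats(grads):
    """Per-coordinate mean and population standard deviation across samples."""
    n = len(grads)
    means, stds = [], []
    for coord in zip(*grads):  # one parameter's gradient across all samples
        mu = sum(coord) / n
        var = sum((g - mu) ** 2 for g in coord) / n
        means.append(mu)
        stds.append(math.sqrt(var))
    return means, stds


# Toy data (illustrative only): weights at "initialization" are 0, three samples.
grads = per_sample_gradients([0.0, 0.0], [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]], [1.0, 2.0, 3.0])
means, stds = gradient_stats(grads)
print(means, stds)
```

In the paper's analysis, a larger absolute mean is associated with faster convergence and a smaller standard deviation with better generalization. The sketch only exposes these two quantities.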

Motivation and Goals

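Establish why NAS needs training-free proxies, and address the fact that no previously proposed zero-shot proxy consistently outperforms #Params.
Analyze theoretically how the mean and variance of gradients across samples affect convergence and generalization.
Develop ZiCo, a zero-shot proxy that uses gradient statistics to outperform existing proxies on mainstream NAS benchmarks.
Demonstrate ZiCo's effectiveness on NAS benchmarks and in ImageNet-scale search while reducing search time.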

Proposed Method

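The authors analyze how the mean and standard deviation of gradients across samples affect training convergence and generalization. They start from a linear regression setting and extend the analysis to ReLU MLPs.
They show that a larger absolute mean gradient across samples leads to faster convergence, while a smaller gradient variance leads to better generalization, and they relate these results to the eigenvalues of the Gram matrix.
ZiCo is defined as a zero-shot proxy computed at the initial parameters from two batches (N=2). For each layer, it takes the logarithm of the sum, over that layer's parameters, of the ratio between the expected gradient magnitude and the gradient standard deviation. It then sums these terms over all layers. No training is required.
The ZiCo metric makes no architecture-specific assumptions about CNNs and depends only on the initial parameters, which keeps the evaluation zero-shot.
They show that ZiCo correlates more strongly with test accuracy than other zero-shot proxies and #Params on NASBench101, NATS-Bench-SSS/TSS, and TransNASBench-101.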
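
The per-layer ZiCo computation can be sketched in plain Python. The sketch assumes that the per-batch gradients at initialization have already been extracted with a deep-learning framework, a step the summary does not describe. Several choices are assumptions for illustration, not the authors' API:
- the function name `zico_score`
- the dict-of-lists data layout
- the use of population standard deviation
- skipping zero-variance parameters and empty layers

The authors' implementation is at https://github.com/SLDGroup/ZiCo.

```python
import math
from statistics import mean, pstdev


def zico_score(layer_grads):
    """Sketch of a ZiCo-style score (hypothetical helper, not the official code).

    layer_grads maps a layer name to a list of N per-batch gradient vectors
    (flat lists of floats of equal length), e.g. N = 2.
    Returns the sum over layers of log(sum over params of mean(|g|) / std(g)).
    """
    score = 0.0
    for batches in layer_grads.values():
        ratio_sum = 0.0
        for values in zip(*batches):  # one parameter's gradient across the N batches
            sigma = pstdev(values)  # population std (assumption: ddof = 0)
            if sigma > 0:  # skip parameters whose gradient does not vary
                ratio_sum += mean([abs(v) for v in values]) / sigma
        if ratio_sum > 0:  # skip layers with no usable parameters (log undefined)
            score += math.log(ratio_sum)
    return score


# Hypothetical gradients from N = 2 batches at initialization.
grads = {
    "conv1": [[1.0, 2.0], [3.0, 2.0]],  # param 0 varies, param 1 is constant
    "fc": [[-1.0], [1.0]],
}
print(zico_score(grads))
```

Each layer contributes one log term. A parameter whose gradient is large relative to its batch-to-batch spread therefore raises the score, and a noisier gradient with the same mean magnitude lowers it.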

Experimental Results

Research Questions

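RQ1: Can the mean and variance of gradients over training samples serve as a theoretically grounded zero-shot proxy for NAS performance?
RQ2: Does a zero-shot proxy based on gradient statistics consistently outperform the naive #Params proxy across diverse NAS topologies and tasks?
RQ3: Can ZiCo identify architectures with competitive test accuracy under various FLOPs budgets at minimal search cost?
RQ4: On large-scale tasks such as ImageNet, how does ZiCo compare with one-shot and multi-shot NAS methods?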

Key Findings

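On NASBench101 and on multiple NATS-Bench datasets, ZiCo correlates better with test accuracy than existing proxies, including #Params.
Under FLOPs budgets of 450M–1000M on ImageNet, ZiCo-based zero-shot NAS achieves Top-1 accuracy competitive with state-of-the-art NAS methods at a much lower search cost (about 0.4 GPU days).
Two batches of training data are enough to compute ZiCo reliably, which enables fast evaluation of candidate architectures.
ZiCo lets NAS find architectures whose FLOPs-accuracy trade-offs are competitive with those found by one-shot and multi-shot NAS, while requiring far less search time.
Empirical ablations show that using more batches in the ZiCo computation does not improve correlation, and that a batch size of 64 gives a stable measure.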
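
The findings above judge a proxy by how well its scores rank architectures by test accuracy. The summary does not name the correlation coefficient. Kendall's τ is one common choice for this kind of comparison.

The sketch below computes tau-a, in which tied pairs count as neither concordant nor discordant and no tie correction is applied. The inputs are made-up numbers for four hypothetical architectures, not results from the paper, and the function and variable names are hypothetical.

```python
from itertools import combinations


def kendall_tau_a(scores, accuracies):
    """Kendall's tau-a between proxy scores and test accuracies (no tie correction)."""
    if len(scores) != len(accuracies) or len(scores) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    concordant = discordant = 0
    for i, j in combinations(range(len(scores)), 2):
        s = (scores[i] - scores[j]) * (accuracies[i] - accuracies[j])
        if s > 0:
            concordant += 1  # pair ranked in the same order by both
        elif s < 0:
            discordant += 1  # pair ranked in opposite order
    n_pairs = len(scores) * (len(scores) - 1) // 2
    return (concordant - discordant) / n_pairs


# Made-up values for four hypothetical architectures (not from the paper).
accuracy = [0.90, 0.85, 0.80, 0.70]
proxy = [4.0, 3.5, 3.6, 1.0]      # e.g. a gradient-based proxy score
num_params = [2, 5, 1, 3]         # e.g. #Params in millions
print(kendall_tau_a(proxy, accuracy), kendall_tau_a(num_params, accuracy))
```

In the sense of the findings, a proxy that beats #Params would show a higher τ than `num_params` across many sampled architectures. Four points only illustrate the mechanics.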

This review was generated by AI and reviewed by human editors.