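[Paper Walkthrough] sharpDARTS: Faster and More Accurate Differentiable Architecture Search
sharpDARTS improves differentiable architecture search by introducing the SharpSepConv block, the Cosine Power Annealing learning rate schedule, and Max-W regularization. It achieves faster search and higher accuracy on CIFAR-10 and CIFAR-10.1, and competitive results on ImageNet.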
Neural Architecture Search (NAS) has been a source of dramatic improvements in neural network design, with recent results meeting or exceeding the performance of hand-tuned architectures. However, our understanding of how to represent the search space for neural net architectures and how to search that space efficiently are both still in their infancy. We have performed an in-depth analysis to identify limitations in a widely used search space and a recent architecture search method, Differentiable Architecture Search (DARTS). These findings led us to introduce novel network blocks with a more general, balanced, and consistent design; a better-optimized Cosine Power Annealing learning rate schedule; and other improvements. Our resulting sharpDARTS search is 50% faster with a 20-30% relative improvement in final model error on CIFAR-10 when compared to DARTS. Our best single model run has 1.93% (1.98+/-0.07) validation error on CIFAR-10 and 5.5% error (5.8+/-0.3) on the recently released CIFAR-10.1 test set. To our knowledge, both are state of the art for models of similar size. This model also generalizes competitively to ImageNet at 25.1% top-1 (7.8% top-5) error. We found improvements for existing search spaces but does DARTS generalize to new domains? We propose Differentiable Hyperparameter Grid Search and the HyperCuboid search space, which are representations designed to leverage DARTS for more general parameter optimization. Here we find that DARTS fails to generalize when compared against a human's one shot choice of models. We look back to the DARTS and sharpDARTS search spaces to understand why, and an ablation study reveals an unusual generalization gap. We finally propose Max-W regularization to solve this problem, which proves significantly better than the handmade design. Code will be made available.
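Motivation and Goals
- Identify the limitations of existing NAS search spaces and of the DARTS method.
- Develop a more balanced, efficient search space and training scheme.
- Improve the generalization and search efficiency of mobile-scale architectures.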
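Proposed Methods
- Propose the SharpSepConv block, which uses balanced depth and a bottleneck to equalize the number of operations.
- Introduce the Cosine Power Annealing learning rate schedule, which keeps the learning rate in an effective range throughout training.
- Define Differentiable Hyperparameter Grid Search and the HyperCuboid search space for evaluating discrete choices.
- Run ablation studies to identify biases in DARTS, and introduce Max-W regularization to mitigate them.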
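The Cosine Power Annealing schedule listed above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation. It assumes the schedule has the form lr = lr_min + (lr_max - lr_min) * (p^(c+1) - p) / (p^2 - p), where c is the usual cosine term 0.5 * (1 + cos(pi * t / T)); that form is our reading of the paper and should be checked against it. The function names, the default p = 10, and the example learning rates are illustrative assumptions.

```python
import math


def cosine_annealing(t, total, lr_max, lr_min):
    """Standard cosine annealing, without warm restarts."""
    cos_term = 0.5 * (1.0 + math.cos(math.pi * t / total))
    return lr_min + (lr_max - lr_min) * cos_term


def cosine_power_annealing(t, total, lr_max, lr_min, p=10.0):
    """Cosine annealing passed through an exponential with base p.

    Assumed form (verify against the paper):
        scale = (p ** (cos_term + 1) - p) / (p ** 2 - p)
    The scale is 1 at t = 0 and 0 at t = total, so the schedule still
    starts at lr_max and ends at lr_min. As p approaches 1 it tends to
    plain cosine annealing, which is used directly when p == 1.
    """
    if p == 1.0:
        return cosine_annealing(t, total, lr_max, lr_min)
    cos_term = 0.5 * (1.0 + math.cos(math.pi * t / total))
    scale = (p ** (cos_term + 1.0) - p) / (p ** 2 - p)
    return lr_min + (lr_max - lr_min) * scale


if __name__ == "__main__":
    # Illustrative values only; these are not the paper's hyperparameters.
    total_epochs = 100
    for epoch in (0, 25, 50, 75, 100):
        plain = cosine_annealing(epoch, total_epochs, 0.025, 1e-4)
        power = cosine_power_annealing(epoch, total_epochs, 0.025, 1e-4, p=10.0)
        print(f"epoch {epoch:3d}  cosine={plain:.5f}  cosine_power={power:.5f}")
```

Under this assumed form and p > 1, both schedules share the same endpoints, but the power variant lies below plain cosine annealing in between, so fewer epochs are spent at the highest learning rates.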
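Experimental Results
Research Questions
- RQ1: Does the DARTS search space contain biases that limit generalization to new domains?
- RQ2: Do SharpSepConv and the improved training scheme yield faster search and higher final accuracy on CIFAR-10/10.1 and ImageNet?
- RQ3: Does Max-W regularization improve architecture search by mitigating the bias toward low-capacity primitives?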
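Key Findings
- SharpSepConv and sharpDARTS achieve state-of-the-art results for mobile-scale models on CIFAR-10 and CIFAR-10.1, with a search that is 50% faster than DARTS.
- Cosine Power Annealing maintains a better learning rate and improves training dynamics, outperforming standard Cosine Annealing.
- Max-W regularization reduces the bias toward small, high-gradient primitives, yielding larger but more accurate models.
- Differentiable Hyperparameter Grid Search and the HyperCuboid space reveal a generalization gap for DARTS across search spaces.
- Hand-designed models and Max-W regularization outperform scalar DARTS in some settings, highlighting biases in both the search space and the optimization.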
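To make the contrast between scalar DARTS and Max-W regularization concrete, the sketch below compares the two ways of mixing candidate operations on one edge. It is a hedged illustration rather than the authors' code. It assumes Max-W scales each operation's output by (1 - max(w) + w_i), where w is the softmax of the architecture parameters, instead of by w_i alone; that form is our reading of the paper and should be verified. Operation outputs are plain floats here for simplicity, and all function names are hypothetical.

```python
import math


def softmax(alphas):
    """Numerically stable softmax over a list of architecture parameters."""
    shift = max(alphas)
    exps = [math.exp(a - shift) for a in alphas]
    total = sum(exps)
    return [e / total for e in exps]


def scalar_darts_mix(weights, op_outputs):
    """Scalar DARTS mixing: each operation output is scaled by its weight."""
    return sum(w * o for w, o in zip(weights, op_outputs))


def max_w_mix(weights, op_outputs):
    """Assumed Max-W mixing: scale each output by (1 - max(w) + w_i).

    The highest-weighted operation always gets a scale of exactly 1, and
    every other operation is reduced by its gap to the maximum weight.
    """
    w_max = max(weights)
    return sum((1.0 - w_max + w) * o for w, o in zip(weights, op_outputs))


if __name__ == "__main__":
    # Three hypothetical candidate operations with equal initial parameters.
    weights = softmax([0.0, 0.0, 0.0])
    outputs = [1.0, 2.0, 3.0]
    print("scalar DARTS:", scalar_darts_mix(weights, outputs))
    print("Max-W:", max_w_mix(weights, outputs))
```

With equal weights, scalar DARTS scales every operation by 1/3, whereas the assumed Max-W form passes every operation through at full scale. That illustrates one way a change in weighting could alter which primitives the search favors.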
This walkthrough was generated by AI and reviewed by human editors.