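[Paper Walkthrough] Cream of the Crop: Distilling Prioritized Paths For One-Shot Neural Architecture Search
The paper introduces prioritized path distillation, which improves one-shot NAS by letting subnetworks learn from internal, dynamically updated high-performing architectures instead of an external teacher, raising both ranking correlation and final accuracy.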
One-shot weight sharing methods have recently drawn great attention in neural architecture search due to high efficiency and competitive performance. However, weight sharing across models has an inherent deficiency, i.e., insufficient training of subnetworks in hypernetworks. To alleviate this problem, we present a simple yet effective architecture distillation method. The central idea is that subnetworks can learn collaboratively and teach each other throughout the training process, aiming to boost the convergence of individual models. We introduce the concept of prioritized path, which refers to the architecture candidates exhibiting superior performance during training. Distilling knowledge from the prioritized paths is able to boost the training of subnetworks. Since the prioritized paths are changed on the fly depending on their performance and complexity, the final obtained paths are the cream of the crop. We directly select the most promising one from the prioritized paths as the final architecture, without using other complex search methods, such as reinforcement learning or evolution algorithms. The experiments on ImageNet verify such path distillation method can improve the convergence ratio and performance of the hypernetwork, as well as boosting the training of subnetworks. The discovered architectures achieve superior performance compared to the recent MobileNetV3 and EfficientNet families under aligned settings. Moreover, the experiments on object detection and more challenging search space show the generality and robustness of the proposed method. Code and models are available at https://github.com/microsoft/cream.git.
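Research Motivation and Goals
- Motivate and address the insufficient training of subnetworks in weight-sharing one-shot NAS.
- Introduce a prioritized path board that identifies and dynamically updates high-performing architecture candidates.
- Develop a meta-network that selects the best-matching prioritized path for each subnetwork during training.
- Distill knowledge from the prioritized paths into subnetworks to improve convergence and final architecture quality.
- Demonstrate improved ranking correlation and superior performance on ImageNet, with transfer to object detection.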
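Proposed Method
- Define prioritized paths as the architecture candidates that show better validation performance during training.
- Maintain a prioritized path board B holding K paths, updated through selective competition that favors higher accuracy and lower FLOPs.
- Introduce a meta-network M that predicts the best prioritized path for a given subnetwork based on how complementary the two paths are.
- Train each subnetwork with a joint loss that combines the cross-entropy on ground-truth labels with a distillation loss from the selected prioritized path, where the distillation term is weighted by that path's matching score.
- Iteratively update the hypernetwork weights, the prioritized path board, and the meta-network; after training, select the best path on the board as the final architecture.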
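The board update and the joint loss described above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the `Path` record, the function names `update_board` and `joint_loss`, the exact replacement rule (a candidate displaces the weakest entry only if it is at least as accurate and no more costly), and the soft-label cross-entropy used as the distillation term are all hypothetical choices made for this example. The weight `rho` stands in for the matching score that the meta-network M would produce.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Path:
    arch: tuple   # hypothetical encoding: one operator index per layer
    acc: float    # validation accuracy measured with shared weights
    flops: float  # complexity, e.g. in MFLOPs


def _rank(p):
    # Smaller is better: higher accuracy first, then lower FLOPs.
    return (-p.acc, p.flops)


def update_board(board, cand, k):
    """Return a new board of at most k paths after considering cand."""
    if len(board) < k:
        return sorted(board + [cand], key=_rank)
    worst = max(board, key=_rank)  # least competitive entry
    # Assumed rule: replace only if cand is no worse on both criteria.
    if cand.acc >= worst.acc and cand.flops <= worst.flops:
        kept = [p for p in board if p is not worst]
        return sorted(kept + [cand], key=_rank)
    return list(board)


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def joint_loss(student_logits, teacher_logits, label, rho):
    """Cross-entropy on the label plus rho times a soft-label distillation term."""
    p_s = softmax(student_logits)
    p_t = softmax(teacher_logits)
    ce = -math.log(p_s[label])
    kd = -sum(t * math.log(s) for t, s in zip(p_t, p_s))
    return ce + rho * kd


board = []
for cand in [Path((0,), 0.70, 300.0), Path((1,), 0.72, 320.0),
             Path((2,), 0.71, 290.0), Path((3,), 0.75, 400.0)]:
    board = update_board(board, cand, k=2)
print([p.arch for p in board])
```

With `k=2`, the third candidate displaces the weakest entry because it is both more accurate and cheaper, while the fourth is rejected under this assumed rule because it costs more FLOPs than the entry it would replace. A looser rule that trades accuracy against FLOPs would behave differently.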
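Experimental Results
Research Questions
- RQ1: Without an external teacher, can prioritized path distillation improve the training of subnetworks in one-shot NAS?
- RQ2: Does the meta-learned path-matching strategy pair subnetworks with complementary paths better than a random or fixed teacher?
- RQ3: How does the proposed method affect the ranking correlation between performance under one-shot shared weights and true standalone architecture performance?
- RQ4: How do the final architectures taken from the prioritized path board compare with those found by conventional search methods?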
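Key Findings
- Prioritized path distillation improves over strong baselines and prior methods, reaching higher ImageNet top-1 accuracy under mobile FLOPs constraints.
- Meta-learned path matching yields a measurable ImageNet accuracy gain over random path matching.
- The Kendall rank correlation between performance under one-shot shared weights and true performance rises markedly with prioritized path distillation (for example, on subImageNet).
- The discovered architectures outperform MobileNetV3 and EfficientNet family members under aligned FLOPs budgets and transfer well to COCO object detection.
- With a small prioritized path board (K=10) and only a limited number of validation samples, the method achieves competitive results at a search cost well below that of some evolutionary or reinforcement-learning-based NAS methods.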
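To make the ranking-correlation finding concrete, the sketch below computes Kendall's tau between accuracies measured with shared one-shot weights and accuracies of the same architectures trained from scratch. It is an illustrative example only: the function name `kendall_tau` is hypothetical, it implements the simple tau-a variant (no tie correction), and the accuracy lists are made-up numbers, not results from the paper.

```python
from itertools import combinations


def kendall_tau(x, y):
    """Kendall tau-a: (concordant - discordant) / total number of pairs."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    concordant = discordant = 0
    for i, j in combinations(range(len(x)), 2):
        sign = (x[i] - x[j]) * (y[i] - y[j])
        if sign > 0:
            concordant += 1   # both metrics order this pair the same way
        elif sign < 0:
            discordant += 1   # the two metrics disagree on this pair
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs


# Made-up accuracies for four architectures (illustrative values only).
supernet_acc = [0.61, 0.64, 0.58, 0.66]    # evaluated with shared weights
standalone_acc = [0.72, 0.74, 0.73, 0.76]  # trained from scratch

tau = kendall_tau(supernet_acc, standalone_acc)
print(round(tau, 3))
```

A tau near 1 means the hypernetwork ranks candidates almost the same way as standalone training would, which is what makes picking the final architecture directly from the board reasonable. A tau near 0 means the shared-weight ranking carries little information.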
This walkthrough was generated by AI and reviewed by a human editor.