QUICK REVIEW

[Paper Review] Cream of the Crop: Distilling Prioritized Paths For One-Shot Neural Architecture Search

Houwen Peng, Hao Du | arXiv (Cornell University) | Oct 29, 2020

Advanced Neural Network Applications | References: 42 | Cited by: 39

One-Sentence Summary

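This paper proposes prioritized path distillation, which strengthens one-shot NAS by letting subnetworks learn from the best-performing architectures found so far inside the hypernetwork, updated dynamically during training, rather than from an external teacher; this improves both ranking correlation and final accuracy.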

ABSTRACT

One-shot weight sharing methods have recently drawn great attention in neural architecture search due to high efficiency and competitive performance. However, weight sharing across models has an inherent deficiency, i.e., insufficient training of subnetworks in hypernetworks. To alleviate this problem, we present a simple yet effective architecture distillation method. The central idea is that subnetworks can learn collaboratively and teach each other throughout the training process, aiming to boost the convergence of individual models. We introduce the concept of prioritized path, which refers to the architecture candidates exhibiting superior performance during training. Distilling knowledge from the prioritized paths is able to boost the training of subnetworks. Since the prioritized paths are changed on the fly depending on their performance and complexity, the final obtained paths are the cream of the crop. We directly select the most promising one from the prioritized paths as the final architecture, without using other complex search methods, such as reinforcement learning or evolution algorithms. The experiments on ImageNet verify such path distillation method can improve the convergence ratio and performance of the hypernetwork, as well as boosting the training of subnetworks. The discovered architectures achieve superior performance compared to the recent MobileNetV3 and EfficientNet families under aligned settings. Moreover, the experiments on object detection and more challenging search space show the generality and robustness of the proposed method. Code and models are available at https://github.com/microsoft/cream.git.

Motivation and Objectives

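Addresses the insufficient training of subnetworks in weight-sharing one-shot NAS.
Introduces a prioritized path board that identifies high-performing architecture candidates and is updated dynamically.
Develops a meta-network that selects, during training, the prioritized path best matched to each subnetwork.
Distills knowledge from prioritized paths into subnetworks to improve convergence and the quality of the final architecture.
Demonstrates improved ranking correlation and strong accuracy on ImageNet, and shows transfer to object detection.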

Proposed Method

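Defines prioritized paths as the architecture candidates that show superior validation performance during training.
Maintains a prioritized path board B holding K high-performing paths, updated through selective competition (paths with higher accuracy and fewer FLOPs are preferred).
Introduces a meta-network M that predicts, based on path complementarity, the most suitable prioritized path for a given subnetwork.
Trains each subnetwork with a joint loss that combines the cross-entropy against the ground-truth labels with a distillation loss from the selected prioritized path, where the distillation term is weighted by the path's matching score.
Iteratively updates the hypernetwork weights, the prioritized path board, and the meta-network. After training, the best path on the board is selected as the final architecture.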
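
The board update and the joint loss described above can be sketched in a few lines of plain Python. This is a minimal illustration and not the authors' implementation: the names `Path`, `PrioritizedBoard`, and `joint_loss` are hypothetical, the ranking rule (accuracy first, fewer FLOPs as tie-break) is an assumed reading of "selective competition", and the distillation term is an assumed soft-target cross-entropy. The matching score, which the method obtains from the meta-network M, is passed in here as a plain number.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Path:
    arch: tuple      # hypothetical encoding: one operator index per layer
    accuracy: float  # validation accuracy measured with shared weights
    flops: float     # complexity, e.g. in MFLOPs


class PrioritizedBoard:
    """Keeps at most k paths, preferring higher accuracy, then fewer FLOPs."""

    def __init__(self, k):
        self.k = k
        self.paths = []

    @staticmethod
    def _rank_key(path):
        # Assumed ordering: accuracy first, FLOPs as the tie-break.
        return (-path.accuracy, path.flops)

    def update(self, candidate):
        """Try to place a candidate on the board; return True if it stays."""
        # Assumption: an architecture already on the board is not re-added.
        if any(p.arch == candidate.arch for p in self.paths):
            return False
        self.paths.append(candidate)
        self.paths.sort(key=self._rank_key)
        if len(self.paths) > self.k:
            dropped = self.paths.pop()  # least competitive path leaves
            return dropped is not candidate
        return True

    def best(self):
        """Return the top-ranked path (the final architecture after training)."""
        return self.paths[0]


def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def joint_loss(student_logits, teacher_logits, label, match_score):
    """Cross-entropy on the true label plus a match-weighted distillation term."""
    p_student = softmax(student_logits)
    p_teacher = softmax(teacher_logits)
    ce = -math.log(p_student[label])
    # Soft-target cross-entropy against the prioritized path (assumed form).
    kd = -sum(t * math.log(s) for t, s in zip(p_teacher, p_student))
    return ce + match_score * kd
```

In this sketch a matching score of 0 reduces the objective to ordinary supervised training, while larger scores pull the subnetwork's predictions toward those of the selected prioritized path.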

Experimental Results

Research Questions

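RQ1: Can prioritized path distillation improve the training of subnetworks in one-shot NAS without an external teacher?
RQ2: Does a meta-learned path selection strategy match subnetworks with complementary paths better than random or fixed teachers?
RQ3: How does the proposed method affect the ranking correlation between performance estimated with one-shot (shared) weights and true architecture performance?
RQ4: What final architectures emerge from the prioritized path board, compared with conventional search methods?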

Key Findings

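Prioritized path distillation improves on strong baselines and other search methods, raising ImageNet top-1 accuracy under mobile FLOPs constraints.
Meta-learned path matching yields a measurable gain in ImageNet accuracy over random path matching.
The Kendall rank correlation improves markedly with prioritized path distillation (e.g., on subImageNet).
The discovered architectures outperform members of the MobileNetV3 and EfficientNet families under matched FLOPs budgets and transfer well to object detection on COCO.
With a small prioritized path board (K=10) and a limited number of validation samples, the method achieves competitive results at a much lower search cost than evolutionary or RL-based NAS methods.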
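
To make the ranking-correlation finding concrete, the sketch below computes a Kendall rank correlation between accuracies estimated with shared weights and accuracies of the same architectures trained on their own. It uses the simple tau-a variant (ties count as neither concordant nor discordant), which is an assumption, since this review does not say which variant the paper reports. The function name `kendall_tau` and the accuracy values are made up for illustration and are not results from the paper.

```python
from itertools import combinations


def kendall_tau(xs, ys):
    """Kendall tau-a: (concordant - discordant) / total number of pairs."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long sequences with at least 2 items")
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1   # both rankings order this pair the same way
        elif sign < 0:
            discordant += 1   # the two rankings disagree on this pair
    n_pairs = len(xs) * (len(xs) - 1) // 2
    return (concordant - discordant) / n_pairs


# Illustrative, made-up numbers for four architectures.
shared_weight_acc = [0.61, 0.64, 0.58, 0.66]  # estimated inside the hypernetwork
standalone_acc = [0.72, 0.74, 0.73, 0.76]     # after training each one separately
tau = kendall_tau(shared_weight_acc, standalone_acc)
```

A value near 1 means the hypernetwork ranks architectures almost exactly as standalone training would, which is what makes selecting the final path directly from the board trustworthy; a value near 0 means the shared-weight ranking carries little information.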

This review was generated by AI and checked by a human editor.