QUICK REVIEW

[Paper Review] Rethinking Architecture Selection in Differentiable NAS

Ruochen Wang, Minhao Cheng | arXiv (Cornell University) | Aug 10, 2021

Advanced Neural Network Applications | References: 30 | Citations: 33

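One-Line Summary

This paper argues that the conventional magnitude-based selection in differentiable NAS, which picks operations by the size of their architecture parameters (α), can be misleading, and introduces a perturbation-based architecture selection (PT) that evaluates each operation's contribution to the supernet's performance. PT consistently yields better architectures and alleviates the robustness problems of DARTS.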

ABSTRACT

Differentiable Neural Architecture Search is one of the most popular Neural Architecture Search (NAS) methods for its search efficiency and simplicity, accomplished by jointly optimizing the model weight and architecture parameters in a weight-sharing supernet via gradient-based algorithms. At the end of the search phase, the operations with the largest architecture parameters will be selected to form the final architecture, with the implicit assumption that the values of architecture parameters reflect the operation strength. While much has been discussed about the supernet's optimization, the architecture selection process has received little attention. We provide empirical and theoretical analysis to show that the magnitude of architecture parameters does not necessarily indicate how much the operation contributes to the supernet's performance. We propose an alternative perturbation-based architecture selection that directly measures each operation's influence on the supernet. We re-evaluate several differentiable NAS methods with the proposed architecture selection and find that it is able to extract significantly improved architectures from the underlying supernets consistently. Furthermore, we find that several failure modes of DARTS can be greatly alleviated with the proposed selection method, indicating that much of the poor generalization observed in DARTS can be attributed to the failure of magnitude-based architecture selection rather than entirely the optimization of its supernet.

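Motivation and Objectives

Examine whether the magnitude of the architecture parameters in differentiable NAS reflects operation strength.
Analyze the failure modes of magnitude-based selection (e.g., the dominance of skip connections).
Propose and evaluate a perturbation-based architecture selection (PT) that measures each operation's influence on the supernet's performance.
Demonstrate the effectiveness of PT on DARTS, SDARTS, SGAS, and NAS-Bench-201.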
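
For reference, the two selection rules contrasted in this review can be written roughly as follows. The first two lines are the standard DARTS mixed operation on edge (i, j) and the magnitude-based choice derived from it. The third line is only a schematic of the perturbation criterion: the symbols \mathrm{ACC}_{\text{val}}, \mathcal{S} (the supernet), and \mathcal{S} \setminus o (the supernet with operation o masked out on that edge) are notation assumed here for illustration and may differ from the paper's.

```latex
\begin{align}
\bar{o}^{(i,j)}(x) &= \sum_{o \in \mathcal{O}}
  \frac{\exp\big(\alpha_o^{(i,j)}\big)}{\sum_{o' \in \mathcal{O}} \exp\big(\alpha_{o'}^{(i,j)}\big)} \, o(x) \\
o^{(i,j)}_{\text{mag}} &= \arg\max_{o \in \mathcal{O}} \; \alpha_o^{(i,j)} \\
o^{(i,j)}_{\text{PT}} &= \arg\max_{o \in \mathcal{O}} \;
  \Big[ \mathrm{ACC}_{\text{val}}(\mathcal{S}) - \mathrm{ACC}_{\text{val}}(\mathcal{S} \setminus o) \Big]
\end{align}
```

The point of the comparison is that the second rule reads a learned parameter, while the third measures what actually happens to the supernet when an operation is taken away.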

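Proposed Method

Define operation strength as the discretization accuracy at convergence, i.e. the accuracy of the supernet after an edge is discretized to that operation and the rest of the network is trained to convergence, and show that this disagrees with α.
Propose a perturbation-based measure of strength: remove each operation on an edge in turn and measure the effect on validation accuracy.
Develop Algorithm 1 (perturbation-based architecture selection): iterate over the edges, select on each edge the operation whose removal causes the largest drop in validation accuracy, discretize the edge to that operation, and fine-tune the supernet.
To reduce the computational cost, operation importance is measured by removing each operation and observing the drop in accuracy, rather than by computing the discretization accuracy at convergence for every operation.
Apply perturbation-based selection to the pretrained supernets of DARTS, SDARTS-RS, and SGAS to derive the final architectures.
Show that training without α (uniform α) combined with PT matches or exceeds the performance of conventional DARTS.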
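
The selection loop described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the supernet is reduced to a dictionary of active operations per edge, `toy_accuracy` is a made-up stand-in for evaluating a trained supernet on validation data, and `TOY_STRENGTH`, `TOY_ALPHA`, `perturbation_select`, and the edge names are all hypothetical. Fine-tuning is passed in as a callback and defaults to a no-op.

```python
import random
from typing import Callable, Dict, List

Supernet = Dict[str, List[str]]  # edge name -> ops still active on that edge


def perturbation_select(
    supernet: Supernet,
    evaluate: Callable[[Supernet], float],
    finetune: Callable[[Supernet], None] = lambda net: None,
    seed: int = 0,
) -> Dict[str, str]:
    """Pick one op per edge by the accuracy drop that its removal causes."""
    net = {edge: list(ops) for edge, ops in supernet.items()}  # work on a copy
    edges = list(net)
    random.Random(seed).shuffle(edges)  # visit edges in a random order
    chosen: Dict[str, str] = {}
    for edge in edges:
        candidates = list(net[edge])
        base_acc = evaluate(net)
        drops: Dict[str, float] = {}
        for op in candidates:
            # Mask out a single op, keep all the others, and re-evaluate.
            net[edge] = [o for o in candidates if o != op]
            drops[op] = base_acc - evaluate(net)
        best = max(drops, key=lambda name: drops[name])  # largest drop wins
        net[edge] = [best]  # discretize this edge to the selected op
        chosen[edge] = best
        finetune(net)  # placeholder for a few epochs of supernet tuning
    return chosen


# Made-up numbers (assumption): how useful each op "really" is in the toy
# supernet, and the alpha values a magnitude-based rule would look at.
TOY_STRENGTH = {"skip_connect": 0.2, "sep_conv_3x3": 0.9, "max_pool_3x3": 0.4}
TOY_ALPHA = {"skip_connect": 1.5, "sep_conv_3x3": 0.3, "max_pool_3x3": -0.2}


def toy_accuracy(net: Supernet) -> float:
    """Toy validation accuracy: mean usefulness of the active ops per edge."""
    per_edge = [
        sum(TOY_STRENGTH[o] for o in ops) / len(ops) if ops else 0.0
        for ops in net.values()
    ]
    return sum(per_edge) / len(per_edge)


supernet = {"edge_0_1": list(TOY_STRENGTH), "edge_0_2": list(TOY_STRENGTH)}
magnitude_choice = max(TOY_ALPHA, key=lambda name: TOY_ALPHA[name])
pt_choice = perturbation_select(supernet, toy_accuracy)
print("magnitude-based:", magnitude_choice)
print("perturbation-based:", pt_choice)
```

In this toy setup the largest α belongs to the skip connection, while removing the separable convolution hurts the toy accuracy most, so the two rules disagree. With a real supernet, `evaluate` would run the network on held-out validation data and `finetune` would train the remaining weights for a few epochs after each discretization.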

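Experimental Results

Research Questions

RQ1: Does the magnitude of the architecture parameters α reliably indicate each operation's contribution?
RQ2: Can a perturbation-based criterion better identify strong operations and stabilize architecture selection across variants of differentiable NAS?
RQ3: How does PT affect the robustness issues observed in DARTS and its variants across multiple search spaces?
RQ4: How does applying PT change performance on CIFAR-10 and NAS-Bench-201 compared with conventional magnitude-based selection?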

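Key Findings

Perturbation-based selection consistently produces better architectures than magnitude-based selection on DARTS, SDARTS-RS, and SGAS.
DARTS+PT improves the CIFAR-10 test error from 3.00% (DARTS) to 2.61% on average and 2.48% at best.
SDARTS-RS+PT improves the CIFAR-10 test error to 2.54% on average and 2.44% at best.
On NAS-Bench-201, the DARTS baseline is reported at a test error of 45.7%, whereas DARTS+PT achieves 11.89% on average and PT with fixed α achieves 6.20%.
DARTS+PT extracts meaningful architectures even in search spaces where DARTS degrades, such as S2 and S4 of Zela et al. 2020.
Fixing α to uniform weights while using PT gives competitive or better results in several spaces, suggesting that α may be unnecessary when combined with PT.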


This review was generated by AI and checked by a human editor.