QUICK REVIEW

[Paper Review] One-Shot Neural Architecture Search Through A Posteriori Distribution Guided Sampling

Yizhou Zhou, Xiaoyan Sun|arXiv (Cornell University)|Jun 23, 2019

Domain Adaptation and Few-Shot Learning | Cited by 3

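One-Sentence Summary

This paper proposes a one-shot neural architecture search (NAS) method that improves efficiency and accuracy by sampling sub-networks according to an estimated joint posterior distribution over architectures and weights. The distribution is modeled with variational inference and a hybrid network representation, which reduces the number of sampled sub-networks by orders of magnitude. On CIFAR-10, CIFAR-100, and ImageNet, the method achieves the best trade-off between accuracy and search speed among NAS methods; on CIFAR-10, the search is 20x faster and the resulting network is more accurate than the best one found by existing NAS methods.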

ABSTRACT

The emergence of one-shot approaches has greatly advanced the research on neural architecture search (NAS). Recent approaches train an over-parameterized super-network (one-shot model) and then sample and evaluate a number of sub-networks, which inherit weights from the one-shot model. The overall searching cost is significantly reduced as training is avoided for sub-networks. However, the network sampling process is casually treated and the inherited weights from an independently trained super-network perform sub-optimally for sub-networks. In this paper, we propose a novel one-shot NAS scheme to address the above issues. The key innovation is to explicitly estimate the joint a posteriori distribution over network architecture and weights, and sample networks for evaluation according to it. This brings two benefits. First, network sampling under the guidance of a posteriori probability is more efficient than conventional random or uniform sampling. Second, the network architecture and its weights are sampled as a pair to alleviate the sub-optimal weights problem. Note that estimating the joint a posteriori distribution is not a trivial problem. By adopting variational methods and introducing a hybrid network representation, we convert the distribution approximation problem into an end-to-end neural network training problem which is neatly approached by variational dropout. As a result, the proposed method reduces the number of sampled sub-networks by orders of magnitude. We validate our method on the fundamental image classification task. Results on Cifar-10, Cifar-100 and ImageNet show that our method strikes the best trade-off between precision and speed among NAS methods. On Cifar-10, we speed up the searching process by 20x and achieve a higher precision than the best network found by existing NAS methods.

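Motivation and Objectives

Address the inefficiency and sub-optimal performance of conventional one-shot NAS methods that sample sub-networks randomly or uniformly.
Mitigate the problem that weights inherited from the super-network are sub-optimal for individual sub-networks.
Develop a method that samples architectures and weights jointly from a learned posterior distribution, improving search efficiency and accuracy.
Reduce the number of sub-network evaluations in NAS while maintaining or improving final model performance.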

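Proposed Method

The method uses variational inference to estimate the joint posterior distribution over neural architectures and weights.
A hybrid network representation is introduced so that the joint distribution can be parameterized effectively.
The distribution approximation problem is reformulated as an end-to-end neural network training problem, which is solved with variational dropout.
Sub-networks are sampled according to the estimated posterior probability, so each architecture is drawn together with its weights as a pair, which alleviates the sub-optimal weights problem.
The entire process is trained end to end in one pass, which enables an efficient, differentiable search.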
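
The sampling step described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes, purely for illustration, that the estimated posterior factorizes into an independent Bernoulli keep probability per candidate operation (in the spirit of dropout). The names `CANDIDATE_OPS`, `keep_prob`, `shared_weights`, `sample_subnetwork`, and `sample_uniform` are hypothetical, and the probabilities in the usage section are made-up values, not learned ones.

```python
import random

# Hypothetical search space: candidate operations on each edge of the super-network.
CANDIDATE_OPS = ["conv3x3", "conv5x5", "maxpool", "skip"]


def sample_subnetwork(keep_prob, shared_weights, rng):
    """Draw one (architecture, weights) pair.

    keep_prob[edge][op] is the assumed posterior probability that `op` is
    kept on `edge` (independent Bernoulli per operation, an assumption).
    The weights of the kept operations are returned together with the
    architecture, so the two are always sampled as a pair.
    """
    arch, weights = {}, {}
    for edge, probs in keep_prob.items():
        kept = [op for op in CANDIDATE_OPS if rng.random() < probs[op]]
        arch[edge] = kept
        weights[edge] = {op: shared_weights[edge][op] for op in kept}
    return arch, weights


def sample_uniform(edges, rng):
    """Baseline for contrast: every operation is kept with probability 0.5."""
    return {
        edge: [op for op in CANDIDATE_OPS if rng.random() < 0.5]
        for edge in edges
    }


if __name__ == "__main__":
    rng = random.Random(0)
    # Made-up posterior values standing in for what training would estimate.
    keep_prob = {
        "edge0": {"conv3x3": 0.9, "conv5x5": 0.1, "maxpool": 0.05, "skip": 0.6},
        "edge1": {"conv3x3": 0.2, "conv5x5": 0.8, "maxpool": 0.1, "skip": 0.3},
    }
    # Placeholder weights; a real super-network would hold tensors here.
    shared_weights = {
        edge: {op: [0.0] for op in CANDIDATE_OPS} for edge in keep_prob
    }
    arch, weights = sample_subnetwork(keep_prob, shared_weights, rng)
    print("posterior-guided:", arch)
    print("uniform baseline:", sample_uniform(keep_prob.keys(), rng))
```

Under this assumption, operations with a high keep probability appear in most sampled sub-networks, so fewer samples are spent on unlikely candidates than with the uniform baseline.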

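Experimental Results

Research Questions

RQ1: Can a posterior distribution over architectures and weights improve sampling efficiency in one-shot NAS?
RQ2: Does sampling according to the posterior distribution outperform random or uniform sampling in search efficiency and accuracy?
RQ3: Can sampling architectures and weights jointly mitigate the sub-optimality of weights inherited from the super-network?
RQ4: How far can the number of sub-network evaluations be reduced while maintaining or improving search performance?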

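Key Findings

The proposed method reduces the number of sampled sub-networks by orders of magnitude compared with conventional one-shot NAS.
On CIFAR-10, it achieves higher top-1 accuracy than the best network found by existing NAS methods.
On CIFAR-10, the search process is 20x faster while still achieving that higher accuracy.
On CIFAR-10, CIFAR-100, and ImageNet, it achieves the best trade-off between search speed and accuracy among NAS methods.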

This review was generated by AI and checked by a human editor.