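[Paper Review] ProxylessNAS: Direct Neural Architecture Search on Target Task and Hardware
ProxylessNAS learns neural network architectures directly on the target task and hardware, using path binarization to reduce memory consumption. It achieves state-of-the-art results on CIFAR-10 and ImageNet under latency constraints.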
Neural architecture search (NAS) has a great impact by automatically designing effective neural network architectures. However, the prohibitive computational demand of conventional NAS algorithms (e.g. 10^4 GPU hours) makes it difficult to directly search the architectures on large-scale tasks (e.g. ImageNet). Differentiable NAS can reduce the cost of GPU hours via a continuous representation of network architecture but suffers from the high GPU memory consumption issue (grow linearly w.r.t. candidate set size). As a result, they need to utilize proxy tasks, such as training on a smaller dataset, or learning with only a few blocks, or training just for a few epochs. These architectures optimized on proxy tasks are not guaranteed to be optimal on the target task. In this paper, we present ProxylessNAS that can directly learn the architectures for large-scale target tasks and target hardware platforms. We address the high memory consumption issue of differentiable NAS and reduce the computational cost (GPU hours and GPU memory) to the same level of regular training while still allowing a large candidate set. Experiments on CIFAR-10 and ImageNet demonstrate the effectiveness of directness and specialization. On CIFAR-10, our model achieves 2.08% test error with only 5.7M parameters, better than the previous state-of-the-art architecture AmoebaNet-B, while using 6× fewer parameters. On ImageNet, our model achieves 3.1% better top-1 accuracy than MobileNetV2, while being 1.2× faster with measured GPU latency. We also apply ProxylessNAS to specialize neural architectures for hardware with direct hardware metrics (e.g. latency) and provide insights for efficient CNN architecture design.
Research Motivation and Objectives
- Directly learn CNN architectures on large-scale targets (e.g., ImageNet) without proxy tasks.
- Enable architecture search across diverse hardware platforms (GPU, CPU, mobile).
- Increase search efficiency to regular-training levels via path-level pruning and binarization.
- Eliminate block-repetition restrictions to expand architectural diversity.
- Provide hardware-aware architectural insights for efficient inference.
Proposed Method
- Construct an over-parameterized network containing all candidate paths for each mixed operation.
- Binarize architecture parameters to activate only one path at runtime, reducing memory usage to regular training levels.
- Train weight and architecture parameters with alternating updates; derive compact architecture by pruning paths with lowest weights.
- Model hardware latency as a continuous differentiable loss (latency regularization) to optimize for latency alongside accuracy.
- Provide a REINFORCE-based alternative for training binarized paths when needed.
- For non-differentiable hardware metrics, use a latency prediction model to guide architecture search on mobile hardware.
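The path binarization and latency regularization steps listed above can be sketched in plain Python. This is a toy illustration under stated assumptions, not the authors' implementation: the function names (`sample_binary_gates`, `mixed_op`, `expected_latency`, `derive_op`), the scalar candidate operations, and the latency values are all hypothetical, and the actual updates of the architecture parameters (gradient-based or REINFORCE) are omitted. The expected latency is modeled here as a probability-weighted sum of per-operation latencies, which is what makes it differentiable with respect to the architecture parameters.

```python
import math
import random


def softmax(alphas):
    """Turn real-valued architecture parameters into path probabilities."""
    m = max(alphas)
    exps = [math.exp(a - m) for a in alphas]
    total = sum(exps)
    return [e / total for e in exps]


def sample_binary_gates(alphas, rng):
    """Sample a one-hot gate vector so exactly one candidate path is active."""
    probs = softmax(alphas)
    idx = rng.choices(range(len(alphas)), weights=probs, k=1)[0]
    return [1 if i == idx else 0 for i in range(len(alphas))]


def mixed_op(x, candidate_ops, gates):
    """Run only the active path; inactive paths are never evaluated."""
    active = gates.index(1)
    return candidate_ops[active](x)


def expected_latency(alphas, latencies_ms):
    """Probability-weighted latency of one mixed operation."""
    probs = softmax(alphas)
    return sum(p * t for p, t in zip(probs, latencies_ms))


def derive_op(alphas):
    """After search, keep the path with the largest architecture parameter."""
    return max(range(len(alphas)), key=lambda i: alphas[i])


# Toy usage with hypothetical candidate operations and latencies.
ops = [lambda x: x, lambda x: 2 * x, lambda x: x + 1]
alphas = [0.0, 0.0, 0.0]
gates = sample_binary_gates(alphas, random.Random(0))
y = mixed_op(3, ops, gates)                             # output of the single sampled path
lat = expected_latency([0.0, 0.0], [10.0, 30.0])        # 20.0 for equal alphas
```

Because only the sampled path is executed in `mixed_op`, the activations of the other candidates are never stored, which is the intuition behind the memory saving described in the list. In a full search, the expected latency would be scaled by a trade-off coefficient and added to the task loss.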
Experimental Results
Research Questions
- RQ1: Can NAS be performed directly on large-scale tasks (e.g., ImageNet) and on target hardware without proxy tasks?
- RQ2: Does path binarization enable memory-efficient, gradient-based NAS at large scales?
- RQ3: How can latency be incorporated as a differentiable objective to produce hardware-aware architectures?
- RQ4: Do architecture searches on target hardware yield architectures with superior accuracy/latency trade-offs compared to proxy-based methods?
- RQ5: What are the hardware-specific architectural patterns that emerge when optimizing for different platforms (GPU, CPU, mobile)?
Key Findings
| Model | Parameters | Test error (%) |
|---|---|---|
| AmoebaNet-B + c/o | 34.9M | 2.13 |
| Proxyless-R + c/o | 5.8M | 2.30 |
| Proxyless-G + c/o | 5.7M | 2.08 |
- On CIFAR-10, ProxylessNAS achieves 2.08% test error with 5.7M parameters, outperforming AmoebaNet-B while using 6× fewer parameters.
- On ImageNet, the GPU-specialized ProxylessNAS model achieves 75.1% top-1 accuracy (3.1 percentage points higher than MobileNetV2) and is 1.2× faster in measured GPU latency.
- On mobile, ProxylessNAS achieves 74.6% top-1 accuracy at 78 ms latency, with a search cost of 200 GPU-hours (200× lower than MnasNet).
- Architectures specialized for hardware (GPU/CPU/Mobile) show distinct characteristics; GPU favors shallower, wider models with larger MBConv operations, while CPU favors deeper, narrower models.
- Latency regularization is critical: models searched without it show a worse accuracy/latency trade-off, which illustrates the need for hardware-aware NAS.
- ProxylessNAS demonstrates state-of-the-art results on CIFAR-10 and ImageNet under latency constraints and reveals insights into efficient CNN design for different hardware.
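The headline ratios in the findings above can be checked with a few lines of arithmetic. The sketch below uses only numbers quoted in this review. The variable names are hypothetical, and the MobileNetV2 accuracy and the MnasNet search cost are values implied by the quoted differences, not figures stated in the review.

```python
# Parameter counts (millions) from the CIFAR-10 table.
amoebanet_b_params_m = 34.9
proxyless_g_params_m = 5.7
param_ratio = amoebanet_b_params_m / proxyless_g_params_m  # roughly 6.1x fewer parameters

# ImageNet: 75.1% top-1, quoted as 3.1 points above MobileNetV2.
proxyless_gpu_top1 = 75.1
implied_mobilenetv2_top1 = proxyless_gpu_top1 - 3.1  # implied baseline, about 72.0

# Search cost: 200 GPU-hours, quoted as 200x lower than MnasNet.
proxyless_search_gpu_hours = 200
implied_mnasnet_gpu_hours = proxyless_search_gpu_hours * 200  # implied, 40000
```

The parameter ratio comes out slightly above 6, which is consistent with the rounded "6× fewer parameters" claim.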
This review was generated by AI and checked by a human editor.