[Paper Explained] Bridging the Gap between Sample-based and One-shot Neural Architecture Search with BONAS
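tldr: BONAS combines Graph Convolutional Network embeddings with Bayesian sigmoid regression to guide a search based on Bayesian optimization, and uses weight-sharing to efficiently evaluate a batch of promising architectures, improving both the reliability and the speed of sample-based NAS.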
Neural Architecture Search (NAS) has shown great potential in finding better neural network designs. Sample-based NAS is the most reliable approach; it explores the search space and evaluates the most promising architectures. However, it is computationally very costly. As a remedy, the one-shot approach has emerged as a popular technique for accelerating NAS using weight-sharing. However, because weights are shared among vastly different networks, the one-shot approach is less reliable than the sample-based approach. In this work, we propose BONAS (Bayesian Optimized Neural Architecture Search), a sample-based NAS framework that is accelerated using weight-sharing to evaluate multiple related architectures simultaneously. Specifically, we apply a Graph Convolutional Network predictor as a surrogate model for Bayesian Optimization to select multiple related candidate models in each iteration. We then apply weight-sharing to train multiple candidate models simultaneously. This approach not only accelerates the traditional sample-based approach significantly, but also keeps its reliability, because weight-sharing among related architectures is more reliable than weight-sharing in the one-shot approach. Extensive experiments are conducted to verify the effectiveness of our method over many competing algorithms.
Research Motivation and Objectives
- Make sample-based NAS more efficient while keeping the reliability that motivates it.
- Develop a surrogate model that naturally handles graph-structured architectures without handcrafted kernels.
- Accelerate evaluation by weight-sharing a small subset of high-potential architectures.
- Demonstrate BONAS gains across closed-domain NAS benchmarks and open-domain search spaces.
- Show transferability and robustness of BONAS across architectures and datasets.
Proposed Method
- Encode neural architectures as graphs and derive global graph embeddings via a Graph Convolutional Network (GCN).
- Replace Gaussian process surrogates with a Bayesian sigmoid regression (BSR) over GCN embeddings to obtain predictive mean and variance for Bayesian optimization (BO).
- Use an exponentially weighted loss to train the surrogate, emphasizing high-accuracy architectures.
- In the query phase, form a small super-network by weight-sharing a batch of top-k BO-selected architectures and train them together, reinitializing weights to ensure fair evaluation.
- Select candidates from a pool using UCB acquisition with mean/variance provided by the GCN+BSR surrogate.
- Iteratively update the surrogate with newly evaluated architectures and refine embeddings.
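The surrogate-and-selection steps listed above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the GCN is a single untrained layer with mean pooling, the Bayesian sigmoid regression is approximated by Bayesian linear regression on logit-transformed accuracy, UCB scores are computed in logit space, and all names (`gcn_embed`, `fit_blr`, `predict`, `ucb_top_k`) and hyperparameters (`alpha`, `beta`, `gamma`) are hypothetical. The accuracies are random placeholders, not results from the paper.

```python
import numpy as np

def gcn_embed(adj, feats, weight):
    """One GCN layer plus mean pooling (assumption: untrained, single layer)."""
    a_hat = adj + adj.T + np.eye(adj.shape[0])      # symmetrize, add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))   # degrees are >= 1 here
    a_norm = a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    hidden = np.maximum(a_norm @ feats @ weight, 0.0)  # ReLU
    return hidden.mean(axis=0)                      # global graph embedding

def fit_blr(phi, acc, alpha=1.0, beta=25.0):
    """Bayesian linear regression on logit(accuracy) over embeddings phi."""
    target = np.log(acc / (1.0 - acc))              # inverse sigmoid
    precision = alpha * np.eye(phi.shape[1]) + beta * phi.T @ phi
    cov = np.linalg.inv(precision)
    mean = beta * cov @ phi.T @ target
    return mean, cov, beta

def predict(phi, mean, cov, beta):
    """Predictive mean and variance (in logit space) for each candidate."""
    mu = phi @ mean
    var = 1.0 / beta + np.einsum("ij,jk,ik->i", phi, cov, phi)
    return mu, var

def ucb_top_k(mu, var, k, gamma=0.5):
    """Indices of the k candidates with the highest UCB score."""
    score = mu + gamma * np.sqrt(var)
    return np.argsort(-score)[:k]

# Toy usage with random architectures (placeholder data only).
rng = np.random.default_rng(0)
n_nodes, n_ops, dim = 5, 3, 8
weight = rng.normal(size=(n_ops, dim))

def random_arch():
    adj = np.triu(rng.integers(0, 2, size=(n_nodes, n_nodes)), k=1).astype(float)
    feats = np.eye(n_ops)[rng.integers(0, n_ops, size=n_nodes)]  # one-hot ops
    return adj, feats

evaluated = np.stack([gcn_embed(*random_arch(), weight) for _ in range(20)])
accs = rng.uniform(0.85, 0.95, size=20)             # placeholder accuracies
pool = np.stack([gcn_embed(*random_arch(), weight) for _ in range(200)])

mean, cov, beta = fit_blr(evaluated, accs)
mu, var = predict(pool, mean, cov, beta)
batch = ucb_top_k(mu, var, k=10)                    # batch to train with weight-sharing
```

In a full loop, the selected batch would be trained together as a small weight-shared super-network, and the measured accuracies would be appended to the evaluated set before refitting the surrogate.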
Experimental Results
Research Questions
- RQ1: Can a graph-based embedding plus Bayesian surrogate improve BO-based NAS performance without handcrafted kernels?
- RQ2: Does weight-sharing over a small, high-potential subset of architectures yield reliable and faster evaluations than full training or large-scale weight-sharing?
- RQ3: How does BONAS perform on standard NAS benchmarks (NAS-Bench-101/NAS-Bench-201) and open-domain search spaces (e.g., NASNet) compared to state-of-the-art methods?
- RQ4: Is BONAS transferable to other model families (e.g., LSTM cells) and robust across embedding sizes?
Key Findings
- GCN-based predictors achieve higher correlation with true architecture performance than MLP/LSTM/meta-NN baselines on NAS-Bench-101/201 and LSTM-12K.
- BONAS consistently outperforms competing baselines in closed-domain NAS benchmarks.
- In open-domain NAS (NASNet search space), BONAS achieves competitive top-1 error on CIFAR-10 while requiring substantially fewer GPU days than some baselines.
- BONAS enables exploring and evaluating thousands of architectures efficiently via small-batch weight-sharing (k around 100) in the super-network phase.
- Transferring BONAS-discovered architectures from CIFAR-10 to ImageNet yields competitive results, with BONAS-derived cells achieving strong top-1/top-5 metrics under mobile constraints.
- Ablations show that the GCN+BSR surrogate with weighted loss and the weight-sharing query phase both benefit performance and efficiency.
This explainer was generated by AI and reviewed by human editors.