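[Paper Digest] Stabilizing Differentiable Architecture Search via Perturbation-based Regularization
SDARTS introduces perturbation-based regularization (random smoothing and adversarial perturbation) to stabilize DARTS. It lowers the Hessian norm of the validation loss and improves NAS performance across different search spaces and datasets.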
Differentiable architecture search (DARTS) is a prevailing NAS solution to identify architectures. Based on the continuous relaxation of the architecture space, DARTS learns a differentiable architecture weight and largely reduces the search cost. However, its stability has been challenged for yielding deteriorating architectures as the search proceeds. We find that the precipitous validation loss landscape, which leads to a dramatic performance drop when distilling the final architecture, is an essential factor that causes instability. Based on this observation, we propose a perturbation-based regularization - SmoothDARTS (SDARTS), to smooth the loss landscape and improve the generalizability of DARTS-based methods. In particular, our new formulations stabilize DARTS-based methods by either random smoothing or adversarial attack. The search trajectory on NAS-Bench-1Shot1 demonstrates the effectiveness of our approach and due to the improved stability, we achieve performance gain across various search spaces on 4 datasets. Furthermore, we mathematically show that SDARTS implicitly regularizes the Hessian norm of the validation loss, which accounts for a smoother loss landscape and improved performance.
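The continuous relaxation and the final discretization ("distilling the final architecture") can be sketched on a single edge of the search cell. This is a minimal illustration, assuming the usual DARTS recipe of a softmax over architecture weights α followed by an argmax projection; the candidate operations below are hypothetical scalar stand-ins, and the zero op is not excluded from the argmax as it would be in practice. The gap between the relaxed output and the projected output is the jump that a sharp validation loss landscape turns into a performance drop.

```python
import math

# Hypothetical candidate operations on one edge (stand-ins for conv, pooling, skip, ...).
CANDIDATE_OPS = {
    "skip_connect": lambda x: x,
    "scale_by_2": lambda x: 2.0 * x,
    "zero": lambda x: 0.0,
}


def softmax(alphas):
    m = max(alphas)  # subtract the max for numerical stability
    exps = [math.exp(a - m) for a in alphas]
    total = sum(exps)
    return [e / total for e in exps]


def mixed_op(x, alphas):
    """Continuous relaxation: softmax(alpha)-weighted sum of all candidate ops."""
    weights = softmax(alphas)
    return sum(w * op(x) for w, op in zip(weights, CANDIDATE_OPS.values()))


def discretize(alphas):
    """Projection: keep only the op with the largest architecture weight."""
    names = list(CANDIDATE_OPS)
    return names[max(range(len(alphas)), key=lambda i: alphas[i])]


alphas = [0.2, 0.5, -1.0]
print(mixed_op(3.0, alphas))                   # relaxed output seen during search
print(discretize(alphas))                      # scale_by_2
print(CANDIDATE_OPS[discretize(alphas)](3.0))  # 6.0, output after projection
```

During search the supernet only ever sees the weighted mixture, so nothing in training guarantees that the validation loss stays low once the mixture is replaced by a single operation.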
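Motivation and Goals
- Motivation: DARTS is unstable because its validation loss landscape is sharp, so the discrete projection that derives the final architecture causes a large performance drop.
- Propose SDARTS in two forms, random smoothing (SDARTS-RS) and adversarial perturbation (SDARTS-ADV), to smooth the loss landscape.
- Show that SDARTS implicitly regularizes the Hessian norm of the validation loss, which improves stability and generalization.
- Demonstrate the performance gains of SDARTS on multiple search spaces on CIFAR-10, ImageNet, and Penn Treebank.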
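Why perturbing the architecture weights is tied to curvature can be seen from a generic second-order Taylor expansion. This is an intuition sketch, not the paper's proof: L stands for any smooth loss of A, the random case assumes each coordinate of δ is drawn i.i.d. from U([-ε, ε]), and the adversarial case assumes an L2 ball, an expansion around a stationary point, and λ_max ≥ 0.

```latex
% Random smoothing: \delta_i \sim U([-\epsilon,\epsilon]) i.i.d., hence
% \mathbb{E}[\delta] = 0 and \mathbb{E}[\delta\delta^\top] = \tfrac{\epsilon^2}{3} I.
\mathbb{E}_{\delta}\, L(A+\delta)
  \approx L(A) + \nabla_A L(A)^\top \mathbb{E}[\delta]
  + \tfrac{1}{2}\,\mathbb{E}\!\left[\delta^\top \nabla_A^2 L(A)\, \delta\right]
  = L(A) + \frac{\epsilon^2}{6}\,\operatorname{tr}\!\left(\nabla_A^2 L(A)\right)

% Adversarial perturbation in an L2 ball, expanded at a stationary point (\nabla_A L(A) = 0):
\max_{\|\delta\|_2 \le \epsilon} L(A+\delta)
  \approx L(A) + \frac{\epsilon^2}{2}\,\lambda_{\max}\!\left(\nabla_A^2 L(A)\right)
```

Averaging over a neighborhood of A adds a penalty proportional to the Hessian trace, and taking the worst case adds one proportional to the largest eigenvalue, so both objectives favor flatter regions in A.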
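Proposed Method
- Replace the standard inner minimization over the network weights w with a neighborhood-based objective: w minimizes the training loss under perturbations of the architecture weights A.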
- SDARTS-RS: w̄(A) = argmin_w E_{δ ~ U([-ε, ε])} L_train(w, A+δ).
- SDARTS-ADV: w̄(A) = argmin_w max_{||δ|| ≤ ε} L_train(w, A+δ).
- Update A by descending ∇_A L_val(w̄(A), A).
- Compute the perturbation δ either randomly or via an adversarial PGD procedure (min-max optimization).
- Both variants aim to yield a smoother L_val with respect to A, improving stability and generalization.
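The bilevel procedure above can be sketched in pure Python on a toy problem. Everything concrete here is an assumption for illustration, not the authors' implementation: the hypothetical losses and targets, the learning rates, ε = 0.1, one random sample per step as a stochastic estimate of the expectation in SDARTS-RS, an L∞ ball with 7 PGD steps of size ε/4 for SDARTS-ADV, and a first-order update of A that holds w fixed.

```python
import random

# Hypothetical toy problem: architecture weights `a` gate network weights `w`
# elementwise; each loss is a squared error against a fixed target vector.
TRAIN_TARGET = [1.0, -0.5, 0.25]
VAL_TARGET = [0.9, -0.4, 0.3]


def loss(w, a, target):
    return sum((wi * ai - ti) ** 2 for wi, ai, ti in zip(w, a, target))


def grad_w(w, a, target):
    # d/dw_i (w_i * a_i - t_i)^2 = 2 * (w_i * a_i - t_i) * a_i
    return [2 * (wi * ai - ti) * ai for wi, ai, ti in zip(w, a, target)]


def grad_a(w, a, target):
    # d/da_i (w_i * a_i - t_i)^2 = 2 * (w_i * a_i - t_i) * w_i
    return [2 * (wi * ai - ti) * wi for wi, ai, ti in zip(w, a, target)]


def perturb_rs(a, eps, rng):
    """SDARTS-RS style: one sample of delta ~ U([-eps, eps]) per coordinate."""
    return [rng.uniform(-eps, eps) for _ in a]


def perturb_adv(w, a, eps, steps=7, step_size=None):
    """SDARTS-ADV style: PGD that maximizes L_train over an L-infinity ball of radius eps."""
    step_size = eps / 4 if step_size is None else step_size
    delta = [0.0] * len(a)
    for _ in range(steps):
        a_pert = [ai + di for ai, di in zip(a, delta)]
        # Gradient w.r.t. delta equals the gradient w.r.t. A evaluated at A + delta.
        g = grad_a(w, a_pert, TRAIN_TARGET)
        # Signed-gradient ascent step, then project (clip) back onto [-eps, eps].
        delta = [max(-eps, min(eps, di + step_size * ((gi > 0) - (gi < 0))))
                 for di, gi in zip(delta, g)]
    return delta


def sdarts_search(mode, epochs=100, eps=0.1, lr_w=0.05, lr_a=0.05, seed=0):
    rng = random.Random(seed)
    w, a = [0.5, 0.5, 0.5], [1.0, 1.0, 1.0]
    for _ in range(epochs):
        # Inner step on w: training loss evaluated at the perturbed architecture A + delta.
        delta = perturb_rs(a, eps, rng) if mode == "rs" else perturb_adv(w, a, eps)
        a_pert = [ai + di for ai, di in zip(a, delta)]
        w = [wi - lr_w * gi for wi, gi in zip(w, grad_w(w, a_pert, TRAIN_TARGET))]
        # Outer step on A: descend L_val at the unperturbed A (first-order, w held fixed).
        a = [ai - lr_a * gi for ai, gi in zip(a, grad_a(w, a, VAL_TARGET))]
    return w, a


for mode in ("rs", "adv"):
    w, a = sdarts_search(mode)
    print(mode, round(loss(w, a, VAL_TARGET), 4))
```

The only difference between the two variants is how δ is produced; the outer update of A is identical. Because each coordinate of this toy training loss is convex in A, every signed PGD step can only raise the training loss, which is the intended worst-case behavior.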
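Experimental Results
Research Questions
- RQ1: Can perturbation-based regularization stabilize differentiable architecture search in the face of a sharp loss landscape and projection instability?
- RQ2: Do random smoothing and adversarial perturbation make the loss landscape smoother and lead to better generalization in NAS?
- RQ3: Does SDARTS implicitly regularize the Hessian norm of the validation loss, and does this explain the performance gains?
- RQ4: Compared with DARTS and other baselines, do the SDARTS variants improve robustness and results on the CIFAR-10, ImageNet, and PTB search spaces?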
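Key Findings
- SDARTS-RS and SDARTS-ADV produce smoother validation loss landscapes than vanilla DARTS, reducing sensitivity to perturbations of the architecture weights.
- Both SDARTS variants lower the Hessian norm (spectral norm) of the validation loss during training, which correlates with improved stability.
- SDARTS-RS and SDARTS-ADV outperform DARTS and several regularization baselines on the CIFAR-10, CIFAR-100, SVHN, and PTB benchmarks.
- Applying SDARTS to PC-DARTS and P-DARTS yields consistent gains, with competitive results when transferred to ImageNet.
- SDARTS-ADV often achieves the best anytime performance and keeps improving even when the search runs for more epochs than typical DARTS training.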
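Tracking the Hessian spectral norm, as in the findings above, amounts to estimating the largest absolute eigenvalue of the Hessian with respect to the architecture weights. A common way is power iteration on Hessian-vector products; this sketch assumes central finite differences of a gradient function (an autodiff Hessian-vector product would be the usual choice in a deep-learning framework) and checks itself on a hypothetical quadratic whose Hessian is known, not on an actual DARTS supernet.

```python
import math
import random


def hvp_fd(grad_fn, a, v, h=1e-4):
    """Central-difference Hessian-vector product: (g(a + h v) - g(a - h v)) / (2h)."""
    plus = grad_fn([ai + h * vi for ai, vi in zip(a, v)])
    minus = grad_fn([ai - h * vi for ai, vi in zip(a, v)])
    return [(p - m) / (2 * h) for p, m in zip(plus, minus)]


def hessian_spectral_norm(grad_fn, a, iters=100, seed=0):
    """Power iteration on the Hessian at `a`; returns an estimate of max |eigenvalue|."""
    rng = random.Random(seed)
    v = [rng.gauss(0.0, 1.0) for _ in a]
    n = math.sqrt(sum(x * x for x in v))
    v = [x / n for x in v]
    est = 0.0
    for _ in range(iters):
        hv = hvp_fd(grad_fn, a, v)
        est = math.sqrt(sum(x * x for x in hv))  # ||H v|| for a unit vector v
        if est == 0.0:
            break  # the Hessian annihilates v, so the estimate is 0
        v = [x / est for x in hv]
    return est


# Hypothetical check: f(A) = 0.5 * A^T Q A has Hessian Q at every point.
Q = [[4.0, 1.0, 0.0], [1.0, 3.0, 0.0], [0.0, 0.0, -1.0]]


def grad_quadratic(a):
    return [sum(qij * aj for qij, aj in zip(row, a)) for row in Q]


print(hessian_spectral_norm(grad_quadratic, [0.2, -0.1, 0.5]))  # approximately 4.618
```

The eigenvalues of Q are (7 ± √5)/2 and -1, so the estimate should approach (7 + √5)/2 ≈ 4.618. Using ||Hv|| instead of the Rayleigh quotient makes the estimate insensitive to the sign of the dominant eigenvalue, which matches a spectral-norm definition.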
This digest was generated by AI and reviewed by human editors.