QUICK REVIEW

[Paper Review] Stabilizing Differentiable Architecture Search via Perturbation-based Regularization

Xiangning Chen, Cho‐Jui Hsieh|arXiv (Cornell University)|2020. 02. 12.

Adversarial Robustness in Machine Learning | References: 51 | Citations: 82

One-Line Summary

SDARTS introduces perturbation-based regularization (random smoothing and adversarial perturbation) to stabilize DARTS, reduce the Hessian norm of the validation loss, and improve NAS performance across search spaces and datasets.

ABSTRACT

Differentiable architecture search (DARTS) is a prevailing NAS solution to identify architectures. Based on the continuous relaxation of the architecture space, DARTS learns a differentiable architecture weight and largely reduces the search cost. However, its stability has been challenged for yielding deteriorating architectures as the search proceeds. We find that the precipitous validation loss landscape, which leads to a dramatic performance drop when distilling the final architecture, is an essential factor that causes instability. Based on this observation, we propose a perturbation-based regularization - SmoothDARTS (SDARTS), to smooth the loss landscape and improve the generalizability of DARTS-based methods. In particular, our new formulations stabilize DARTS-based methods by either random smoothing or adversarial attack. The search trajectory on NAS-Bench-1Shot1 demonstrates the effectiveness of our approach and due to the improved stability, we achieve performance gain across various search spaces on 4 datasets. Furthermore, we mathematically show that SDARTS implicitly regularizes the Hessian norm of the validation loss, which accounts for a smoother loss landscape and improved performance.

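Motivation and Objectives

Motivate the work by the instability of DARTS, which stems from a sharp validation loss landscape and the discrete projection step.
Propose SDARTS in two forms, random smoothing (SDARTS-RS) and adversarial (SDARTS-ADV), to smooth the loss landscape.
Show that SDARTS implicitly regularizes the Hessian of the validation loss, improving stability and generalization.
Demonstrate performance gains from SDARTS on CIFAR-10, ImageNet, and Penn Treebank across multiple search spaces.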

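Proposed Method

Replace the weight minimization at the current architecture weights with a neighborhood-based objective: minimize the training loss under perturbations of the architecture weights (w denotes the network weights, A the architecture weights, δ the perturbation).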
SDARTS-RS: w̄(A) = argmin_w E_{δ ~ U([-ε, ε])} L_train(w, A+δ).
SDARTS-ADV: w̄(A) = argmin_w max_{||δ|| ≤ ε} L_train(w, A+δ).
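Update A by descending along ∇_A L_val(w̄(A), A).
Compute the perturbation δ either by random sampling or through an adversarial PGD procedure (min-max optimization).
Both variants aim to induce a smoother L_val with respect to A, improving stability and generalization.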
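
The two ways of choosing δ can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the quadratic `toy_train_loss`, the l_inf ball, the 7 PGD steps, and the step size are all assumptions made for illustration, and every function name is hypothetical.

```python
import numpy as np

def toy_train_loss(w, alpha):
    """Hypothetical stand-in for L_train(w, A): a simple quadratic."""
    return 0.5 * float(np.sum((alpha - w) ** 2))

def toy_grad_alpha(w, alpha):
    """Gradient of toy_train_loss with respect to alpha."""
    return alpha - w

def random_delta(shape, eps, rng):
    """SDARTS-RS style perturbation: each coordinate drawn from U([-eps, eps])."""
    return rng.uniform(-eps, eps, size=shape)

def pgd_delta(w, alpha, eps, steps=7, step_size=None):
    """SDARTS-ADV style perturbation via PGD.

    Assumption: the constraint ||delta|| <= eps is taken as an l_inf ball,
    so projection is a coordinate-wise clip.
    """
    if step_size is None:
        step_size = 2.5 * eps / steps  # illustrative choice, not from the source
    delta = np.zeros_like(alpha)
    for _ in range(steps):
        g = toy_grad_alpha(w, alpha + delta)   # ascent direction on the loss
        delta = np.clip(delta + step_size * np.sign(g), -eps, eps)
    return delta

def weight_step(w, alpha, delta, lr):
    """One gradient step on w for the perturbed loss L_train(w, alpha + delta)."""
    grad_w = w - (alpha + delta)
    return w - lr * grad_w

rng = np.random.default_rng(0)
w = np.zeros(2)
alpha = np.array([1.0, -2.0])
eps = 0.1

d_rs = random_delta(alpha.shape, eps, rng)   # random smoothing
d_adv = pgd_delta(w, alpha, eps)             # adversarial: ends at [0.1, -0.1] here
w_next = weight_step(w, alpha, d_adv, lr=0.1)
```

On this toy loss the adversarial δ lands on the corner of the ε-box that increases the loss most, while the random δ is only a sample from the same box. In an actual search loop the weight step would alternate with the update of A on the validation loss.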

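Experimental Results

Research Questions

RQ1: Can perturbation-based regularization stabilize differentiable architecture search against sharp loss landscapes and projection instability?
RQ2: Do random smoothing and adversarial perturbations lead to smoother loss landscapes and better generalization in NAS?
RQ3: Does SDARTS implicitly regularize the Hessian norm of the validation loss, and does this explain the performance gains?
RQ4: Do the SDARTS variants improve robustness and results over DARTS and other baselines on the CIFAR-10, ImageNet, and PTB search spaces?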
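
The intuition behind RQ3 can be sketched with a second-order Taylor expansion. The block below is a simplified derivation, not a reproduction of the paper's proof. It assumes the perturbation is applied directly to L_val (the method's formulas perturb L_train), an l_2 ball for the adversarial case, and H denoting the Hessian of L_val with respect to A.

```latex
\begin{align*}
L_{\text{val}}(w, A+\delta)
  &\approx L_{\text{val}}(w, A)
   + \nabla_A L_{\text{val}}(w, A)^{\top}\delta
   + \tfrac{1}{2}\,\delta^{\top} H\,\delta,
   \qquad H = \nabla_A^{2} L_{\text{val}}(w, A) \\[4pt]
\mathbb{E}_{\delta \sim U([-\epsilon,\epsilon])}
  \big[L_{\text{val}}(w, A+\delta)\big]
  &\approx L_{\text{val}}(w, A) + \frac{\epsilon^{2}}{6}\,\operatorname{Tr}(H)
   \qquad \big(\mathbb{E}[\delta]=0,\ \mathbb{E}[\delta_i\delta_j]=\tfrac{\epsilon^{2}}{3}\,\mathbf{1}[i=j]\big) \\[4pt]
\max_{\|\delta\|_2 \le \epsilon} L_{\text{val}}(w, A+\delta)
  &\approx L_{\text{val}}(w, A) + \frac{\epsilon^{2}}{2}\,\lambda_{\max}(H)
   \qquad \big(\text{if } \nabla_A L_{\text{val}} \approx 0 \text{ and } \lambda_{\max}(H) > 0\big)
\end{align*}
```

Under these assumptions, random smoothing adds a penalty proportional to the trace of H, and the adversarial form adds a penalty proportional to its largest eigenvalue, which equals the spectral norm when H is positive semidefinite.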

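Key Findings

SDARTS-RS and SDARTS-ADV produce a smoother validation loss landscape than vanilla DARTS, reducing sensitivity to noise in the architecture weights.
Both SDARTS variants reduce the Hessian norm (spectral norm) of the validation loss during the search, which correlates with improved stability.
SDARTS-RS and SDARTS-ADV outperform DARTS and several regularization baselines on the CIFAR-10, CIFAR-100, SVHN, and PTB benchmarks.
Applying SDARTS to PC-DARTS and P-DARTS yields consistent performance gains and gives competitive results when transferred to ImageNet.
SDARTS-ADV often achieves the strongest performance at its best point in the search and continues to improve when the search is extended beyond the usual number of DARTS epochs.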
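
The Hessian spectral norm mentioned in the findings can be estimated without forming the full Hessian. The sketch below shows one common approach, power iteration on finite-difference Hessian-vector products. It is not necessarily how the authors measured it; `grad_fn` is a hypothetical callable returning ∇_A L_val at a given A, and the quadratic test function is a toy assumption.

```python
import numpy as np

def hvp_fd(grad_fn, x, v, h=1e-4):
    """Central finite-difference Hessian-vector product: H v ~ (g(x+hv) - g(x-hv)) / 2h."""
    return (grad_fn(x + h * v) - grad_fn(x - h * v)) / (2.0 * h)

def hessian_spectral_norm(grad_fn, x, iters=100, seed=0):
    """Estimate the largest absolute eigenvalue of the (symmetric) Hessian at x."""
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(x.shape)
    v /= np.linalg.norm(v)
    est = 0.0
    for _ in range(iters):
        hv = hvp_fd(grad_fn, x, v)
        est = float(np.linalg.norm(hv))   # ||H v|| with ||v|| = 1
        if est == 0.0:
            break
        v = hv / est                      # renormalize for the next iteration
    return est

# Toy check: a quadratic whose Hessian is diag(3, 1, -0.5), so the norm is 3.
M = np.diag([3.0, 1.0, -0.5])
quad_grad = lambda a: M @ a
norm_est = hessian_spectral_norm(quad_grad, np.ones(3))
```

For the toy quadratic the estimate converges to about 3.0. With a real supernet, `grad_fn` would be backed by automatic differentiation, and a framework-level Hessian-vector product would usually replace the finite difference.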

This review was generated by AI and reviewed by a human editor.