QUICK REVIEW

[Paper Review] Smoothness Adaptivity in Constant-Depth Neural Networks: Optimal Rates via Smooth Activations

Yuhao Liu, Zilin Wang|arXiv (Cornell University)|2026. 02. 23.

Stochastic Gradient Optimization Techniques|Citations: 0

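One-Line Summary

This paper shows that constant-depth neural networks with smooth activations can achieve minimax-optimal approximation and estimation rates over Sobolev spaces, whereas the smoothness adaptivity of constant-depth ReLU networks is limited by their depth.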

ABSTRACT

Smooth activation functions are ubiquitous in modern deep learning, yet their theoretical advantages over non-smooth counterparts remain poorly understood. In this work, we study both approximation and statistical properties of neural networks with smooth activations for learning functions in the Sobolev space $W^{s,\infty}([0,1]^d)$ with $s>0$. We prove that constant-depth networks equipped with smooth activations achieve smoothness adaptivity: increasing width alone suffices to attain the minimax-optimal approximation and estimation error rates (up to logarithmic factors). In contrast, for non-smooth activations such as ReLU, smoothness adaptivity is fundamentally limited by depth: the attainable approximation order is bounded by depth, and higher-order smoothness requires proportional depth growth. These results identify activation smoothness as a fundamental mechanism, complementary to depth, for achieving optimal rates over Sobolev function classes. Technically, our analysis is based on a multi-scale approximation framework that yields explicit neural network approximators with controlled parameter norms and model size. This complexity control ensures statistical learnability under empirical risk minimization (ERM) and avoids the impractical $\ell^0$-sparsity constraints commonly required in prior analyses.

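Motivation and Objectives

Investigates how activation smoothness affects the approximation power of neural networks for Sobolev targets.
Shows that constant-depth networks with smooth activations achieve minimax-optimal rates without increasing depth.
Provides a constructive network approximation scheme with explicit control of complexity and parameter norms.
Contrasts smooth activations with non-smooth ones (ReLU) to expose the depth bottleneck in smoothness adaptivity.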

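Proposed Method

Develops a multi-scale approximation framework for piecewise constant functions and uses it to construct neural network approximators.
Proves L2 and L∞ approximation results at constant depth while controlling width and parameter norms.
Establishes a weighted superposition principle that extends localized approximations to a global L∞ bound.
Derives ERM generalization guarantees that attain minimax-optimal rates for smooth activations without sparsity constraints.
Provides a depth-bottleneck lower bound for constant-depth ReLU networks, showing that the limitation is intrinsic.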

Figure 1: Generalization error versus sample size for two-layer networks trained with different activation functions. Markers denote the measured generalization errors at each sample size (averaged over 5 runs), and solid lines show least-squares fits of the form $E(n)\propto n^{-\alpha}$. The fit
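The caption describes least-squares fits of the form $E(n)\propto n^{-\alpha}$. A minimal sketch of how such an exponent can be estimated is a straight-line fit in log-log space. The function name `fit_power_law` and the synthetic data below are hypothetical and are not taken from the paper; the authors' actual fitting procedure may differ.

```python
import numpy as np

def fit_power_law(sample_sizes, errors):
    """Least-squares fit of E(n) = C * n**(-alpha) in log-log space.

    Returns the pair (alpha, C).
    """
    log_n = np.log(np.asarray(sample_sizes, dtype=float))
    log_e = np.log(np.asarray(errors, dtype=float))
    # log E = log C - alpha * log n, so the fitted slope equals -alpha.
    slope, intercept = np.polyfit(log_n, log_e, 1)
    return float(-slope), float(np.exp(intercept))

# Synthetic data (hypothetical, NOT results from the paper):
# an exact power law with alpha = 0.8 and C = 3.0.
n_values = np.array([100.0, 200.0, 400.0, 800.0, 1600.0])
synthetic_errors = 3.0 * n_values ** (-0.8)

alpha_hat, c_hat = fit_power_law(n_values, synthetic_errors)
print(f"alpha = {alpha_hat:.3f}, C = {c_hat:.3f}")
```

With measured errors in place of the synthetic ones, the fitted `alpha_hat` can be compared against a theoretical exponent such as $2s/(2s+d)$.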

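Experimental Results

Research Questions

RQ1: Can constant-depth neural networks with smooth activations adapt to arbitrarily high function smoothness on [0,1]^d?
RQ2: Do these networks achieve minimax-optimal estimation rates under ERM without sparsity constraints?
RQ3: How do non-smooth activations such as ReLU compare in terms of depth requirements and smoothness adaptivity?
RQ4: What complexity control (width and norms) suffices to guarantee optimal approximation and learning?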

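Key Results

Constant-depth smooth-activation networks with depth L=6 and polynomially bounded norms achieve the optimal O(N^{-s/d}) approximation rate for f ∈ W^{s,∞}([0,1]^d).
ERM over these networks achieves the minimax-optimal O(n^{-2s/(2s+d)}) estimation rate, up to logarithmic factors.
A depth bottleneck is proved for constant-depth ReLU networks: the approximation rate saturates at $N^{-\min\{L-1,\,s\}}$, so higher smoothness requires deeper networks.
Experiments support the theory: at fixed depth, smooth activations show faster generalization rates with respect to the smoothness of the target function.
Shows that activation smoothness can take the place of depth as a mechanism for smoothness adaptivity in Sobolev spaces.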
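The approximation rate N^{-s/d} and the estimation rate n^{-2s/(2s+d)} are linked by the usual bias-variance balance. The sketch below is a standard heuristic, not the paper's proof. It assumes that N counts the effective number of parameters and that the estimation error of ERM scales like N/n up to logarithmic factors; the paper's argument relies on norm-based complexity control and may differ in its details.

```latex
\begin{align*}
\mathbb{E}\,\|\hat f - f^{\star}\|_{L^2}^2
  &\lesssim \underbrace{N^{-2s/d}}_{\text{squared approximation error}}
   + \underbrace{\frac{N}{n}}_{\text{estimation error}}, \\
N^{-2s/d} \asymp \frac{N}{n}
  &\iff N \asymp n^{\,d/(2s+d)}, \\
\text{so that}\quad
\mathbb{E}\,\|\hat f - f^{\star}\|_{L^2}^2
  &\lesssim n^{-2s/(2s+d)} .
\end{align*}
```

For example, with s=2 and d=4 the balanced size is N ≍ n^{1/2} and the resulting rate is n^{-1/2}.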

Figure 2: Illustration of the approximator construction for $f^{\star}$ in Theorem B.19 with $d=1$ and $K=2$. (a) Approximate $f^{\star}$ by piecewise polynomials, realized as the product of global polynomials and piecewise constant functions. (b) The $4$-piece piecewise constant function on refi
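The piecewise-polynomial idea in panel (a) can be illustrated numerically without any neural network. The sketch below is a classical approximation-theory illustration, not the paper's construction. It approximates the hypothetical target sin(2πx) on [0,1] by a local Taylor polynomial on each of K equal cells and estimates how fast the sup-norm error decays as K doubles. The target function, the helper names, and the grid resolution are all assumptions made for illustration.

```python
import math
import numpy as np

def sin_derivative(x, k, omega=2 * math.pi):
    """k-th derivative of sin(omega * x), evaluated at x."""
    return omega ** k * np.sin(omega * x + k * math.pi / 2)

def piecewise_taylor_error(num_pieces, degree, grid_per_piece=200):
    """Sup-norm error (on a grid) of a piecewise Taylor approximation
    of sin(2*pi*x) on [0, 1], expanding around each cell's midpoint."""
    h = 1.0 / num_pieces
    worst = 0.0
    for j in range(num_pieces):
        center = (j + 0.5) * h
        x = np.linspace(j * h, (j + 1) * h, grid_per_piece)
        # Degree-`degree` Taylor polynomial around the cell midpoint.
        approx = sum(
            sin_derivative(center, k) * (x - center) ** k / math.factorial(k)
            for k in range(degree + 1)
        )
        cell_error = float(np.max(np.abs(np.sin(2 * math.pi * x) - approx)))
        worst = max(worst, cell_error)
    return worst

def empirical_order(degree, num_pieces=16):
    """Estimate r in error ~ K**(-r) by doubling the number of pieces."""
    coarse = piecewise_taylor_error(num_pieces, degree)
    fine = piecewise_taylor_error(2 * num_pieces, degree)
    return math.log2(coarse / fine)

for p in (0, 1, 3):
    print(f"degree {p}: empirical order ~ {empirical_order(p):.2f}")
```

For this smooth target the estimated order should come out close to degree + 1: higher-degree local polynomials exploit more smoothness, while piecewise constants (degree 0) stay at first order. This mirrors, in a much simpler setting, why adapting to higher smoothness requires representing higher-order local polynomials.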


This review was generated by AI and reviewed by a human editor.