QUICK REVIEW

[Paper Review] Smoothness Adaptivity in Constant-Depth Neural Networks: Optimal Rates via Smooth Activations

Yuhao Liu, Zilin Wang|arXiv (Cornell University)|Feb 23, 2026

Stochastic Gradient Optimization Techniques | Cited by 0

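One-Sentence Summary

This paper proves that constant-depth neural networks with smooth activations attain minimax-optimal approximation and estimation rates (up to logarithmic factors) over Sobolev spaces, whereas for constant-depth ReLU networks smoothness adaptivity is limited by depth.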

ABSTRACT

Smooth activation functions are ubiquitous in modern deep learning, yet their theoretical advantages over non-smooth counterparts remain poorly understood. In this work, we study both approximation and statistical properties of neural networks with smooth activations for learning functions in the Sobolev space $W^{s,\infty}([0,1]^d)$ with $s>0$. We prove that constant-depth networks equipped with smooth activations achieve smoothness adaptivity: increasing width alone suffices to attain the minimax-optimal approximation and estimation error rates (up to logarithmic factors). In contrast, for non-smooth activations such as ReLU, smoothness adaptivity is fundamentally limited by depth: the attainable approximation order is bounded by depth, and higher-order smoothness requires proportional depth growth. These results identify activation smoothness as a fundamental mechanism, complementary to depth, for achieving optimal rates over Sobolev function classes. Technically, our analysis is based on a multi-scale approximation framework that yields explicit neural network approximators with controlled parameter norms and model size. This complexity control ensures statistical learnability under empirical risk minimization (ERM) and avoids the impractical $\ell^0$-sparsity constraints commonly required in prior analyses.

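Motivation and Objectives

Study how activation smoothness affects the ability of neural networks to approximate Sobolev targets.
Prove that constant-depth networks with smooth activations attain minimax rates without increasing depth.
Give a constructive network approximation scheme with explicit control of complexity and parameter norms.
Contrast smooth activations with non-smooth ones (ReLU) to expose depth as the bottleneck for smoothness adaptivity.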

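Proposed Method

Build a multi-scale approximation framework for piecewise constant functions, used to construct neural network approximators.
Prove $L^2$ and $L^\infty$ approximation results at constant depth with controlled width and parameter norms.
Establish a weighted superposition principle that extends local approximations to a global $L^\infty$ bound.
Derive generalization guarantees for empirical risk minimization (ERM), showing that smooth activations attain minimax rates without sparsity constraints.
Give a depth-bottleneck lower bound for constant-depth ReLU networks, showing their intrinsic limitation.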
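
The piecewise-polynomial idea mentioned in the Figure 2 caption can be illustrated numerically in one dimension. The sketch below is not the paper's network construction: it only checks how fast a degree-$K$ piecewise Taylor polynomial on $N$ equal pieces of $[0,1]$ approaches a smooth target. The target `sin`, the helper names, and the evaluation grid are all assumptions made for illustration.

```python
import math


def sin_derivative(k: int, x: float) -> float:
    """k-th derivative of sin at x, using sin^(k)(x) = sin(x + k*pi/2)."""
    return math.sin(x + k * math.pi / 2)


def piecewise_taylor(x: float, pieces: int, degree: int) -> float:
    """Degree-`degree` Taylor polynomial of sin, centered at the midpoint
    of the piece of [0, 1] that contains x (hypothetical helper)."""
    idx = min(int(x * pieces), pieces - 1)  # index of the piece containing x
    center = (idx + 0.5) / pieces           # midpoint of that piece
    return sum(
        sin_derivative(k, center) * (x - center) ** k / math.factorial(k)
        for k in range(degree + 1)
    )


def sup_error(pieces: int, degree: int, grid: int = 2000) -> float:
    """Maximum absolute error over a uniform grid on [0, 1]."""
    return max(
        abs(math.sin(i / grid) - piecewise_taylor(i / grid, pieces, degree))
        for i in range(grid + 1)
    )


if __name__ == "__main__":
    degree = 2
    for pieces in (4, 8, 16, 32):
        print(pieces, sup_error(pieces, degree))
```

Under these assumptions the Taylor remainder bounds the error by $(1/(2N))^{K+1}/(K+1)!$, so doubling $N$ should shrink the printed error by roughly $2^{K+1}$. That is the one-dimensional analogue of an $N^{-s/d}$ rate with $s=K+1$ and $d=1$.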

Figure 1 : Generalization error versus sample size for two-layer networks trained with different activation functions. Markers denote the measured generalization errors at each sample size (averaged over 5 runs), and solid lines show least-squares fits of the form $E(n)\propto n^{-\alpha}$ . The fit

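Experimental Results

Research Questions

RQ1: Can constant-depth neural networks with smooth activations adapt to arbitrarily high smoothness of functions on $[0,1]^d$?
RQ2: Do such networks attain minimax estimation rates under ERM without sparsity constraints?
RQ3: How do non-smooth activations such as ReLU compare in terms of depth requirements and adaptivity to smoothness?
RQ4: Which complexity controls (width and norms) suffice to guarantee optimal approximation and learning?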

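Key Findings

Constant-depth networks with smooth activations attain the optimal approximation rate $O(N^{-s/d})$ over $W^{s,\infty}([0,1]^d)$, with depth $L=6$ and polynomially bounded parameter norms.
ERM over these networks attains the minimax-optimal estimation rate $O(n^{-2s/(2s+d)})$, up to logarithmic factors.
Constant-depth ReLU networks are shown to have a depth bottleneck: their approximation rate saturates at $N^{-\min\{L-1,\,s\}}$, so higher smoothness requires deeper networks.
Empirical results support the claim that, at fixed depth, smooth activations generalize faster when learning smooth targets.
The results indicate that activation smoothness can serve as a mechanism complementary to depth for achieving smoothness adaptivity over Sobolev spaces.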
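
To make the exponents above concrete, here is a minimal sketch that evaluates them for given $s$, $d$, and $L$, and recovers an exponent $\alpha$ from error measurements by a least-squares fit of the form $E(n)\propto n^{-\alpha}$, the form named in the Figure 1 caption. The function names are hypothetical, and the data in the demo is synthetic, generated from an exact power law; it is not the paper's measurements.

```python
import math


def approximation_exponent(s: float, d: int) -> float:
    """Exponent a in the approximation rate N^{-a}, with a = s/d."""
    return s / d


def estimation_exponent(s: float, d: int) -> float:
    """Exponent b in the estimation rate n^{-b}, with b = 2s/(2s+d)."""
    return 2 * s / (2 * s + d)


def relu_saturated_order(depth: int, s: float) -> float:
    """Order min{L-1, s} at which, per the summary, depth-L ReLU saturates."""
    return min(depth - 1, s)


def fit_power_law(ns, errors):
    """Least-squares fit of log E = log C - alpha * log n.

    Returns (alpha, C) for the model E(n) = C * n^{-alpha}.
    """
    xs = [math.log(n) for n in ns]
    ys = [math.log(e) for e in errors]
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return -slope, math.exp(intercept)


if __name__ == "__main__":
    s, d = 2.0, 4
    print("approximation exponent:", approximation_exponent(s, d))
    print("estimation exponent:", estimation_exponent(s, d))
    print("ReLU order at depth 3, s = 5:", relu_saturated_order(3, 5.0))

    # Synthetic errors following an exact power law (illustration only).
    ns = [100, 200, 400, 800, 1600]
    errors = [3.0 * n ** -estimation_exponent(s, d) for n in ns]
    print("fitted (alpha, C):", fit_power_law(ns, errors))
```

Fitting in log-log space turns the power law into a straight line, so the slope is the rate exponent. With noisy measurements the fitted $\alpha$ is only an estimate, and logarithmic factors in the true rate can bias it at small sample sizes.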

Figure 2 : Illustration of the approximator construction for $f^{\star}$ in Theorem B.19 with $d=1$ and $K=2$ . (a) Approximate $f^{\star}$ by piecewise polynomials, realized as the product of global polynomials and piecewise constant functions. (b) The $4$ -piece piecewise constant function on refi


This review was generated by AI and checked by a human editor.