QUICK REVIEW

[Paper Summary] The Statistical Complexity of Early-Stopped Mirror Descent

Tomas Vaškevičius, Varun Kanade|arXiv (Cornell University)|Feb 1, 2020

Stochastic Gradient Optimization Techniques | Cited by 3

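One-Sentence Summary

This paper establishes a direct link between offset Rademacher complexities and the statistical performance of early-stopped mirror descent with the squared loss for linear models and kernel methods. By completing the inequality that characterizes convexity for the squared loss, it shows that the excess risk of the mirror descent iterates is bounded by the offset complexity of a function class determined by the mirror map, the initialization point, the step size, and the number of iterations. This gives a clean, unified framework that recovers and improves existing implicit regularization results without assuming strong convexity.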
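
The "completed" convexity inequality can be made concrete with a short calculation. The following is a sketch in assumed notation (the symbols ℓ, L̂ₙ and α′ are illustrative and not taken from the paper): convexity alone gives an inequality, while for the squared loss the gap can be written out exactly, and the leftover negative quadratic term is the kind of term that appears as the offset in an offset Rademacher complexity.

```latex
% Squared loss at prediction a with label y: \ell(a) = (a - y)^2, so \ell'(a) = 2(a - y).
% Convexity alone gives an inequality:
\ell(a) - \ell(b) \le \ell'(a)\,(a - b).
% For the squared loss the gap is known exactly (the "completed" inequality):
\ell(a) - \ell(b) = \ell'(a)\,(a - b) - (a - b)^2 .
% Averaging over a sample (x_i, y_i), with linear predictions \langle \alpha, x_i \rangle
% and empirical risk \widehat{L}_n(\alpha) = \frac{1}{n}\sum_i (\langle \alpha, x_i \rangle - y_i)^2:
\widehat{L}_n(\alpha) - \widehat{L}_n(\alpha')
  = \langle \nabla \widehat{L}_n(\alpha),\, \alpha - \alpha' \rangle
  - \frac{1}{n}\sum_{i=1}^{n} \langle \alpha - \alpha',\, x_i \rangle^{2} .
```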

ABSTRACT

Recently there has been a surge of interest in understanding implicit regularization properties of iterative gradient-based optimization algorithms. In this paper, we study the statistical guarantees on the excess risk achieved by early-stopped unconstrained mirror descent algorithms applied to the unregularized empirical risk with the squared loss for linear models and kernel methods. By completing an inequality that characterizes convexity for the squared loss, we identify an intrinsic link between offset Rademacher complexities and potential-based convergence analysis of mirror descent methods. Our observation immediately yields excess risk guarantees for the path traced by the iterates of mirror descent in terms of offset complexities of certain function classes depending only on the choice of the mirror map, initialization point, step-size, and the number of iterations. We apply our theory to recover, in a clean and elegant manner via rather short proofs, some of the recent results in the implicit regularization literature, while also showing how to improve upon them in some settings.

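Motivation and Objectives

Understand the statistical guarantees of early-stopped mirror descent in the absence of explicit regularization.
Identify the fundamental link between potential-based convergence analysis of mirror descent and offset Rademacher complexities.
Derive excess risk bounds through complexity measures, unifying and improving existing implicit regularization results.
Show that early-stopped mirror descent can match the performance of explicitly regularized models without explicit constraints or strong convexity.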

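Proposed Method

Complete the convexity inequality for the squared loss in order to connect it to offset Rademacher complexities.
Use the Bregman divergence to define function classes centered at the unknown parameter, forming Bregman balls.
Apply potential-based convergence analysis to derive data-dependent stopping times for mirror descent.
Derive excess risk bounds in terms of the offset complexity of the function class F(α₀, R) defined by Dψ(α, α₀) ≤ R.
Establish that the excess risk of early-stopped mirror descent is comparable to that of ERM over the same function class.
Show that the result holds both for finite-dimensional linear models and in the kernel setting.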
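
The iteration analyzed by these steps can be sketched in a few lines. This is a minimal illustration, assuming a linear model with the empirical squared loss; the function names (`mirror_descent_path`, `empirical_risk`), the synthetic data, and the choice of the Euclidean mirror map ψ(α) = ½‖α‖² (for which mirror descent reduces to plain gradient descent) are assumptions made here, not details from the paper.

```python
import numpy as np

def mirror_descent_path(X, y, grad_psi, grad_psi_inv, alpha0, step, n_iters):
    """Iterates of unconstrained mirror descent on mean((X @ a - y) ** 2)."""
    n = X.shape[0]
    alpha = np.asarray(alpha0, dtype=float)
    path = [alpha.copy()]
    for _ in range(n_iters):
        grad = (2.0 / n) * X.T @ (X @ alpha - y)  # gradient of the empirical squared loss
        dual = grad_psi(alpha) - step * grad       # gradient step in the dual (mirror) space
        alpha = grad_psi_inv(dual)                 # map back to the primal space
        path.append(alpha.copy())
    return np.array(path)

def empirical_risk(X, y, alpha):
    return float(np.mean((X @ alpha - y) ** 2))

# Euclidean mirror map psi(a) = 0.5 * ||a||^2: grad psi and its inverse are the identity.
identity = lambda v: v

# Synthetic data (hypothetical, for illustration only).
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))
alpha_true = np.array([1.0, -2.0, 0.0, 0.5, 0.0])
y = X @ alpha_true + 0.1 * rng.normal(size=50)

path = mirror_descent_path(X, y, identity, identity, np.zeros(5), step=0.05, n_iters=200)
risks = [empirical_risk(X, y, a) for a in path]
```

Early stopping then amounts to returning one iterate `path[t]` along this path rather than running to convergence. A different mirror map is plugged in by passing its gradient and inverse gradient as `grad_psi` and `grad_psi_inv`.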

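Results

Research Questions

RQ1: How can offset Rademacher complexities be used to bound the excess risk of early-stopped mirror descent?
RQ2: What is the intrinsic link between potential-based convergence analysis of mirror descent and offset complexities?
RQ3: Can early-stopped mirror descent achieve statistical performance comparable to that of explicitly regularized models?
RQ4: Does the framework still apply in the kernel setting when the loss is not strongly convex?
RQ5: Can complexity measures unify the theoretical analysis of mirror descent with that of empirical risk minimization?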

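Main Findings

The excess risk of early-stopped mirror descent is bounded by the offset Rademacher complexity of the function class F(α₀, R), where R is the radius, measured in Bregman divergence, around the initialization.
For any ε > 0, there exists a stopping time t⋆ ≤ 2R/ε such that the expected excess risk is bounded by c₁ E[Rₙ(F(α₀, R) − g_{F(α₀,R)}, c₂)] + ε, where Rₙ(·, c₂) is the offset Rademacher complexity and the constants c₁, c₂ depend only on the boundedness parameters.
The result holds both for finite-dimensional linear models and in the kernel setting, and extends to smooth losses under small step sizes.
The framework recovers and improves results from earlier work on implicit regularization, including results related to ridge regression and Lasso-type paths.
Although the iterates are constrained only implicitly, the excess risk guarantee for mirror descent is nearly identical to the guarantee for the ERM solution over the same Bregman ball.
The analysis does not require the loss to be strongly convex in the parameter α, which extends its applicability to general convex, non-strongly-convex settings.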
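
To give a feel for the complexity measure in these bounds, the sketch below estimates an offset Rademacher complexity by Monte Carlo. It assumes the standard definition, E_σ sup_{f∈F} (1/n) Σᵢ [σᵢ f(xᵢ) − c f(xᵢ)²], and uses the unconstrained linear class {x ↦ ⟨a, x⟩} instead of a Bregman ball F(α₀, R), because the supremum then has a closed form; the function name and the synthetic data are hypothetical. Since the unconstrained class contains every linear Bregman ball, the value is only an upper bound for such a ball.

```python
import numpy as np

def offset_rademacher_linear(X, c, n_draws=2000, seed=0):
    """Monte Carlo estimate of the offset Rademacher complexity of the
    unconstrained linear class {x -> <a, x>} on the sample X."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    A = (c / n) * X.T @ X                         # matrix of the quadratic offset term
    total = 0.0
    for _ in range(n_draws):
        sigma = rng.choice([-1.0, 1.0], size=n)   # Rademacher signs
        v = X.T @ sigma / n                       # coefficient of the linear term
        # sup_a <v, a> - a^T A a is attained at a = A^{-1} v / 2, with value v^T A^{-1} v / 4
        total += 0.25 * v @ np.linalg.solve(A, v)
    return total / n_draws

rng = np.random.default_rng(1)
n, d, c = 200, 5, 0.5
X = rng.normal(size=(n, d))
estimate = offset_rademacher_linear(X, c)
closed_form = d / (4 * c * n)  # exact expectation when X has full column rank
```

The estimate should land close to d/(4cn), which shows how the negative quadratic term keeps the supremum finite even over an unbounded class and yields a d/n-type rate.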


This summary was generated by AI and reviewed by a human editor.