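[Paper Digest] Sketched Ridge Regression: Optimization Perspective, Statistical Perspective, and Model Averaging
This paper analyzes how the classical sketch and the Hessian sketch affect the optimization and statistical performance of matrix ridge regression, exposes a bias-variance trade-off, and shows that model averaging serves as a remedy.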
We address the statistical and optimization impacts of the classical sketch and Hessian sketch used to approximately solve the Matrix Ridge Regression (MRR) problem. Prior research has quantified the effects of classical sketch on the strictly simpler least squares regression (LSR) problem. We establish that classical sketch has a similar effect upon the optimization properties of MRR as it does on those of LSR: namely, it recovers nearly optimal solutions. By contrast, Hessian sketch does not have this guarantee; instead, the approximation error is governed by a subtle interplay between the "mass" in the responses and the optimal objective value. For both types of approximation, the regularization in the sketched MRR problem results in significantly different statistical properties from those of the sketched LSR problem. In particular, there is a bias-variance trade-off in sketched MRR that is not present in sketched LSR. We provide upper and lower bounds on the bias and variance of sketched MRR; these bounds show that classical sketch significantly increases the variance, while Hessian sketch significantly increases the bias. Empirically, sketched MRR solutions can have risks that are higher by an order of magnitude than those of the optimal MRR solutions. We establish theoretically and empirically that model averaging greatly decreases the gap between the risks of the true and sketched solutions to the MRR problem. Thus, in parallel or distributed settings, sketching combined with model averaging is a powerful technique that quickly obtains near-optimal solutions to the MRR problem while greatly mitigating the increased statistical risk incurred by sketching.
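Research Motivation and Goals
- Study how sketching affects the optimization quality of matrix ridge regression (MRR) relative to the optimal solution.
- Characterize the statistical bias and variance of sketched MRR solutions under different sketching schemes.
- Study the role of model averaging in reducing the extra risk introduced by sketching, in both the optimization and the statistical setting.
- For MRR with n >> d, compare the classical sketch and the Hessian sketch in terms of theoretical guarantees and practical performance.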
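Proposed Method
- Set up the MRR problem and its two sketched variants: the classical sketch W^c and the Hessian sketch W^h.
- Derive theoretical bounds on f(W) − f(W*) for the classical and Hessian sketches under several sketching schemes (Gaussian, SRHT, leverage-based sampling, uniform sampling, CountSketch).
- Under the fixed-design model Y = XW0 + Ξ, where Ξ is the noise and standard noise assumptions hold, derive bias-variance decompositions for W*, W^c, and W^h.
- Introduce model averaging: solve g sketched MRR problems and average the solutions to reduce both the optimization error and the statistical error.
- State the conditions under which model averaging attains near-optimal risk, and discuss distributed/one-shot settings.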
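The steps above can be sketched in NumPy. This is a minimal illustration, not the authors' code: it assumes the usual formulation f(W) = (1/n)·||XW − Y||_F² + γ·||W||_F², a Gaussian sketching matrix S of size n × s, and the convention that the classical sketch replaces both X and Y by SᵀX and SᵀY while the Hessian sketch replaces only the quadratic term XᵀX. The function names, the regularization value, and the problem sizes are all hypothetical choices for the example.

```python
import numpy as np

def objective(X, Y, W, gamma):
    """MRR objective f(W) = (1/n) ||XW - Y||_F^2 + gamma ||W||_F^2 (assumed form)."""
    n = X.shape[0]
    return (np.linalg.norm(X @ W - Y, "fro") ** 2 / n
            + gamma * np.linalg.norm(W, "fro") ** 2)

def mrr_optimal(X, Y, gamma):
    """Exact minimizer W* = (X^T X + n*gamma*I)^{-1} X^T Y; needs gamma > 0 or full-rank X."""
    n, d = X.shape
    return np.linalg.solve(X.T @ X + n * gamma * np.eye(d), X.T @ Y)

def gaussian_sketch(n, s, rng):
    """n x s matrix with i.i.d. N(0, 1/s) entries, so that E[S S^T] = I_n."""
    return rng.standard_normal((n, s)) / np.sqrt(s)

def classical_sketch(X, Y, gamma, S):
    """Classical sketch: sketch both the inputs and the responses."""
    n, d = X.shape
    SX, SY = S.T @ X, S.T @ Y
    return np.linalg.solve(SX.T @ SX + n * gamma * np.eye(d), SX.T @ SY)

def hessian_sketch(X, Y, gamma, S):
    """Hessian sketch: sketch only the quadratic term, keep X^T Y exact."""
    n, d = X.shape
    SX = S.T @ X
    return np.linalg.solve(SX.T @ SX + n * gamma * np.eye(d), X.T @ Y)

def averaged(solver, X, Y, gamma, s, g, rng):
    """Model averaging: mean of g solutions, each from an independent sketch."""
    n = X.shape[0]
    return sum(solver(X, Y, gamma, gaussian_sketch(n, s, rng)) for _ in range(g)) / g

# Small synthetic demo (all sizes and the noise level are arbitrary).
rng = np.random.default_rng(0)
n, d, m, s, g, gamma = 1000, 20, 3, 200, 10, 1e-2
X = rng.standard_normal((n, d))
Y = X @ rng.standard_normal((d, m)) + 0.5 * rng.standard_normal((n, m))

W_opt = mrr_optimal(X, Y, gamma)
W_c = classical_sketch(X, Y, gamma, gaussian_sketch(n, s, rng))
W_h = hessian_sketch(X, Y, gamma, gaussian_sketch(n, s, rng))
W_avg = averaged(classical_sketch, X, Y, gamma, s, g, rng)

f_opt = objective(X, Y, W_opt, gamma)
for name, W in [("classical", W_c), ("hessian", W_h), ("classical, averaged", W_avg)]:
    print(f"{name}: f(W)/f(W*) = {objective(X, Y, W, gamma) / f_opt:.4f}")
```

Each sketched solve works with an s × d matrix instead of the n × d matrix X, which is where the saving comes from when n >> d. In a distributed setting the g solves in `averaged` are independent, so each could run on a separate worker with a single round of communication to form the mean.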
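Experimental Results
Research Questions
- RQ1: How does the classical sketch affect the optimization objective value relative to the optimal MRR solution?
- RQ2: How does the Hessian sketch affect the optimization objective value relative to the optimal MRR solution?
- RQ3: What are the bias and variance implications for sketched MRR solutions under different sketching methods?
- RQ4: Can model averaging narrow the risk gap between the sketched and the true MRR solutions, and under what conditions?
- RQ5: How do the results for sketched MRR differ between the optimization perspective and the statistical perspective?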
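Main Findings
- With s = Õ(d/ε), the classical sketch attains a near-optimal objective value: f(W^c) ≤ (1+ε) f(W*).
- The Hessian sketch does not guarantee a near-optimal objective value; when ||Y||_F^2/n dominates f(W*), f(W^h) can be far from f(W*).
- Sketched MRR exhibits a bias-variance trade-off that is absent in sketched LSR: the classical sketch increases the variance by a factor of Θ(n/s), while the Hessian sketch increases the bias.
- Averaging g sketched MRR solutions reduces both the objective gap and the variance for the classical sketch, and reduces the bias for the Hessian sketch; with sufficiently large s, averaging approaches near-optimal risk.
- Empirically, the risk of sketched MRR can be an order of magnitude higher than that of optimal MRR, and model averaging substantially narrows this gap in both centralized and distributed settings.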
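The bias and variance findings can be probed numerically. The sketch below is an illustration under stated assumptions, not the paper's experiment: it assumes a fixed design with responses Y = X·W0 + noise, i.i.d. zero-mean noise entries with standard deviation `xi`, risk measured as (1/n)·E||XW − XW0||_F², and expectations taken over the noise only, with the sketch S held fixed. Under those assumptions every estimator has the linear form W = A·Y, so the squared bias and the variance have closed forms. The names `estimator_matrix` and `bias_sq_and_var` and all sizes are hypothetical.

```python
import numpy as np

def gaussian_sketch(n, s, rng):
    """n x s matrix with i.i.d. N(0, 1/s) entries, so that E[S S^T] = I_n."""
    return rng.standard_normal((n, s)) / np.sqrt(s)

def estimator_matrix(X, gamma, S=None, kind="optimal"):
    """Return the d x n matrix A such that the estimator is W = A @ Y."""
    n, d = X.shape
    reg = n * gamma * np.eye(d)
    if kind == "optimal":
        return np.linalg.solve(X.T @ X + reg, X.T)
    SX = S.T @ X                      # s x d sketched inputs
    H = SX.T @ SX + reg               # sketched regularized Gram matrix
    if kind == "classical":
        return np.linalg.solve(H, SX.T @ S.T)   # responses are sketched too
    if kind == "hessian":
        return np.linalg.solve(H, X.T)          # responses are left exact
    raise ValueError(f"unknown kind: {kind}")

def bias_sq_and_var(X, W0, A, xi):
    """Squared bias and variance of W = A @ Y under Y = X W0 + noise."""
    n, m = X.shape[0], W0.shape[1]
    XA = X @ A                        # n x n map from responses to fitted values
    signal = X @ W0
    bias_sq = np.linalg.norm(XA @ signal - signal, "fro") ** 2 / n
    var = xi ** 2 * m * np.linalg.norm(XA, "fro") ** 2 / n
    return bias_sq, var

# Small synthetic demo (all sizes, gamma, and xi are arbitrary).
rng = np.random.default_rng(0)
n, d, m, s, g, gamma, xi = 500, 10, 2, 50, 10, 1e-2, 1.0
X = rng.standard_normal((n, d))
W0 = rng.standard_normal((d, m))

results = {"optimal": bias_sq_and_var(X, W0, estimator_matrix(X, gamma), xi)}
for kind in ("classical", "hessian"):
    # A single sketch.
    A_one = estimator_matrix(X, gamma, gaussian_sketch(n, s, rng), kind)
    results[kind] = bias_sq_and_var(X, W0, A_one, xi)
    # Average of g independent sketches; the averaged estimator is still linear in Y.
    A_avg = np.mean(
        [estimator_matrix(X, gamma, gaussian_sketch(n, s, rng), kind) for _ in range(g)],
        axis=0,
    )
    results[kind + ", averaged"] = bias_sq_and_var(X, W0, A_avg, xi)

for name, (b, v) in results.items():
    print(f"{name}: bias^2 = {b:.4e}, var = {v:.4e}, risk = {b + v:.4e}")
```

Comparing the printed rows against the "optimal" row shows, for one random draw, which term each sketch inflates and how much averaging recovers. The numbers depend on the seed and the chosen sizes, so they illustrate the trade-off rather than reproduce the paper's bounds.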
This digest was generated by AI and reviewed by human editors.