QUICK REVIEW

[Paper Review] Spectral Gradient Descent Mitigates Anisotropy-Driven Misalignment: A Case Study in Phase Retrieval

Guillaume Braun, Han Bao|arXiv (Cornell University)|Jan 30, 2026
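Advanced Electron Microscopy Techniques and Applications | Cited by 0
One-Sentence Summary

Spectral Gradient Descent (SpecGD) mitigates the variance-driven misalignment that anisotropic inputs cause in a phase retrieval model, aligning with the signal faster and more stably than standard gradient descent (GD). The analysis rests on a three-dimensional invariant manifold on which the SpecGD update becomes sign-based. Theoretical results and empirical tests indicate that this robustness extends to broader anisotropic covariances, including power-law spectra.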

ABSTRACT

Spectral gradient methods, such as the Muon optimizer, modify gradient updates by preserving directional information while discarding scale, and have shown strong empirical performance in deep learning. We investigate the mechanisms underlying these gains through a dynamical analysis of a nonlinear phase retrieval model with anisotropic Gaussian inputs, equivalent to training a two-layer neural network with the quadratic activation and fixed second-layer weights. Focusing on a spiked covariance setting where the dominant variance direction is orthogonal to the signal, we show that gradient descent (GD) suffers from a variance-induced misalignment: during the early escaping stage, the high-variance but uninformative spike direction is multiplicatively amplified, degrading alignment with the true signal under strong anisotropy. In contrast, spectral gradient descent (SpecGD) removes this spike amplification effect, leading to stable alignment and accelerated noise contraction. Numerical experiments confirm the theory and show that these phenomena persist under broader anisotropic covariances.
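The setting described in the abstract can be written compactly. The block below is a sketch under assumed notation: the symbols $w_\star$ (signal), $v$ (spike direction), $\lambda$ (spike strength), $\eta$ (learning rate), and the unit-norm normalizations are illustrative choices and may differ from the paper's own parametrization.

```latex
\begin{align*}
x &\sim \mathcal{N}(0, \Sigma), \qquad y = (w_\star^\top x)^2, \\
\Sigma &= I_d + \lambda\, v v^\top, \qquad v \perp w_\star, \quad \|v\| = \|w_\star\| = 1, \\
W_{t+1} &= W_t - \eta\, \nabla L(W_t) && \text{(GD)}, \\
W_{t+1} &= W_t - \eta\, U_t V_t^\top, \quad \nabla L(W_t) = U_t S_t V_t^\top && \text{(SpecGD)}.
\end{align*}
```

Here $U_t S_t V_t^\top$ denotes a thin singular value decomposition, so $U_t V_t^\top$ is the polar factor of the gradient: it keeps the singular directions and replaces every singular value with 1.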

Motivation and Objectives

  • Understand how anisotropic covariance affects learning dynamics in phase retrieval with a two-layer quadratic network.
  • Characterize the learning dynamics under spectral gradient updates and compare with standard gradient descent.
  • Identify mechanisms by which SpecGD avoids spike amplification and improves alignment.
  • Extend insights to broader anisotropic covariances beyond the stylized spiked model.
  • Provide theoretical and empirical validation of the proposed dynamics and benefits.

Proposed Method

  • Model data as phase retrieval with Gaussian inputs having spiked covariance orthogonal to the signal.
  • Analyze SpecGD where the gradient update is replaced by its polar factor to preserve directional information while discarding scale.
  • Show a three-dimensional invariant manifold M spanned by the signal, spike, and bulk components and reduce dynamics to coefficients a, b, c.
  • Derive continuous-time reduced ODEs for Gradient Flow (GF) and SpecGF, highlighting sign-based, scale-invariant updates for SpecGD.
  • Compare discrete-time SpecGD and GD, including learning-rate regimes and stage-wise behavior, with proofs and barriers.
  • Conduct numerical experiments including power-law covariance to validate robustness.
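The polar-factor update named in the list above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the function names `polar_factor` and `specgd_step` are hypothetical, and the polar factor is computed with a plain SVD rather than any iterative scheme an optimizer such as Muon might use in practice.

```python
import numpy as np

def polar_factor(G):
    """Return U @ Vt from the thin SVD G = U diag(s) Vt.

    The singular values s are discarded, so only directional
    information about G survives.
    """
    U, _, Vt = np.linalg.svd(G, full_matrices=False)
    return U @ Vt

def specgd_step(W, G, lr):
    """One spectral step (hypothetical helper): move along the polar factor of G."""
    return W - lr * polar_factor(G)

rng = np.random.default_rng(0)
G = rng.standard_normal((6, 3))   # stand-in for a gradient matrix
P = polar_factor(G)

# Rescaling the gradient leaves the update direction unchanged.
P_scaled = polar_factor(5.0 * G)

# For a diagonal matrix the polar factor is the elementwise sign.
D = polar_factor(np.diag([2.0, -3.0]))
```

For a full-rank gradient the polar factor is unique and has all singular values equal to 1, which is the sense in which the update is scale-invariant. On a diagonal matrix it reduces to the elementwise sign, which matches the "sign-based" description of the reduced dynamics.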

Experimental Results

Research Questions

  • RQ1: How does anisotropy in input covariances affect learning dynamics in phase retrieval under GD and SpecGD?
  • RQ2: Can SpecGD prevent spike-driven misalignment and promote earlier, stable alignment with the target signal?
  • RQ3: What are the Stage I (growth) and Stage II (alignment) dynamics under SpecGD versus GD?
  • RQ4: Do the theoretical findings extend beyond the spiked covariance model to general anisotropic covariances such as power-law spectra?
  • RQ5: How do continuous-time dynamics translate to discrete-time updates and learning-rate choices?

Key Findings

  • SpecGD induces a sign-based, scale-invariant update on an adaptive basis, preventing amplification of uninformative spike directions.
  • Training dynamics collapse to a three-parameter invariant manifold with coordinated growth of signal, spike, and bulk components.
  • Stage I under SpecGD shows uniform, quadratic growth of all coefficients within a time that does not depend on the dimension d, unlike GD.
  • Stage II under SpecGD yields sustained growth of the signal coefficient and bounded growth of spike and bulk, leading to alignment (Align ≈ 1).
  • GD experiences spike-dominated growth in Stage I and delayed signal growth in Stage II, resulting in longer overall training to achieve alignment.
  • Numerical experiments confirm theory and show SpecGD robustness under power-law covariances and finite-sample settings.
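To make the experimental ingredients behind these findings concrete, the sketch below builds the two covariance families mentioned (spiked and power-law), samples phase retrieval data, and defines one possible alignment measure. Everything here is an assumption for illustration: the covariance forms `I + spike_strength * v v^T` and `k^(-alpha)`, the helper names, and in particular the `alignment` definition are not taken from the paper, whose exact "Align" metric may differ.

```python
import numpy as np

def spiked_covariance(d, spike_strength, signal, rng):
    """Assumed form: Sigma = I + spike_strength * v v^T, unit v orthogonal to signal."""
    v = rng.standard_normal(d)
    v -= (v @ signal) * signal        # remove the signal component (signal has unit norm)
    v /= np.linalg.norm(v)
    return np.eye(d) + spike_strength * np.outer(v, v), v

def power_law_covariance(d, alpha):
    """Assumed form: diagonal covariance with eigenvalues k^(-alpha), k = 1..d."""
    return np.diag(np.arange(1, d + 1, dtype=float) ** (-alpha))

def alignment(W, signal):
    """Hypothetical metric: share of the squared Frobenius norm of W along the signal."""
    return float(np.sum((W @ signal) ** 2) / np.sum(W ** 2))

rng = np.random.default_rng(0)
d = 20
w_star = np.zeros(d)
w_star[0] = 1.0                       # unit-norm signal direction

Sigma, v = spiked_covariance(d, 10.0, w_star, rng)
X = rng.multivariate_normal(np.zeros(d), Sigma, size=1000)
y = (X @ w_star) ** 2                 # phase retrieval labels: sign of X @ w_star is lost
```

With this metric, a weight matrix whose rows all point along the signal scores 1, and one whose rows all point along the spike scores 0, so it separates the two outcomes the findings contrast.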


This review was generated by AI and reviewed by a human editor.