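[Paper Explained] Revisiting Incremental Stochastic Majorization-Minimization Algorithms with Applications to Mixture of Experts
The paper proposes an incremental stochastic majorization-minimization (MM) framework that generalizes incremental stochastic EM, proves consistency in the sense that the iterates converge to a stationary point, and demonstrates better performance than widely used optimizers on softmax-gated mixture of experts (MoE) models.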
Processing high-volume, streaming data is increasingly common in modern statistics and machine learning, where batch-mode algorithms are often impractical because they require repeated passes over the full dataset. This has motivated incremental stochastic estimation methods, including the incremental stochastic Expectation-Maximization (EM) algorithm formulated via stochastic approximation. In this work, we revisit and analyze an incremental stochastic variant of the Majorization-Minimization (MM) algorithm, which generalizes incremental stochastic EM as a special case. Our approach relaxes key EM requirements, such as explicit latent-variable representations, enabling broader applicability and greater algorithmic flexibility. We establish theoretical guarantees for the incremental stochastic MM algorithm, proving consistency in the sense that the iterates converge to a stationary point characterized by a vanishing gradient of the objective. We demonstrate these advantages on a softmax-gated mixture of experts (MoE) regression problem, for which no stochastic EM algorithm is available. Empirically, our method consistently outperforms widely used stochastic optimizers, including stochastic gradient descent, root mean square propagation, adaptive moment estimation, and second-order clipped stochastic optimization. These results support the development of new incremental stochastic algorithms, given the central role of softmax-gated MoE architectures in contemporary deep neural networks for heterogeneous data modeling. Beyond synthetic experiments, we also validate practical effectiveness on two real-world datasets, including a bioinformatics study of dent maize genotypes under drought stress that integrates high-dimensional proteomics with ecophysiological traits, where incremental stochastic MM yields stable gains in predictive performance.
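Motivation and Objectives
- Motivate and develop an incremental stochastic MM framework for high-volume streaming data and for complex models that lack an explicit latent-variable representation.
- Provide theoretical guarantees by proving that the proposed algorithm converges to a stationary point (consistency).
- Apply the method to softmax-gated MoE models with continuous and discrete outputs, a setting in which no stochastic EM algorithm is available.
- Demonstrate empirical advantages over widely used optimizers on synthetic and real-world datasets, including high-dimensional settings.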
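For readers unfamiliar with the model class, a softmax-gated MoE regression model with Gaussian experts is commonly written as below. This is the standard textbook form, given here only for orientation; the symbols $K$, $w_k$, $a_k$, $b_k$, and $\sigma_k^2$ are notation assumed for this sketch and may differ from the paper's own parameterization.

```latex
p(y \mid x; \theta)
  = \sum_{k=1}^{K} g_k(x; w)\,
    \mathcal{N}\!\left(y;\ a_k^\top x + b_k,\ \sigma_k^2\right),
\qquad
g_k(x; w)
  = \frac{\exp\!\left(w_{k0} + w_k^\top x\right)}
         {\sum_{l=1}^{K} \exp\!\left(w_{l0} + w_l^\top x\right)}
```

Here $g_k$ is the softmax gate that weights expert $k$ as a function of the input $x$, and each expert is a linear-Gaussian regression.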
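Proposed Method
- Proposes an incremental (online) MM algorithm that updates a vector of surrogate parameters with a stochastic approximation step, then updates the parameter iterate by minimizing the surrogate, which is written in exponential-family form.
- Uses majorizers that have exponential-family structure, are convex, and admit a unique minimizer, so that the updates are tractable.
- Builds a Lyapunov-function framework combined with stochastic approximation analysis to prove almost sure convergence to a stationary point of the expected objective.
- Gives a corrected bound for a key lemma in order to construct valid majorizers for softmax-gated MoE models.
- Specializes the incremental MM scheme to SGMoE and softmax-gated multinomial logistic MoE models, addressing identifiability and regularity issues.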
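The two-step update described above (a stochastic approximation step on the surrogate statistics, followed by exact minimization of the surrogate) can be illustrated on a much simpler problem than an MoE. The sketch below is not the paper's algorithm: it assumes plain logistic regression as a stand-in, uses the well-known quadratic majorizer obtained from the curvature bound of 1/4 on the logistic loss, and picks a step size of the form (k + 2)^(-alpha) as an illustrative choice. The names `incremental_mm_logistic`, `A_bar`, and `b_bar` are hypothetical.

```python
import numpy as np

def sigmoid(z):
    # Numerically stable logistic function.
    return 0.5 * (1.0 + np.tanh(0.5 * z))

def incremental_mm_logistic(X, y, alpha=0.6):
    """Single-pass incremental stochastic MM sketch for logistic regression.

    Per-sample loss: f(theta) = log(1 + exp(x @ theta)) - y * (x @ theta).
    Quadratic majorizer at the current iterate theta_k:
        0.5 * theta^T A theta + b^T theta + const,
    with A = x x^T / 4 and b = grad f(theta_k) - A theta_k.
    The pair (A, b) plays the role of the surrogate statistics.
    """
    n, d = X.shape
    A_bar = 0.25 * np.eye(d)  # running curvature statistic (assumed initial value)
    b_bar = np.zeros(d)       # running linear-term statistic
    theta = np.zeros(d)       # minimizer of the initial surrogate
    for k in range(n):
        x, yk = X[k], y[k]
        gamma = (k + 2.0) ** (-alpha)  # decreasing step size (assumed schedule)
        grad = (sigmoid(x @ theta) - yk) * x
        A = 0.25 * np.outer(x, x)
        b = grad - A @ theta
        # Stochastic approximation step on the surrogate statistics.
        A_bar += gamma * (A - A_bar)
        b_bar += gamma * (b - b_bar)
        # Minimization step: the averaged surrogate is quadratic, so solve A theta = -b.
        theta = -np.linalg.solve(A_bar, b_bar)
    return theta

def mean_nll(theta, X, y):
    # Average negative log-likelihood of the logistic model.
    z = X @ theta
    return np.mean(np.logaddexp(0.0, z) - y * z)

# Synthetic stream with a known parameter, for illustration only.
rng = np.random.default_rng(0)
theta_true = np.array([1.0, -1.0])
X = rng.normal(size=(20000, 2))
y = (rng.random(20000) < sigmoid(X @ theta_true)).astype(float)
theta_hat = incremental_mm_logistic(X, y)
print("estimate:", theta_hat)
print("mean NLL at estimate:", mean_nll(theta_hat, X, y))
```

Each observation is visited once and then discarded; only the running statistics and the current iterate are stored, which is what makes this style of update suitable for streaming data. For an MoE, the surrogate statistics and the minimization step would be model-specific and are not reproduced here.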

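Experimental Results
Research Questions
- RQ1: Can an incremental stochastic MM algorithm be designed for MoE models without relying on an explicit latent-variable representation?
- RQ2: Under what conditions does the incremental stochastic MM algorithm converge to a stationary point of the expected objective (consistency)?
- RQ3: How does the proposed method compare with standard stochastic optimizers on softmax-gated MoE models with continuous and discrete outputs?
- RQ4: Which practical and theoretical limitations arise when incremental stochastic MM is applied to softmax-gated MoE architectures, and how can they be mitigated?
Key Findings
- The proposed incremental stochastic MM algorithm is consistent: its iterates converge to a stationary point at which the gradient of the objective vanishes.
- Empirically, the method outperforms SGD, RMSProp, Adam, and Sophia on softmax-gated MoE regression problems.
- The method remains effective in high-dimensional settings and on real-world datasets, including a bioinformatics study that combines proteomics with ecophysiological traits.
- The work explains why existing incremental stochastic MM/EM variants perform poorly on softmax-gated MoEs and offers a remedy through regularity conditions and surrogate construction.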

This explainer was generated by AI and reviewed by a human editor.