QUICK REVIEW

[Paper Review] Ten Steps of EM Suffice for Mixtures of Two Gaussians

Constantinos Daskalakis, Christos Tzamos|arXiv (Cornell University)|Sep 1, 2016

Bayesian Methods and Mixture Models · References: 26 · Cited by: 29

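One-Sentence Summary

This paper provides the first global convergence guarantees for the Expectation-Maximization (EM) algorithm on mixtures of two Gaussians with known, equal covariance matrices. It shows that population EM converges geometrically to the true means: in one dimension, ten iterations initialized at infinity bring the error in the estimated means below 1%. It also establishes a finite-sample complexity of $\tilde{O}(d/\varepsilon^2)$ for estimating the means to accuracy $\varepsilon$ in Mahalanobis distance.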

ABSTRACT

The Expectation-Maximization (EM) algorithm is a widely used method for maximum likelihood estimation in models with latent variables. For estimating mixtures of Gaussians, its iteration can be viewed as a soft version of the k-means clustering algorithm. Despite its wide use and applications, there are essentially no known convergence guarantees for this method. We provide global convergence guarantees for mixtures of two Gaussians with known covariance matrices. We show that the population version of EM, where the algorithm is given access to infinitely many samples from the mixture, converges geometrically to the correct mean vectors, and provide simple, closed-form expressions for the convergence rate. As a simple illustration, we show that, in one dimension, ten steps of the EM algorithm initialized at infinity result in less than 1% error estimation of the means. In the finite sample regime, we show that, under a random initialization, $\tilde{O}(d/ε^2)$ samples suffice to compute the unknown vectors to within $ε$ in Mahalanobis distance, where $d$ is the dimension. In particular, the error rate of the EM based estimator is $\tilde{O}\left(\sqrt{d \over n}\right)$ where $n$ is the number of samples, which is optimal up to logarithmic factors.
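
The abstract describes EM for this model as a soft version of k-means. Below is a minimal finite-sample sketch of that view. It assumes the symmetric model 0.5·N(μ, Σ) + 0.5·N(−μ, Σ) with Σ = I, and runs plain EM on the full sample without any sample splitting. The helper names `sample_mixture`, `sample_em`, and `sign_invariant_error` are hypothetical and do not come from the paper. Each point receives a soft label tanh(⟨x, λ⟩) in [−1, 1], whereas k-means would use the hard label sign(⟨x, λ⟩).

```python
import math
import random


def sample_mixture(mu, n, rng):
    """Draw n points from 0.5*N(mu, I) + 0.5*N(-mu, I)."""
    data = []
    for _ in range(n):
        sign = 1.0 if rng.random() < 0.5 else -1.0
        data.append([sign * m + rng.gauss(0.0, 1.0) for m in mu])
    return data


def sample_em(data, lam, iters=30):
    """Plain sample EM for the symmetric two-component model, assuming Sigma = I.

    Each iteration computes lambda <- (1/n) * sum_i tanh(<x_i, lambda>) * x_i,
    a soft-label average; k-means would use sign(<x_i, lambda>) instead.
    """
    n, d = len(data), len(lam)
    for _ in range(iters):
        acc = [0.0] * d
        for x in data:
            # soft label in [-1, 1]: 2 * P(component +lambda | x) - 1
            s = math.tanh(sum(xj * lj for xj, lj in zip(x, lam)))
            for j in range(d):
                acc[j] += s * x[j]
        lam = [a / n for a in acc]
    return lam


def sign_invariant_error(lam, mu):
    """Euclidean error up to the label swap lambda <-> -lambda (Mahalanobis when Sigma = I)."""
    plus = math.sqrt(sum((e - m) ** 2 for e, m in zip(lam, mu)))
    minus = math.sqrt(sum((e + m) ** 2 for e, m in zip(lam, mu)))
    return min(plus, minus)


if __name__ == "__main__":
    rng = random.Random(0)
    mu = [1.5, -0.5, 1.0]
    data = sample_mixture(mu, 4000, rng)
    init = [rng.gauss(0.0, 1.0) for _ in mu]  # random initialization
    lam = sample_em(data, init)
    print("estimate:", [round(v, 3) for v in lam])
    print("error:", round(sign_invariant_error(lam, mu), 4))
```

The model is symmetric, so EM can converge to −μ instead of μ; for that reason the error is measured up to sign. When Σ ≠ I, the data would need to be whitened first, or Σ⁻¹ used inside the inner product. This sketch does neither.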

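Motivation and Objectives

Address the lack of theoretical convergence guarantees for EM, which optimizes a non-convex likelihood.
Analyze the convergence of EM for a balanced mixture of two Gaussians with known covariance.
Establish global convergence for both the population version (infinitely many samples) and the finite-sample regime.
Derive tight sample-complexity bounds for estimating the unknown means to error $\varepsilon$ in Mahalanobis distance.
Show that, in the finite-sample setting, the error rate of EM is optimal up to logarithmic factors.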

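Proposed Method

Analyze population EM under the reparameterization $p_{\bm{\mu}}(\bm{x}) = 0.5\mathcal{N}(\bm{x}; \bm{\mu}, \Sigma) + 0.5\mathcal{N}(\bm{x}; -\bm{\mu}, \Sigma)$, where $\bm{\mu}$ is the unknown mean vector.
Derive a closed-form expression for the EM update, $\bm{\lambda}^{(t+1)} = \mathbb{E}_{\bm{x} \sim p_{\bm{\mu}}}\left[ \frac{0.5\mathcal{N}(\bm{x}; \bm{\lambda}^{(t)}, \Sigma)}{p_{\bm{\lambda}^{(t)}}(\bm{x})} \bm{x} \right] \Big/ \mathbb{E}_{\bm{x} \sim p_{\bm{\mu}}}\left[ \frac{0.5\mathcal{N}(\bm{x}; \bm{\lambda}^{(t)}, \Sigma)}{p_{\bm{\lambda}^{(t)}}(\bm{x})} \right]$, which makes the geometric-convergence analysis possible.
Use concentration inequalities and moment bounds to control the sampling error in the finite-sample regime, in particular the deviation of empirical expectations from their population values.
Apply sub-Gaussian tail bounds and hyper-rectangularity arguments to analyze the behavior of $\tanh(\lambda x)$ under the Gaussian mixture, which controls the estimation error with high probability.
Establish the contraction inequality $\|\tilde{\bm{\lambda}}^{(t+1)} - \bm{\mu}\|_{\Sigma} \leq \max(e^{-\mu^2/10}, 9/10) \|\tilde{\bm{\lambda}}^{(t)} - \bm{\mu}\|_{\Sigma} + 2\varepsilon\mu^2$, which yields geometric convergence.
Combine the contraction result with the sample-complexity analysis to show that $\tilde{O}(d/\varepsilon^2)$ samples suffice for $\varepsilon$-accurate estimation in Mahalanobis distance.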
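
Two short derivations help connect the method items above. What follows is our own algebra, offered as a sketch under the summary's assumptions: Σ is known and positive definite, and the means are the symmetric pair ±μ. The shorthands $w$, $e_t$, and $\kappa$ are introduced here and are not the paper's notation. The first part shows that, because $p_{\bm{\mu}}$ is symmetric, the ratio-form update reduces to the $\tanh$ form that the finite-sample analysis works with. The second part unrolls the contraction inequality to make both the geometric rate and the error floor explicit.

```latex
\begin{align*}
% E-step weight of the +lambda component in the symmetric model
w(\bm{x}) &= \frac{0.5\,\mathcal{N}(\bm{x}; \bm{\lambda}^{(t)}, \Sigma)}{p_{\bm{\lambda}^{(t)}}(\bm{x})}
  = \frac{1}{1 + \exp\!\big(-2\,\bm{\lambda}^{(t)\top}\Sigma^{-1}\bm{x}\big)},
  \qquad 2w(\bm{x}) - 1 = \tanh\!\big(\bm{\lambda}^{(t)\top}\Sigma^{-1}\bm{x}\big) \\
% p_mu(-x) = p_mu(x) and w(-x) = 1 - w(x) give E[w(x)] = 1/2 and E[x] = 0, hence
\bm{\lambda}^{(t+1)} &= \frac{\mathbb{E}_{\bm{x} \sim p_{\bm{\mu}}}[w(\bm{x})\,\bm{x}]}{\mathbb{E}_{\bm{x} \sim p_{\bm{\mu}}}[w(\bm{x})]}
  = 2\,\mathbb{E}_{\bm{x} \sim p_{\bm{\mu}}}\!\big[\big(w(\bm{x}) - \tfrac{1}{2}\big)\bm{x}\big]
  = \mathbb{E}_{\bm{x} \sim p_{\bm{\mu}}}\!\big[\tanh\!\big(\bm{\lambda}^{(t)\top}\Sigma^{-1}\bm{x}\big)\,\bm{x}\big] \\
% unrolling the contraction, with e_t = ||tilde-lambda^(t) - mu||_Sigma and kappa = max(e^{-mu^2/10}, 9/10) < 1 for mu != 0
e_{t+1} &\le \kappa\, e_t + 2\varepsilon\mu^2
  \;\Longrightarrow\;
  e_t \le \kappa^t e_0 + 2\varepsilon\mu^2 \sum_{s=0}^{t-1} \kappa^s \le \kappa^t e_0 + \frac{2\varepsilon\mu^2}{1 - \kappa}
\end{align*}
```

In words: the first term decays geometrically at rate $\kappa$, and the iterates settle within an error floor of order $\varepsilon\mu^2/(1-\kappa)$ caused by sampling error.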

Results

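Research Questions

RQ1 Does EM converge globally to the true parameters for a mixture of two Gaussians with known covariance?
RQ2 How many EM iterations are needed to estimate the means to accuracy $\varepsilon$?
RQ3 What is the optimal sample complexity of EM-based estimation in this setting?
RQ4 Can the finite-sample error of EM be bounded with high probability, and at what rate does it converge?
RQ5 Is the error rate of EM optimal in the sample size, up to logarithmic factors?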

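Key Findings

For a univariate mixture of two Gaussians, population EM initialized at infinity estimates the means with less than 1% error after ten iterations.
Population EM converges geometrically to the true mean vectors, with a closed-form convergence rate that depends on the separation of the components, measured by the Mahalanobis norm of the true mean.
In the finite-sample regime, $\tilde{O}(d/\varepsilon^2)$ samples suffice to estimate the unknown mean vectors to error $\varepsilon$ in Mahalanobis distance.
The EM-based estimator has error rate $\tilde{O}(\sqrt{d/n})$, which is optimal in the sample size $n$ up to logarithmic factors.
The algorithm converges geometrically with contraction factor $\max(e^{-\mu^2/10}, 9/10)$. This factor equals $9/10$ once the components are sufficiently separated ($e^{-\mu^2/10} \le 9/10$), which ensures fast convergence.
High-probability bounds on the sampling error, derived from the concentration of $\tanh(\lambda x)$ under the Gaussian mixture, support the finite-sample analysis.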
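
The ten-iteration finding can be checked numerically for particular separations. Below is a minimal sketch under these assumptions: one dimension, a known σ, and the tanh form of the symmetric population update, λ ← E[tanh(λx/σ²)·x]. Expectations are approximated with a trapezoid rule on a truncated grid rather than computed exactly. The helper names are hypothetical. Initializing at infinity makes the first update tanh(∞·x) = sign(x), so λ₁ = E|x|.

```python
import math


def normal_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def mixture_expectation(f, mu, sigma=1.0, half_width=12.0, steps=4000):
    """Approximate E[f(x)] for x ~ 0.5*N(mu, sigma^2) + 0.5*N(-mu, sigma^2).

    Trapezoid rule in the standardized variable z on [-half_width, half_width].
    """
    h = 2.0 * half_width / steps
    total = 0.0
    for i in range(steps + 1):
        z = -half_width + i * h
        edge = 0.5 if i in (0, steps) else 1.0  # trapezoid end-point weights
        both = 0.5 * (f(mu + sigma * z) + f(-mu + sigma * z))  # average the two components
        total += edge * normal_pdf(z) * both
    return total * h


def population_em_1d(mu, sigma=1.0, iters=10):
    """Return population EM iterates lambda_1..lambda_iters, started at lambda_0 = +infinity."""
    # First step from +infinity: tanh(inf * x) = sign(x), so lambda_1 = E|x|.
    lam = mixture_expectation(abs, mu, sigma)
    history = [lam]
    for _ in range(iters - 1):
        # EM update in tanh form: lambda <- E[tanh(lambda * x / sigma^2) * x]
        lam = mixture_expectation(
            lambda x, c=lam: math.tanh(c * x / sigma ** 2) * x, mu, sigma
        )
        history.append(lam)
    return history


if __name__ == "__main__":
    mu = 2.0
    for t, lam in enumerate(population_em_1d(mu), start=1):
        print(f"step {t:2d}: lambda = {lam:.6f}, relative error = {abs(lam - mu) / mu:.2e}")
```

For μ = 2σ, the first step already lands at E|x| ≈ 2.017σ, and later steps shrink the remaining gap geometrically. Smaller separations typically converge more slowly, which is consistent with the contraction factor depending on μ.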

This review was generated by AI and reviewed by human editors.