QUICK REVIEW

[Paper Review] Improving CUR Matrix Decomposition and the Nyström Approximation via Adaptive Sampling

Shusen Wang, Zhihua Zhang|arXiv (Cornell University)|Mar 18, 2013

Sparse and Compressive Sensing Techniques|References: 49|Cited by: 149

One-Sentence Summary

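The paper proposes adaptive sampling algorithms for CUR matrix decomposition and the Nyström approximation that achieve improved relative-error bounds without restrictive assumptions on the data matrix. Building on a more general error bound for adaptive column/row sampling, the methods attain lower approximation error while keeping time complexity and memory usage low, and they outperform the standard and ensemble Nyström methods in both theory and practice.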

ABSTRACT

The CUR matrix decomposition and the Nyström approximation are two important low-rank matrix approximation techniques. The Nyström method approximates a symmetric positive semidefinite matrix in terms of a small number of its columns, while CUR approximates an arbitrary data matrix by a small number of its columns and rows. Thus, CUR decomposition can be regarded as an extension of the Nyström approximation. In this paper we establish a more general error bound for the adaptive column/row sampling algorithm, based on which we propose more accurate CUR and Nyström algorithms with expected relative-error bounds. The proposed CUR and Nyström algorithms also have low time complexity and can avoid maintaining the whole data matrix in RAM. In addition, we give theoretical analysis for the lower error bounds of the standard Nyström method and the ensemble Nyström method. The main theoretical results established in this paper are novel, and our analysis makes no special assumption on the data matrices.
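As a concrete illustration of the two constructions described in the abstract, the following minimal NumPy sketch builds a standard Nyström approximation $\mathbf{C}\mathbf{W}^{+}\mathbf{C}^{T}$ from sampled columns of an SPSD matrix, and a CUR approximation from sampled columns and rows. It assumes the middle matrix of CUR is the pseudoinverse of the row/column intersection; the function names `nystrom` and `cur` and the `rcond` cutoff are illustrative choices, not the paper's code.

```python
import numpy as np


def nystrom(K, idx):
    """Standard Nyström approximation C W^+ C^T of an SPSD matrix K."""
    C = K[:, idx]                            # m x c sampled columns
    W = K[np.ix_(idx, idx)]                  # c x c intersection block
    # rcond discards round-off singular values when W is rank deficient.
    return C @ np.linalg.pinv(W, rcond=1e-10) @ C.T


def cur(A, cols, rows):
    """CUR approximation C U R with U = W^+, W = A[rows, cols] (assumed choice)."""
    C = A[:, cols]                           # m x c sampled columns
    R = A[rows, :]                           # r x n sampled rows
    W = A[np.ix_(rows, cols)]                # r x c intersection block
    U = np.linalg.pinv(W, rcond=1e-10)       # c x r middle matrix
    return C @ U @ R


rng = np.random.default_rng(0)
X = rng.standard_normal((50, 3))
K = X @ X.T                                  # rank-3 SPSD test matrix
K_hat = nystrom(K, rng.choice(50, size=5, replace=False))

A = rng.standard_normal((40, 3)) @ rng.standard_normal((3, 30))  # rank-3 test matrix
A_hat = cur(A, cols=rng.choice(30, size=5, replace=False),
            rows=rng.choice(40, size=5, replace=False))
```

On an exactly rank-3 matrix, sampling a few more columns (and rows) than the rank generically recovers the matrix up to round-off, because the intersection block then has the same rank as the full matrix. Another common choice of middle matrix is $\mathbf{C}^{+}\mathbf{A}\mathbf{R}^{+}$, which minimizes the Frobenius-norm error for fixed $\mathbf{C}$ and $\mathbf{R}$ but needs a pass over all of $\mathbf{A}$.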

Motivation and Objectives

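Address the limitations of the standard CUR and Nyström methods, which often incur high approximation error and lack theoretical guarantees.
Develop a more general error bound for adaptive column/row sampling that holds for arbitrary data matrices without special assumptions.
Design new CUR and Nyström algorithms with expected relative-error bounds that improve accuracy over existing randomized methods.
Keep time complexity low and memory use minimal by avoiding storage of the full matrix, so that the methods scale to large data.
Establish theoretical lower bounds for the standard and ensemble Nyström methods to characterize the performance limits of these techniques.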

Proposed Method

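Introduce a general error bound for adaptive column and row sampling in matrix approximation, based on leverage scores and spectral properties.
Use this bound to design a new adaptive sampling strategy that selects columns and rows with higher probability according to how much they contribute to the low-rank structure.
Build the CUR decomposition by adaptively selecting $c$ columns and $r$ rows, then forming the middle matrix as the pseudoinverse of $\mathbf{W}$, the intersection of the selected columns and rows.
Apply the same adaptive sampling framework to the Nyström method, which approximates a symmetric positive semidefinite matrix using a subset of its columns.
Analyze the ensemble Nyström method, which averages Nyström approximations built from $t$ independent column samples to improve stability and reduce variance.
Derive theoretical bounds on the approximation error in the Frobenius and nuclear norms, showing that the error scales with $(1-\theta)$, where $\theta$ controls the sampling bias.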
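The adaptive sampling step and the ensemble averaging described above can be sketched as follows. This is a hedged illustration, not the authors' algorithm: it assumes the residual-based scheme of Deshpande et al., in which extra columns are drawn with probability proportional to the squared column norms of $\mathbf{A}-\mathbf{C}\mathbf{C}^{+}\mathbf{A}$, and it assumes uniform weights $1/t$ for the ensemble. The helper names `residual_fro`, `adaptive_columns`, and `ensemble_nystrom` are hypothetical.

```python
import numpy as np


def residual_fro(A, idx):
    """||A - C C^+ A||_F for C = A[:, idx]: error of projecting A onto span(C)."""
    C = A[:, idx]
    return np.linalg.norm(A - C @ np.linalg.pinv(C, rcond=1e-10) @ A)


def adaptive_columns(A, init_idx, c2, rng):
    """Add c2 columns drawn with probability proportional to the squared
    norms of the columns of the residual A - C C^+ A (assumed scheme)."""
    init_idx = np.asarray(init_idx)
    C = A[:, init_idx]
    residual = A - C @ np.linalg.pinv(C, rcond=1e-10) @ A
    p = np.sum(residual ** 2, axis=0)        # squared residual column norms
    if p.sum() <= 1e-12 * np.sum(A ** 2):    # span(C) already captures A
        return init_idx
    extra = rng.choice(A.shape[1], size=c2, replace=True, p=p / p.sum())
    return np.unique(np.concatenate([init_idx, extra]))


def ensemble_nystrom(K, c, t, rng):
    """Uniformly weighted average of t standard Nyström approximations,
    each built from c columns drawn uniformly without replacement."""
    m = K.shape[0]
    approx = np.zeros(K.shape)
    for _ in range(t):
        idx = rng.choice(m, size=c, replace=False)
        C = K[:, idx]
        W = K[np.ix_(idx, idx)]
        approx += C @ np.linalg.pinv(W, rcond=1e-10) @ C.T
    return approx / t


rng = np.random.default_rng(1)
A = rng.standard_normal((60, 5)) @ rng.standard_normal((5, 40))
A += 0.01 * rng.standard_normal(A.shape)          # rank-5 signal plus small noise
init = rng.choice(40, size=2, replace=False)       # first round: uniform
idx = adaptive_columns(A, init, c2=8, rng=rng)     # second round: adaptive
err_before, err_after = residual_fro(A, init), residual_fro(A, idx)

X = rng.standard_normal((50, 3))
K = X @ X.T                                        # rank-3 SPSD test matrix
K_ens = ensemble_nystrom(K, c=5, t=4, rng=rng)
```

Because the second round only adds columns, the projection residual can only shrink; sampling in proportion to the residual concentrates the extra columns where the first round left the most error.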

Experimental Results

Research Questions

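RQ1: Can adaptive sampling improve the relative-error bounds of CUR and Nyström approximations without assuming any particular data structure?
RQ2: What are the theoretical lower bounds on the approximation error of the standard and ensemble Nyström methods?
RQ3: How does adaptive sampling compare with uniform sampling and leverage-score-based sampling in approximation error and computational efficiency?
RQ4: Can the proposed methods achieve relative-error bounds while keeping time complexity low and memory footprint minimal?
RQ5: How does ensemble averaging affect the stability and accuracy of the Nyström approximation?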

Key Findings

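The proposed adaptive sampling algorithms achieve expected relative-error bounds for both CUR and Nyström approximations without special assumptions on the input matrix.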
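The Frobenius-norm error of the ensemble Nyström method is bounded below by $(1-\theta)^2\bigg[\Big(m-2c+\frac{c}{t}-k\Big)+k\bigg(\frac{m-c+\frac{c}{t}+k\frac{1-\theta}{\theta}}{c+k\frac{1-\theta}{\theta}}\bigg)^{2}\bigg]$, a floor that contrasts with the relative-error upper bounds obtained with adaptive sampling.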
The nuclear-norm error of the ensemble Nyström method is at least $(1-\theta)(m-c)\frac{c+\frac{1}{\theta}k}{c+\frac{1-\theta}{\theta}k}$, which limits how accurate the method can be.
The paper also lower-bounds the relative-error ratio of the ensemble Nyström method, showing that in the worst case it can be as large as $\frac{m-c}{m-k}\big(1+\frac{k}{c}\big)$.
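The theoretical analysis confirms that the standard Nyström method generally cannot achieve relative-error bounds, underscoring the advantage of adaptive sampling.
The methods avoid storing the full matrix in RAM while keeping time complexity low, making them suitable for large-scale and sparse matrices.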
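To get a feel for the lower bounds listed above, the following sketch evaluates the three expressions exactly as they are printed in this summary. The summary does not define the symbols, so the reading here is an assumption: $m$ is the order of the $m\times m$ SPSD matrix, $c$ the number of sampled columns, $k$ the target rank, $t$ the number of ensemble members, and $\theta\in(0,1)$ a parameter of the lower-bound construction. The function names are hypothetical.

```python
# Assumed symbols: m = matrix order, c = sampled columns, k = target rank,
# t = ensemble size, theta in (0, 1). Formulas transcribed as printed.

def frob_lower_bound(m, c, k, t, theta):
    """Frobenius-norm lower bound for ensemble Nyström."""
    a = k * (1 - theta) / theta                  # recurring term k(1-θ)/θ
    ratio = (m - c + c / t + a) / (c + a)
    return (1 - theta) ** 2 * ((m - 2 * c + c / t - k) + k * ratio ** 2)


def nuclear_lower_bound(m, c, k, theta):
    """Nuclear-norm lower bound for ensemble Nyström."""
    return (1 - theta) * (m - c) * (c + k / theta) / (c + (1 - theta) / theta * k)


def worst_case_ratio(m, c, k):
    """Worst-case relative-error ratio (m-c)/(m-k) * (1 + k/c)."""
    return (m - c) / (m - k) * (1 + k / c)


fro = frob_lower_bound(m=100, c=20, k=5, t=1, theta=0.5)
nuc = nuclear_lower_bound(m=100, c=20, k=5, theta=0.5)
ratio = worst_case_ratio(m=100, c=20, k=5)
```

When $c$ and $k$ are both much smaller than $m$, the worst-case ratio is close to $1+k/c$, so keeping it near 1 requires sampling many more columns than the target rank.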


This review was generated by AI and reviewed by human editors.