QUICK REVIEW

[Paper Review] Low Rank Approximation and Regression in Input Sparsity Time

Kenneth L. Clarkson, David P. Woodruff|arXiv (Cornell University)|Jul 26, 2012

Sparse and Compressive Sensing Techniques | References: 51 | Cited by: 22

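Summary

This paper introduces sparse embedding matrices, which yield input-sparsity-time algorithms for low-rank approximation, regression, and leverage score estimation. Because the subspace embedding can be applied in O(nnz(A)) time, the resulting algorithms for overconstrained regression, low-rank approximation, and ℓp-regression have a leading running-time term of O(nnz(A)) or O(nnz(A) log n), a substantial improvement over earlier methods that needed Ω(nd log d) time for comparable guarantees.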

ABSTRACT

We design a new distribution over $\mathrm{poly}(r \varepsilon^{-1}) \times n$ matrices $S$ so that for any fixed $n \times d$ matrix $A$ of rank $r$, with probability at least 9/10, $\|SAx\|_2 = (1 \pm \varepsilon)\|Ax\|_2$ simultaneously for all $x \in \mathbb{R}^d$. Such a matrix $S$ is called a subspace embedding. Furthermore, $SA$ can be computed in $\mathrm{nnz}(A) + \mathrm{poly}(d \varepsilon^{-1})$ time, where $\mathrm{nnz}(A)$ is the number of non-zero entries of $A$. This improves over all previous subspace embeddings, which required at least $\Omega(nd \log d)$ time to achieve this property. We call our matrices $S$ sparse embedding matrices. Using our sparse embedding matrices, we obtain the fastest known algorithms for $(1+\varepsilon)$-approximation for overconstrained least-squares regression, low-rank approximation, approximating all leverage scores, and $\ell_p$-regression. The leading order term in the time complexity of our algorithms is $O(\mathrm{nnz}(A))$ or $O(\mathrm{nnz}(A) \log n)$. We optimize the low-order $\mathrm{poly}(d/\varepsilon)$ terms in our running times (or for rank-$k$ approximation, the $n \cdot \mathrm{poly}(k/\varepsilon)$ term), and show various tradeoffs. For instance, we also use our methods to design new preconditioners that improve the dependence on $\varepsilon$ in least squares regression to $\log(1/\varepsilon)$. Finally, we provide preliminary experimental results which suggest that our algorithms are competitive in practice.

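Research Motivation and Objectives

Design a sparse embedding matrix S that, with high probability, preserves the ℓ2 norm of Ax up to a (1 ± ε) factor for all x ∈ ℝ^d.
Solve overconstrained least-squares regression in O(nnz(A)) + Õ(d³ε⁻²) time, improving on the earlier Ω(nd log d) bound.
Compute a rank-k approximation of an n×n matrix in O(nnz(A)) + Õ(nk²ε⁻⁴ + k³ε⁻⁵) time, with error within a (1+ε) factor of the best rank-k approximation.
Compute all leverage scores of an n×d matrix to constant relative error in O(nnz(A) log n) + Õ(r³) time.
Solve ℓp-regression for any constant p ∈ [1, ∞) to (1+ε) relative error in O(nnz(A) log n) + poly(rε⁻¹) time.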
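
The link between the embedding guarantee and the regression objective is a short standard argument, sketched here under stated assumptions: $S$ is a $(1 \pm \varepsilon)$-subspace embedding for the column span of $[A, b]$ with $\varepsilon < 1$, $x^*$ minimizes $\|Ax - b\|_2$, and $x'$ minimizes the sketched problem $\|SAx - Sb\|_2$ (the names $x^*$ and $x'$ are introduced here for illustration only).

$$
\|Ax' - b\|_2
\;\le\; \frac{\|S(Ax' - b)\|_2}{1 - \varepsilon}
\;\le\; \frac{\|S(Ax^* - b)\|_2}{1 - \varepsilon}
\;\le\; \frac{1 + \varepsilon}{1 - \varepsilon}\,\|Ax^* - b\|_2 .
$$

The first and last steps use the embedding guarantee, and the middle step uses the optimality of $x'$ for the sketched problem. Since $(1+\varepsilon)/(1-\varepsilon) = 1 + O(\varepsilon)$ for $\varepsilon \le 1/2$, rescaling $\varepsilon$ by a constant gives a $(1+\varepsilon)$-approximation. This basic argument does not by itself account for the optimized low-order terms quoted above.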

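Proposed Method

Define a distribution over sparse poly(rε⁻¹) × n matrices S such that a random draw is a subspace embedding with high probability.
Compute SA in O(nnz(A)) time, so that ‖SAx‖₂ = (1 ± ε)‖Ax‖₂ for all x ∈ ℝ^d.
Combine sparse embedding matrices with leverage score sampling and the randomized Hadamard transform to speed up low-rank approximation.
For ℓp-regression, use a two-stage sampling procedure: first estimate row norms with a random projection Π₂, then sample rows according to those estimates.
Adapt the well-conditioned basis framework from prior work to sparse embeddings, reducing sample complexity and running time.
Optimize the polynomial factors in the running times and explore tradeoffs between accuracy and efficiency.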
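
The first two steps above can be sketched in a few lines of Python. This is a minimal illustration, not the paper's implementation. It assumes S has exactly one nonzero per column, a random sign placed in a uniformly random row (the form usually called CountSketch), which this summary does not spell out. The function names, the (row, col, value) triple format for A, and the hand-picked `h` and `sigma` in the usage lines are hypothetical.

```python
import random


def draw_sparse_embedding(n, m, seed=0):
    """Draw a hash h: [n] -> [m] and signs sigma in {-1, +1}^n.

    Together they define an m x n matrix S with one nonzero per column:
    S[h[i]][i] = sigma[i]. (Assumed construction, see the lead-in.)
    """
    rng = random.Random(seed)
    h = [rng.randrange(m) for _ in range(n)]
    sigma = [rng.choice((-1.0, 1.0)) for _ in range(n)]
    return h, sigma


def apply_sparse_embedding(entries, h, sigma, m, d):
    """Compute SA from the nonzeros of A, given as (row, col, value) triples.

    Each nonzero is touched exactly once, so the loop costs O(nnz(A)),
    plus O(m * d) to allocate the dense output.
    """
    SA = [[0.0] * d for _ in range(m)]
    for i, j, v in entries:
        SA[h[i]][j] += sigma[i] * v
    return SA


# Usage: A is 4 x 2 with 5 nonzeros; h and sigma are fixed by hand so the
# result can be checked on paper. In practice they come from
# draw_sparse_embedding.
A_entries = [(0, 0, 2.0), (1, 1, 3.0), (2, 0, 1.0), (3, 0, 4.0), (3, 1, -1.0)]
h = [0, 1, 0, 2]
sigma = [1.0, -1.0, -1.0, 1.0]
SA = apply_sparse_embedding(A_entries, h, sigma, m=3, d=2)
```

Rows 0 and 2 of A share bucket 0 with opposite signs, so their entries partially cancel in SA. Collisions like this are why the number of rows of S has to grow polynomially with the rank and 1/ε for the norm guarantee to hold.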

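Experimental Results

Research Questions

RQ1: Can we construct, in input-sparsity time O(nnz(A)), a subspace embedding matrix S that with high probability preserves the ℓ2 norm of every Ax?
RQ2: Can we obtain a (1+ε)-approximate solution to overconstrained least-squares regression in O(nnz(A)) + Õ(d³ε⁻²) time?
RQ3: Can we obtain a (1+ε)-approximate low-rank factorization of an n×n matrix in O(nnz(A)) + Õ(nk²ε⁻⁴ + k³ε⁻⁵) time?
RQ4: Can we compute all leverage scores of an n×d matrix to constant relative error in O(nnz(A) log n) + Õ(r³) time?
RQ5: Can we solve ℓp-regression for any constant p ∈ [1, ∞) to (1+ε) relative error in O(nnz(A) log n) + poly(rε⁻¹) time?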

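Key Findings

With the proposed sparse embedding matrices, SA can be computed in O(nnz(A)) time, and with probability at least 9/10, ‖SAx‖₂ = (1 ± ε)‖Ax‖₂ holds for all x ∈ ℝ^d.
The algorithm for overconstrained ℓ2-regression runs in O(nnz(A)) + Õ(d³ε⁻²) time, improving on the earlier Ω(nd log d) bound.
For low-rank approximation, the algorithm runs in O(nnz(A)) + Õ(nk²ε⁻⁴ + k³ε⁻⁵) time, with error within a (1+ε) factor of the best rank-k approximation.
All leverage scores of an n×d matrix can be approximated to constant relative error in O(nnz(A) log n) + Õ(r³) time.
The ℓp-regression algorithm runs in O(nnz(A) log n) + poly(rε⁻¹) time for any constant p ∈ [1, ∞) and achieves (1+ε) relative error.
Preliminary experiments suggest that the algorithms perform well in practice: even with reduced sampling, the low-rank approximation error is close to that of the best rank-k approximation.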
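
The regression finding can be illustrated with a toy sketch-and-solve experiment. This is a hedged sketch under several assumptions, not the paper's algorithm: S is taken to be a one-nonzero-per-column sign matrix (CountSketch form), the small least-squares problems are solved with normal equations and Gaussian elimination for brevity, and the sizes `n`, `d`, `m` and the synthetic Gaussian data are arbitrary choices rather than the paper's poly(d/ε) bounds. All function names are hypothetical.

```python
import random


def sketch(rows, b, m, seed=0):
    """Apply one random sign-hash matrix S to both A (dense rows) and b."""
    rng = random.Random(seed)
    d = len(rows[0])
    SA = [[0.0] * d for _ in range(m)]
    Sb = [0.0] * m
    for row, bi in zip(rows, b):
        bucket = rng.randrange(m)          # row of S holding this column's nonzero
        sign = rng.choice((-1.0, 1.0))     # its random sign
        for j, v in enumerate(row):
            if v != 0.0:                   # only nonzeros of A do any work
                SA[bucket][j] += sign * v
        Sb[bucket] += sign * bi
    return SA, Sb


def solve_least_squares(M, y):
    """Minimize ||Mx - y||_2 via normal equations and Gaussian elimination.

    Adequate for a tiny, well-conditioned demo; a QR-based solver is the
    safer choice in practice.
    """
    d = len(M[0])
    # Augmented system [M^T M | M^T y].
    G = [[sum(r[i] * r[j] for r in M) for j in range(d)]
         + [sum(r[i] * yi for r, yi in zip(M, y))] for i in range(d)]
    for c in range(d):
        p = max(range(c, d), key=lambda r: abs(G[r][c]))  # partial pivoting
        G[c], G[p] = G[p], G[c]
        for r in range(c + 1, d):
            f = G[r][c] / G[c][c]
            for k in range(c, d + 1):
                G[r][k] -= f * G[c][k]
    x = [0.0] * d
    for c in reversed(range(d)):           # back substitution
        x[c] = (G[c][d] - sum(G[c][k] * x[k] for k in range(c + 1, d))) / G[c][c]
    return x


def residual_norm(rows, b, x):
    """Return ||Ax - b||_2 for dense rows of A."""
    return sum((sum(a * xi for a, xi in zip(row, x)) - bi) ** 2
               for row, bi in zip(rows, b)) ** 0.5


# Synthetic overconstrained problem: n equations, d unknowns, m sketch rows.
rng = random.Random(1)
n, d, m = 2000, 3, 200
A = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(n)]
x_true = [1.0, -2.0, 0.5]
b = [sum(a * xi for a, xi in zip(row, x_true)) + 0.1 * rng.gauss(0, 1)
     for row in A]

x_exact = solve_least_squares(A, b)        # solves the full n x d problem
SA, Sb = sketch(A, b, m, seed=2)
x_sketch = solve_least_squares(SA, Sb)     # solves only an m x d problem
ratio = residual_norm(A, b, x_sketch) / residual_norm(A, b, x_exact)
```

Here `ratio` compares the true residual of the sketched solution with the optimal residual. It cannot fall below 1, up to rounding, and the (1+ε) guarantee says it stays close to 1 once `m` is large enough relative to `d` and 1/ε.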

This review was generated by AI and checked by human editors.