QUICK REVIEW

[Paper Digest] Input Sparsity and Hardness for Robust Subspace Approximation

Kenneth L. Clarkson, David P. Woodruff|arXiv (Cornell University)|Oct 20, 2015

Sparse and Compressive Sensing Techniques|References 21|Cited by 20

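One-Sentence Summary

This paper presents input-sparsity-time algorithms for robust subspace approximation with $p \in [1,2)$, achieving a $(1+\varepsilon)$-approximation in $O(\operatorname{nnz}(A) + (n+d)\cdot{\mathrm{poly}}(k/\varepsilon) + \exp({\mathrm{poly}}(k/\varepsilon)))$ time. It proves that even a $(1+1/{\mathrm{poly}}(d))$-approximation is NP-hard, answering a question posed by Kannan and Vempala, and gives the first $O(\operatorname{nnz}(A) + {\mathrm{poly}}(d/\varepsilon))$-time algorithms for $(1+\varepsilon)$-approximate regression with a wide class of convex $M$-estimators.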

ABSTRACT

In the subspace approximation problem, we seek a k-dimensional subspace F of R^d that minimizes the sum of p-th powers of Euclidean distances to a given set of n points a_1, ..., a_n in R^d, for p >= 1. More generally than minimizing sum_i dist(a_i,F)^p, we may wish to minimize sum_i M(dist(a_i,F)) for some loss function M(), for example, M-Estimators, which include the Huber and Tukey loss functions. Such subspaces provide alternatives to the singular value decomposition (SVD), which is the p=2 case, finding such an F that minimizes the sum of squares of distances. For p in [1,2), and for typical M-Estimators, the minimizing F gives a solution that is more robust to outliers than that provided by the SVD. We give several algorithmic and hardness results for these robust subspace approximation problems. We think of the n points as forming an n x d matrix A, and let nnz(A) denote the number of non-zero entries of A. Our results hold for p in [1,2). We use poly(n) to denote n^{O(1)} as n -> infty. We obtain: (1) For minimizing sum_i dist(a_i,F)^p, we give an algorithm running in O(nnz(A) + (n+d)poly(k/eps) + exp(poly(k/eps))) time, (2) we show that the problem of minimizing sum_i dist(a_i, F)^p is NP-hard, even to output a (1+1/poly(d))-approximation, answering a question of Kannan and Vempala, and complementing prior results which held for p > 2, (3) For loss functions for a wide class of M-Estimators, we give a problem-size reduction: for a parameter K=(log n)^{O(log k)}, our reduction takes O(nnz(A) log n + (n+d) poly(K/eps)) time to reduce the problem to a constrained version involving matrices whose dimensions are poly(K eps^{-1} log n). We also give bicriteria solutions, (4) Our techniques lead to the first O(nnz(A) + poly(d/eps)) time algorithms for (1+eps)-approximate regression for a wide class of convex M-Estimators.
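
For reference, the objective described in the abstract can be written out as follows. This is only a restatement in LaTeX; the names $\mathrm{cost}_p$, $\hat F$ and $P_F$ (the orthogonal projector onto $F$) are notation assumed here, not taken from the paper.

```latex
% cost of a candidate k-dimensional subspace F (notation assumed for this note)
\[
  \mathrm{cost}_p(F) \;=\; \sum_{i=1}^{n} \mathrm{dist}(a_i, F)^p ,
  \qquad
  \mathrm{dist}(a_i, F) \;=\; \min_{x \in F} \lVert a_i - x \rVert_2 .
\]
% a (1+eps)-approximate solution \hat F
\[
  \mathrm{cost}_p(\hat F) \;\le\; (1+\varepsilon) \min_{\dim F = k} \mathrm{cost}_p(F) .
\]
% the p = 2 case is the SVD objective: rows of A are the points a_i
\[
  \mathrm{cost}_2(F) \;=\; \lVert A - A P_F \rVert_F^2 .
\]
```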

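Motivation and Objectives

Design efficient algorithms for robust subspace approximation with $p \in [1,2)$, where the goal is to minimize $\sum_i \mathrm{dist}(a_i, F)^p$ or, more generally, an $M$-estimator loss $\sum_i M(\mathrm{dist}(a_i, F))$.
Establish the computational hardness of approximating robust subspace approximation to within a factor of $1+1/{\mathrm{poly}}(d)$ for $p \in [1,2)$, complementing prior results for $p > 2$.
Use problem-size reduction to reduce large robust subspace problems to smaller, constrained instances.
Design the first input-sparsity-time algorithms for $(1+\varepsilon)$-approximate regression with a wide class of convex $M$-estimators.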
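
To make the objectives concrete, here is a minimal sketch that evaluates $\sum_i \mathrm{dist}(a_i, F)^p$ and an $M$-estimator cost for a fixed candidate subspace. It only evaluates the cost and is not the paper's algorithm. The function names, the toy data, and the particular Huber parameterization (with threshold `tau`) are assumptions made for illustration.

```python
import math

def dist_to_subspace(point, basis):
    """Euclidean distance from `point` to span(basis).

    Assumption of this sketch: `basis` is a list of orthonormal vectors.
    """
    residual = list(point)
    for b in basis:
        # Coefficient of `point` along b; valid because the basis is orthonormal.
        coeff = sum(x * y for x, y in zip(point, b))
        residual = [r - coeff * y for r, y in zip(residual, b)]
    return math.sqrt(sum(r * r for r in residual))

def lp_cost(points, basis, p):
    """Sum of p-th powers of distances to the subspace."""
    return sum(dist_to_subspace(a, basis) ** p for a in points)

def huber(x, tau=1.0):
    """One common Huber parameterization: quadratic near 0, linear in the tail."""
    x = abs(x)
    return x * x / (2.0 * tau) if x <= tau else x - tau / 2.0

def m_cost(points, basis, loss):
    """Sum of loss(dist) over all points, for a generic loss function M."""
    return sum(loss(dist_to_subspace(a, basis)) for a in points)

# Toy data: three points near the x-axis and one far outlier.
points = [(1.0, 0.0), (2.0, 0.1), (3.0, -0.1), (0.0, 10.0)]
x_axis = [(1.0, 0.0)]  # orthonormal basis of a 1-dimensional subspace

print(lp_cost(points, x_axis, 1))
print(lp_cost(points, x_axis, 2))
print(m_cost(points, x_axis, huber))
```

On this toy input the single outlier at distance 10 contributes 10 to the $p=1$ cost but 100 to the $p=2$ cost, which is the sense in which smaller $p$ and Huber-type losses are less sensitive to outliers.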

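Proposed Methods

Uses input-sparsity techniques to obtain a running time linear in the number of nonzero entries $\operatorname{nnz}(A)$, combined with dimensionality reduction via a conjugate-kernel construction.
Adopts a recursive framework that combines leverage-score sampling with weighted norm estimation to shrink the problem while preserving the approximation guarantee.
After $O(\log n)$ levels of recursion, applies the ellipsoid method to the reduced instance; when the reduced size is $n^{\beta}{\mathrm{poly}}(d/\varepsilon)$ with $\beta < 1/(2C)$, this guarantees polynomial-time solvability.
To handle $M$-estimators, reduces the problem, via a novel reduction, to a constrained problem of size ${\mathrm{poly}}(K\varepsilon^{-1}\log n)$, where $K = (\log n)^{O(\log k)}$.
Introduces a bicriteria solution framework that allows a trade-off between approximation quality and dimension.
Proves NP-hardness of $(1+1/{\mathrm{poly}}(d))$-approximation for $p \in [1,2)$ via a reduction from the Clique problem.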
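
As an illustration of what "input-sparsity time" means, the sketch below applies a CountSketch-style sparse embedding $S$ to a sparse matrix $A$ in time proportional to $n + \operatorname{nnz}(A)$. This is a generic example of the technique family, not the specific construction used in the paper. The function name `countsketch`, the triple-list storage format, and the use of Python's `random` module in place of proper hash functions are assumptions.

```python
import random

def countsketch(entries, n_rows, sketch_rows, seed=0):
    """Compute S*A for a CountSketch matrix S of shape (sketch_rows x n_rows).

    `entries` lists the nonzeros of A as (row, col, value) triples.
    Each row i of A is hashed to one sketch row and multiplied by a random sign,
    so every nonzero is touched exactly once.
    Returns the nonzeros of S*A as a dict {(sketch_row, col): value}.
    """
    rng = random.Random(seed)
    bucket = [rng.randrange(sketch_rows) for _ in range(n_rows)]
    sign = [rng.choice((-1.0, 1.0)) for _ in range(n_rows)]
    sketch = {}
    for i, j, v in entries:
        key = (bucket[i], j)
        sketch[key] = sketch.get(key, 0.0) + sign[i] * v
    return sketch

# Hypothetical 4 x 3 sparse matrix with three nonzeros.
A = [(0, 0, 1.0), (1, 2, -2.0), (3, 1, 0.5)]
SA = countsketch(A, n_rows=4, sketch_rows=3, seed=7)
print(SA)
```

The loop does constant work per nonzero, which is why the cost scales with $\operatorname{nnz}(A)$ rather than with $n \cdot d$.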

Results

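Research Questions

RQ1: Can we achieve a $(1+\varepsilon)$-approximation for robust subspace approximation with $p \in [1,2)$ in input-sparsity time?
RQ2: Is robust subspace approximation with $p \in [1,2)$ NP-hard to approximate within a factor of $1+1/{\mathrm{poly}}(d)$?
RQ3: Can we reduce large robust subspace problems to smaller, structured instances while preserving approximation quality?
RQ4: What is the optimal running time for $(1+\varepsilon)$-approximate regression with a wide class of convex $M$-estimators?
RQ5: Can input-sparsity techniques be extended beyond the $p=2$ (SVD) case to robust $M$-estimator regression?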

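Key Findings

Gives an algorithm running in $O(\operatorname{nnz}(A) + (n+d)\cdot{\mathrm{poly}}(k/\varepsilon) + \exp({\mathrm{poly}}(k/\varepsilon)))$ time that computes a $(1+\varepsilon)$-approximate $k$-dimensional subspace for $\sum_i \mathrm{dist}(a_i, F)^p$ with $p \in [1,2)$.
Proves that minimizing $\sum_i \mathrm{dist}(a_i, F)^p$ for $p \in [1,2)$ is NP-hard to approximate within a factor of $1+1/{\mathrm{poly}}(d)$, answering an open question posed by Kannan and Vempala.
For a wide class of $M$-estimators, the problem can be reduced in $O(\operatorname{nnz}(A)\log n + (n+d)\cdot{\mathrm{poly}}(K/\varepsilon))$ time to a constrained instance of size ${\mathrm{poly}}(K\varepsilon^{-1}\log n)$, where $K = (\log n)^{O(\log k)}$.
Gives the first $O(\operatorname{nnz}(A) + {\mathrm{poly}}(d/\varepsilon))$-time algorithms for $(1+\varepsilon)$-approximate regression with a wide class of convex $M$-estimators, improving on prior results that achieved only an $O(1)$-approximation.
The hardness result implies that, unless $P = NP$, no algorithm whose running time is polynomial in $k$ and $1/\varepsilon$ can achieve a $(1+1/{\mathrm{poly}}(d))$-approximation; combined with prior work for $p > 2$, this establishes NP-hardness for every $p \geq 1$ with $p \neq 2$.
The reduction from the Clique problem to robust subspace approximation shows an additive cost gap of $\Omega((1/B_1)^{p/2}/r^2)$ between instances with a clique and instances without one, a gap that a $(1+1/{\mathrm{poly}}(d))$-approximation can detect.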
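
The step from an additive cost gap to hardness of approximation follows a standard argument, sketched below. The symbols $c$ (an upper bound on the optimal cost for clique instances), $g$ (the additive gap), $\delta$ (the approximation slack), and $\mathrm{ALG}$ (the cost returned by a hypothetical approximation algorithm) are introduced here for illustration and are not the paper's notation.

```latex
% Assumed notation: OPT is the optimal cost, ALG the cost returned by a
% (1+\delta)-approximation algorithm, c and g as described in the lead-in.
\begin{align*}
\text{clique instance:}    &\quad \mathrm{OPT} \le c
   \;\Longrightarrow\; \mathrm{ALG} \le (1+\delta)\,\mathrm{OPT} \le (1+\delta)\,c < c + g
   \quad \text{whenever } \delta < g/c, \\
\text{no-clique instance:} &\quad \mathrm{OPT} \ge c + g
   \;\Longrightarrow\; \mathrm{ALG} \ge \mathrm{OPT} \ge c + g .
\end{align*}
```

Comparing the returned cost with $c + g$ would therefore decide Clique. If the relative gap $g/c$ is at least $1/{\mathrm{poly}}(d)$, a $(1+1/{\mathrm{poly}}(d))$-approximation is already enough to do so.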


This digest was generated by AI and reviewed by a human editor.