QUICK REVIEW

[Paper Review] Representation, Approximation and Learning of Submodular Functions Using Low-rank Decision Trees

Vitaly Feldman, Pravesh K. Kothari|arXiv (Cornell University)|Apr 2, 2013

Complexity and Algorithms in Graphs | Cited by 22

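One-sentence summary

The paper proves that any submodular function on the Boolean hypercube can be $\epsilon$-approximated in the $\ell_2$ norm by a real-valued decision tree of depth $O(1/\epsilon^2)$, which yields a PAC learning algorithm running in time $\tilde{O}(n^2) \cdot 2^{O(1/\epsilon^4)}$. It also gives the first information-theoretic and computational lower bounds for learning submodular functions over the uniform distribution.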

ABSTRACT

We study the complexity of approximate representation and learning of submodular functions over the uniform distribution on the Boolean hypercube $\{0,1\}^n$. Our main result is the following structural theorem: any submodular function is $ε$-close in $\ell_2$ to a real-valued decision tree (DT) of depth $O(1/ε^2)$. This immediately implies that any submodular function is $ε$-close to a function of at most $2^{O(1/ε^2)}$ variables and has a spectral $\ell_1$ norm of $2^{O(1/ε^2)}$. It also implies the closest previous result that states that submodular functions can be approximated by polynomials of degree $O(1/ε^2)$ (Cheraghchi et al., 2012). Our result is proved by constructing an approximation of a submodular function by a DT of rank $4/ε^2$ and a proof that any rank-$r$ DT can be $ε$-approximated by a DT of depth $\frac{5}{2}(r+\log(1/ε))$. We show that these structural results can be exploited to give an attribute-efficient PAC learning algorithm for submodular functions running in time $\tilde{O}(n^2) \cdot 2^{O(1/ε^{4})}$. The best previous algorithm for the problem requires $n^{O(1/ε^{2})}$ time and examples (Cheraghchi et al., 2012) but works also in the agnostic setting. In addition, we give improved learning algorithms for a number of related settings. We also prove that our PAC and agnostic learning algorithms are essentially optimal via two lower bounds: (1) an information-theoretic lower bound of $2^{Ω(1/ε^{2/3})}$ on the complexity of learning monotone submodular functions in any reasonable model; (2) computational lower bound of $n^{Ω(1/ε^{2/3})}$ based on a reduction to learning of sparse parities with noise, widely-believed to be intractable. These are the first lower bounds for learning of submodular functions over the uniform distribution.
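The abstract does not restate what makes a function submodular. Under the standard definition, a function $f$ on subsets of $\{0,\dots,n-1\}$ (equivalently, on $\{0,1\}^n$) is submodular when $f(S \cup T) + f(S \cap T) \le f(S) + f(T)$ for all $S, T$. The following is a minimal brute-force sketch of that definition for tiny $n$. It is not code from the paper, and the names `is_submodular`, `capped_cardinality`, and `squared_cardinality` are illustrative assumptions.

```python
from itertools import combinations


def all_subsets(n):
    """Yield every subset of {0, ..., n-1} as a frozenset."""
    for k in range(n + 1):
        for combo in combinations(range(n), k):
            yield frozenset(combo)


def is_submodular(f, n, tol=1e-12):
    """Check f(S | T) + f(S & T) <= f(S) + f(T) for all pairs S, T.

    This takes time exponential in n, so it is only a sanity check
    for very small ground sets.
    """
    subsets = list(all_subsets(n))
    for s in subsets:
        for t in subsets:
            if f(s | t) + f(s & t) > f(s) + f(t) + tol:
                return False
    return True


# Illustrative example (assumption, not from the paper):
# a concave function of |S| is submodular.
def capped_cardinality(s):
    return min(len(s), 2) / 2


# Illustrative counterexample: |S|^2 is supermodular, not submodular.
def squared_cardinality(s):
    return len(s) ** 2
```

Calling `is_submodular(capped_cardinality, 4)` exercises a function with range $[0,1]$, while `is_submodular(squared_cardinality, 3)` fails already on two disjoint singletons.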

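Research motivation and goals

Characterize the structure of submodular functions via low-rank decision trees, enabling efficient approximation and learning.
Design an attribute-efficient PAC learning algorithm for submodular functions over the uniform distribution that runs faster than prior work.
Establish the first information-theoretic and computational lower bounds for learning submodular functions, showing that the proposed algorithms are essentially optimal.
Bridge approximation theory and learning algorithms by exploiting the rank and spectral properties of decision trees.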

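Proposed method

The authors represent a submodular function by a low-rank decision tree, using rank as the complexity measure.
They show that any submodular function is $\epsilon$-close in the $\ell_2$ norm to a decision tree of rank at most $4/\epsilon^2$.
The key technical component is a proof that any rank-$r$ decision tree can be $\epsilon$-approximated by a decision tree of depth $\frac{5}{2}(r + \log(1/\epsilon))$.
The learning algorithm exploits this structural result, building its hypothesis through sampling and thresholding, and runs in time $\tilde{O}(n^2) \cdot 2^{O(1/\epsilon^4)}$.
The computational lower bound rests on the assumed hardness of learning sparse parities with noise, a problem widely believed to be intractable.
The reduction behind the lower bounds constructs monotone submodular functions from arbitrary Boolean functions.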
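To make the distinction between rank and depth concrete, here is a minimal sketch that computes both for a real-valued decision tree. It assumes the standard notion of rank due to Ehrenfeucht and Haussler (a leaf has rank 0; an internal node has rank $r+1$ if both subtrees have rank $r$, and otherwise the larger of the two subtree ranks), which this summary does not spell out. The classes `Leaf` and `Node` and the helpers `chain` and `complete` are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from typing import Sequence, Union


@dataclass
class Leaf:
    value: float  # real-valued label at the leaf


@dataclass
class Node:
    var: int        # index of the variable queried at this node
    low: "Tree"     # subtree followed when x[var] == 0
    high: "Tree"    # subtree followed when x[var] == 1


Tree = Union[Leaf, Node]


def evaluate(t: Tree, x: Sequence[int]) -> float:
    """Follow the path determined by x down to a leaf."""
    while isinstance(t, Node):
        t = t.high if x[t.var] else t.low
    return t.value


def depth(t: Tree) -> int:
    """Length of the longest root-to-leaf path."""
    if isinstance(t, Leaf):
        return 0
    return 1 + max(depth(t.low), depth(t.high))


def rank(t: Tree) -> int:
    """Rank in the assumed Ehrenfeucht and Haussler sense."""
    if isinstance(t, Leaf):
        return 0
    r0, r1 = rank(t.low), rank(t.high)
    return r0 + 1 if r0 == r1 else max(r0, r1)


def chain(k: int) -> Tree:
    """A path-shaped tree (decision list) on k variables."""
    t: Tree = Leaf(0.0)
    for i in reversed(range(k)):
        t = Node(i, Leaf(float(i + 1)), t)
    return t


def complete(d: int, var: int = 0) -> Tree:
    """A complete binary tree of depth d querying var, var+1, ..."""
    if d == 0:
        return Leaf(0.0)
    return Node(var, complete(d - 1, var + 1), complete(d - 1, var + 1))
```

A path-shaped tree such as `chain(4)` has depth 4 but rank 1, whereas `complete(3)` has rank and depth both equal to 3. This gap is why a low-rank tree can still be deep, and why the separate step from rank to depth is needed.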

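Results

Research questions

RQ1: Can submodular functions be approximated efficiently by low-rank decision trees?
RQ2: What is the smallest decision-tree depth or rank needed to $\epsilon$-approximate an arbitrary submodular function in the $\ell_2$ norm?
RQ3: Do such approximations lead to more efficient PAC learning algorithms for submodular functions?
RQ4: Are there inherent limitations to learning submodular functions over the uniform distribution?
RQ5: Can tight lower bounds on the query complexity or running time of learning submodular functions be established?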

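Main findings

Any submodular function is $\epsilon$-close in the $\ell_2$ norm to a real-valued decision tree of depth $O(1/\epsilon^2)$.
It follows that any submodular function is $\epsilon$-close to a function of at most $2^{O(1/\epsilon^2)}$ variables with spectral $\ell_1$ norm bounded by $2^{O(1/\epsilon^2)}$.
The proposed PAC learning algorithm runs in time $\tilde{O}(n^2) \cdot 2^{O(1/\epsilon^4)}$, improving on the previous $n^{O(1/\epsilon^2)}$ bound.
An information-theoretic lower bound of $2^{\Omega(\epsilon^{-2/3})}$ value queries is established for learning monotone submodular functions.
A computational lower bound of $n^{\Omega(\epsilon^{-2/3})}$ is proved, based on the assumed hardness of learning sparse parities with noise.
These are the first lower bounds for learning submodular functions over the uniform distribution, and they show that the proposed algorithms are essentially optimal.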
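The depth and variable-count bounds above follow from the rank and depth statements by a short calculation. The sketch below uses the constants quoted in the abstract, the elementary inequality $\log(1/\epsilon) \le 1/\epsilon^2$ for $0 < \epsilon \le 1$, and the fact that a binary tree of depth $d$ has at most $2^d - 1$ internal nodes. It ignores how the error parameters of the two steps are combined (an assumption; any polynomial rescaling of $\epsilon$ changes only the constants), so it is an illustration rather than the paper's proof.

$$
\begin{aligned}
r &= \frac{4}{\epsilon^2}, \\
d &= \frac{5}{2}\left(r + \log\frac{1}{\epsilon}\right)
   = \frac{10}{\epsilon^2} + \frac{5}{2}\log\frac{1}{\epsilon}
   \le \frac{25}{2\epsilon^2}
   = O\!\left(\frac{1}{\epsilon^2}\right), \\
\#\text{variables} &\le 2^{d} - 1 \le 2^{25/(2\epsilon^2)} = 2^{O(1/\epsilon^2)}.
\end{aligned}
$$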

This review was generated by AI and checked by a human editor.