QUICK REVIEW

[Paper Review] Nearly Optimal Bounds for Sample-Based Testing and Learning of $k$-Monotone Functions

Hadley Black|arXiv (Cornell University)|Oct 18, 2023

Machine Learning and Algorithms | Cited by 1

One-Sentence Summary

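The paper establishes nearly tight sample-complexity bounds for testing and learning $k$-monotone functions on the hypercube and on continuous product spaces. It proves a lower bound of $\exp(\Omega(\min\{\frac{rk}{\varepsilon}\sqrt{d}, d\}))$ for $k$-monotonicity testing and learning of functions $f: \{0,1\}^d \to [r]$, matching the known upper bounds up to logarithmic factors in the exponent, and extends these results to $\mathbb{R}^d$ under product distributions, where its learning algorithm needs fewer samples than the previous best bound.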

ABSTRACT

We study monotonicity testing of functions $f \colon \{0,1\}^d \to \{0,1\}$ using sample-based algorithms, which are only allowed to observe the value of $f$ on points drawn independently from the uniform distribution. A classic result by Bshouty-Tamon (J. ACM 1996) proved that monotone functions can be learned with $\exp(\widetilde{O}(\min\{\frac{1}{\varepsilon}\sqrt{d},d\}))$ samples and it is not hard to show that this bound extends to testing. Prior to our work the only lower bound for this problem was $\Omega(\sqrt{\exp(d)/\varepsilon})$ in the small $\varepsilon$ parameter regime, when $\varepsilon = O(d^{-3/2})$, due to Goldreich-Goldwasser-Lehman-Ron-Samorodnitsky (Combinatorica 2000). Thus, the sample complexity of monotonicity testing was wide open for $\varepsilon \gg d^{-3/2}$. We resolve this question, obtaining a nearly tight lower bound of $\exp(\Omega(\min\{\frac{1}{\varepsilon}\sqrt{d},d\}))$ for all $\varepsilon$ at most a sufficiently small constant. In fact, we prove a much more general result, showing that the sample complexity of $k$-monotonicity testing and learning for functions $f \colon \{0,1\}^d \to [r]$ is $\exp(\Omega(\min\{\frac{rk}{\varepsilon}\sqrt{d},d\}))$. For testing with one-sided error we show that the sample complexity is $\exp(\Theta(d))$. Beyond the hypercube, we prove nearly tight bounds (up to polylog factors of $d,k,r,1/\varepsilon$ in the exponent) of $\exp(\widetilde{\Theta}(\min\{\frac{rk}{\varepsilon}\sqrt{d},d\}))$ on the sample complexity of testing and learning measurable $k$-monotone functions $f \colon \mathbb{R}^d \to [r]$ under product distributions. Our upper bound improves upon the previous bound of $\exp(\widetilde{O}(\min\{\frac{k}{\varepsilon^2}\sqrt{d},d\}))$ by Harms-Yoshida (ICALP 2022) for Boolean functions ($r=2$).
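The abstract uses $k$-monotonicity without defining it. As a concrete anchor, here is a minimal brute-force checker for Boolean $f$ on small hypercubes. It assumes the standard Boolean definition from the $k$-monotonicity testing literature: $f$ is $k$-monotone if every chain $x_1 \prec \dots \prec x_\ell$ on which $f(x_1) = 1$ and consecutive values alternate has length $\ell \le k$ (so 1-monotone means monotone). The paper's $[r]$-valued generalization is not reproduced, and all function names are hypothetical. Points of $\{0,1\}^d$ are encoded as $d$-bit integers, so $x \preceq y$ exactly when `(x & y) == x`.

```python
from typing import Callable, Iterator


def proper_submasks(x: int) -> Iterator[int]:
    """Yield every y strictly below x in the hypercube order (proper bit subsets)."""
    if x == 0:
        return
    s = (x - 1) & x
    while True:
        yield s
        if s == 0:
            return
        s = (s - 1) & x


def longest_alternating_chain(f: Callable[[int], int], d: int) -> int:
    """Length of the longest chain x1 < ... < xl with f(x1) = 1 and
    f(x_i) != f(x_{i+1}); returns 0 if f is identically 0."""
    vals = [f(x) for x in range(1 << d)]
    best = [0] * (1 << d)  # best[x]: longest such chain ending at x (0 = none)
    for x in range(1 << d):  # proper submasks of x are numerically smaller
        cur = 1 if vals[x] == 1 else 0
        for y in proper_submasks(x):
            if vals[y] != vals[x] and best[y] > 0:
                cur = max(cur, best[y] + 1)
        best[x] = cur
    return max(best)


def is_k_monotone(f: Callable[[int], int], d: int, k: int) -> bool:
    """Boolean k-monotonicity under the (assumed) chain-alternation definition."""
    return longest_alternating_chain(f, d) <= k


if __name__ == "__main__":
    parity = lambda x: bin(x).count("1") % 2
    # Parity on {0,1}^3 alternates along 001 < 011 < 111, so it is 3-monotone.
    print(longest_alternating_chain(parity, 3))  # 3
```

The dynamic program touches every comparable pair, about $3^d$ of them, so it is only meant for small $d$.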

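Research Motivation and Goals

Close the gap in sample-complexity bounds for monotonicity testing and learning in the sample-based model, particularly in the regime $\varepsilon \gg d^{-3/2}$.
Establish nearly optimal lower bounds for $k$-monotonicity testing and learning of functions $f: \{0,1\}^d \to [r]$.
Extend these bounds to continuous product spaces, giving nearly tight sample complexity for testing and learning measurable $k$-monotone functions on $\mathbb{R}^d$.
Improve the sample complexity of learning $k$-monotone functions under product distributions so that it matches the new lower bound up to polylogarithmic factors in the exponent.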

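Proposed Method

Prove the lower bounds by constructing two distributions over functions, $D_{\text{yes}}$ (over $k$-monotone functions) and $D_{\text{no}}$ (over functions far from $k$-monotone), that cannot be told apart from a small number of uniform samples.
Apply a union bound over pairs of samples to show that any one-sided tester needs $\exp(\Omega(d))$ samples to detect non-monotonicity.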
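Apply a coupon-collector argument to obtain the matching $\exp(O(d))$ upper bound for one-sided-error monotonicity testing: with $O(d \cdot 2^d)$ samples, every point of $\{0,1\}^d$ is observed with high probability.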
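Reduce learning over $\mathbb{R}^d$ to learning over a hypergrid via downsampling, so that discrete techniques apply.
Use a learning algorithm based on empirical risk minimization (ERM) over a hypothesis class of bounded VC dimension, which guarantees generalization with high probability.
Use the testing-by-learning framework: convert a learning algorithm that uses $s(\varepsilon/4)$ samples into a tester that uses $s(\varepsilon/4) + O(1/\varepsilon^2)$ samples.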
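The union-bound step for one-sided testers rests on a short calculation, sketched below under the assumption that the argument runs through comparable pairs (the paper's exact proof may differ; function names are hypothetical). A one-sided tester may reject only when the labelled sample is inconsistent with every monotone function, which for Boolean $f$ requires two sampled points $x \preceq y$. For independent uniform $x, y \in \{0,1\}^d$, each coordinate satisfies $x_i \le y_i$ with probability $3/4$, so $\Pr[x, y \text{ comparable}] = 2(3/4)^d - (1/2)^d$, and a union bound over $\binom{s}{2}$ pairs shows that $s$ must be about $(4/3)^{d/2} = \exp(\Omega(d))$.

```python
import math
import random


def comparable_prob(d: int) -> float:
    """Pr[x, y comparable] for independent uniform x, y in {0,1}^d:
    Pr[x <= y] + Pr[y <= x] - Pr[x == y] = 2*(3/4)^d - (1/2)^d."""
    return 2 * 0.75 ** d - 0.5 ** d


def union_bound(s: int, d: int) -> float:
    """Union-bound estimate of Pr[some pair among s samples is comparable]."""
    return math.comb(s, 2) * comparable_prob(d)


def min_samples(d: int, target: float = 1 / 3) -> int:
    """Smallest s with C(s, 2) * p >= target. With fewer samples, a one-sided
    tester sees any comparable pair (hence any violation) w.p. < target."""
    p = comparable_prob(d)
    s = math.ceil((1 + math.sqrt(1 + 8 * target / p)) / 2)
    return max(s, 2)


def empirical_comparable_prob(d: int, trials: int = 50_000, seed: int = 0) -> float:
    """Monte Carlo check of comparable_prob; as bitmasks, x <= y iff x & y == x."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        x, y = rng.getrandbits(d), rng.getrandbits(d)
        if (x & y) == x or (x & y) == y:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    for d in (20, 40, 60):
        print(d, min_samples(d))  # grows roughly like (4/3)^(d/2)
```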
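The coupon-collector step can be made concrete as follows. This is a generic exhaustive sketch, not necessarily the paper's tester, and the function names are hypothetical. With $n = 2^d$, a fixed point is missed by $m$ uniform samples with probability $(1 - 1/n)^m \le e^{-m/n}$, so $m = \lceil n \ln(n/\delta) \rceil$ samples see every point with probability at least $1 - \delta$. A tester that rejects only on a witnessed violation never rejects a monotone $f$ and catches every non-monotone $f$ once all points are seen, which gives $\exp(O(d))$ samples for one-sided error.

```python
import math
import random
from typing import Callable, Dict


def expected_coupon_samples(d: int) -> float:
    """Expected number of uniform samples until all n = 2^d points are seen: n * H_n."""
    n = 2 ** d
    return n * sum(1.0 / i for i in range(1, n + 1))


def whp_coupon_samples(d: int, delta: float) -> int:
    """m = ceil(n ln(n/delta)) sees every point w.p. >= 1 - delta,
    because n * (1 - 1/n)^m <= n * exp(-m/n) <= delta."""
    n = 2 ** d
    return math.ceil(n * math.log(n / delta))


def one_sided_tester(f: Callable[[int], int], d: int, delta: float,
                     rng: random.Random) -> bool:
    """Accept (True) unless the sample contains x <= y with f(x) = 1, f(y) = 0.
    Monotone f is never rejected; non-monotone f is rejected w.p. >= 1 - delta."""
    m = whp_coupon_samples(d, delta)
    seen: Dict[int, int] = {}
    for _ in range(m):
        x = rng.getrandbits(d)
        seen[x] = f(x)
    ones = [x for x, v in seen.items() if v == 1]
    zeros = [y for y, v in seen.items() if v == 0]
    # (x & y) == x means x is a bit subset of y, i.e. x <= y in the hypercube.
    return not any((x & y) == x for x in ones for y in zeros)


if __name__ == "__main__":
    for d in (4, 8, 12):
        print(d, whp_coupon_samples(d, 0.1))  # about 2^d * d: exp(O(d))
```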
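The downsampling reduction from $\mathbb{R}^d$ to a hypergrid can be pictured with a quantile-bucketing sketch. This is one plausible construction under a product distribution, not the paper's exact procedure, and the names are hypothetical. Each coordinate is cut at $n-1$ empirical quantiles of its marginal, so every bucket gets about the same mass. Because each coordinate map is non-decreasing, composing a $k$-monotone function on $[n]^d$ with this map yields a $k$-monotone function on $\mathbb{R}^d$.

```python
import bisect
import random
from typing import List, Sequence, Tuple


def fit_cut_points(samples: Sequence[Sequence[float]], n: int) -> List[List[float]]:
    """Per coordinate, take n - 1 empirical quantiles as cut points so that
    each of the n buckets receives about the same share of that marginal."""
    d = len(samples[0])
    m = len(samples)
    cuts = []
    for i in range(d):
        column = sorted(p[i] for p in samples)
        cuts.append([column[(j * m) // n] for j in range(1, n)])
    return cuts


def to_grid(point: Sequence[float], cuts: List[List[float]]) -> Tuple[int, ...]:
    """Map a point of R^d to a cell of [n]^d (0-indexed). Each coordinate map
    v -> #{cut points <= v} is non-decreasing, so x <= y coordinatewise
    implies to_grid(x) <= to_grid(y) coordinatewise."""
    return tuple(bisect.bisect_right(c, v) for v, c in zip(point, cuts))


if __name__ == "__main__":
    rng = random.Random(0)
    # Product distribution: uniform first coordinate, Gaussian second.
    pts = [[rng.random(), rng.gauss(0.0, 1.0)] for _ in range(1000)]
    cuts = fit_cut_points(pts, 4)
    print(to_grid([0.1, -1.0], cuts), to_grid([0.9, 1.0], cuts))
```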

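Results

Research Questions

RQ1: What is the optimal sample complexity of $k$-monotonicity testing for functions $f: \{0,1\}^d \to [r]$ in the sample-based model?
RQ2: Can nearly tight lower bounds for $k$-monotonicity testing and learning be established that match the existing upper bounds?
RQ3: How does the sample complexity of learning and testing $k$-monotone functions scale in continuous product spaces such as $\mathbb{R}^d$?
RQ4: Can the sample complexity of learning $k$-monotone functions on $\mathbb{R}^d$ be improved to match the new lower bound?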

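Key Findings

The paper establishes a nearly tight lower bound of $\exp(\Omega(\min\{\frac{rk}{\varepsilon}\sqrt{d}, d\}))$ for $k$-monotonicity testing and learning of functions $f: \{0,1\}^d \to [r]$.
For monotonicity testing with one-sided error ($k=1$, $r=2$), the sample complexity is $\exp(\Theta(d))$, so the $\exp(\Omega(d))$ lower bound is tight up to a constant factor in the exponent.
For $k$-monotone functions on $\mathbb{R}^d$ under product distributions, learning uses $\exp(\widetilde{O}(\min\{\frac{rk}{\varepsilon}\sqrt{d}, d\}))$ samples, matching the lower bound up to polylogarithmic factors in the exponent.
This learning algorithm on $\mathbb{R}^d$ matches the new sample-based testing lower bound and improves on the previous bound of $\exp(\widetilde{O}(\min\{\frac{k}{\varepsilon^2}\sqrt{d}, d\}))$ by Harms-Yoshida (ICALP 2022) for Boolean functions, closing the sample-complexity gap in the continuous setting.
The work settles the sample complexity of monotonicity testing in the regime $\varepsilon \gg d^{-3/2}$, where the only previous lower bound (Goldreich-Goldwasser-Lehman-Ron-Samorodnitsky, Combinatorica 2000) did not apply.
The paper gives a general framework for proving lower bounds in sample-based testing: construct a distribution over functions far from $k$-monotone that, from few samples, is indistinguishable from a distribution over $k$-monotone functions.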
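The bounds in the findings above share the exponent $\min\{\frac{rk}{\varepsilon}\sqrt{d}, d\}$. The sketch below compares it with the earlier Harms-Yoshida exponent $\min\{\frac{k}{\varepsilon^2}\sqrt{d}, d\}$ and computes the value of $\varepsilon$ below which the trivial term $d$ takes over. Hidden constants and polylogarithmic factors are dropped purely for illustration, so only the shape of the comparison is meaningful; the function names are hypothetical.

```python
import math


def new_exponent(d: int, k: int, r: int, eps: float) -> float:
    """min{rk*sqrt(d)/eps, d}, with constants and polylog factors dropped."""
    return min(r * k * math.sqrt(d) / eps, d)


def harms_yoshida_exponent(d: int, k: int, eps: float) -> float:
    """min{k*sqrt(d)/eps^2, d}: exponent of the earlier Boolean (r = 2) bound."""
    return min(k * math.sqrt(d) / eps ** 2, d)


def saturation_eps(d: int, k: int, r: int) -> float:
    """eps at which rk*sqrt(d)/eps = d; for smaller eps the exponent is just d."""
    return r * k / math.sqrt(d)


if __name__ == "__main__":
    d, k = 10 ** 6, 1
    for eps in (0.5, 0.1, 0.01):
        print(eps, new_exponent(d, k, 2, eps), harms_yoshida_exponent(d, k, eps))
```

For $r = 2$, the ratio of the two unsaturated exponents is $(k\sqrt{d}/\varepsilon^2) / (2k\sqrt{d}/\varepsilon) = 1/(2\varepsilon)$, so the gain grows as $\varepsilon$ shrinks until one of the terms saturates at $d$.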


This review was generated by AI and reviewed by human editors.