QUICK REVIEW

[Paper Digest] Hyperedge Estimation using Polylogarithmic Subset Queries

Rashtchian, Cyrus, Woodruff, David P.|arXiv (Cornell University)|Aug 12, 2019

Complexity and Algorithms in Graphs | References: 17 | Cited by: 3

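One-Sentence Summary

This paper presents a randomized algorithm that estimates the number of hyperedges in a d-uniform hypergraph using polylogarithmically many queries to a Generalized d-Partite Independent Set (GPIS) oracle. With high probability the algorithm returns a (1±ϵ)-approximation of the true hyperedge count, and for constant d it makes O_d(log^{5d+5} n / ϵ^4) GPIS queries, extending earlier work on edge and triangle estimation in graphs to hypergraphs.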

ABSTRACT

In this work, we estimate the number of hyperedges in a hypergraph ${\cal H}(U({\cal H}), {\cal F}({\cal H}))$, where $U({\cal H})$ denotes the set of vertices and ${\cal F}({\cal H})$ denotes the set of hyperedges. We assume a query oracle access to the hypergraph ${\cal H}$. Estimating the number of edges, triangles or small subgraphs in a graph is a well studied problem. Beame et al. and Bhattacharya et al. gave algorithms to estimate the number of edges and triangles in a graph using queries to the Bipartite Independent Set (BIS) and the Tripartite Independent Set (TIS) oracles, respectively. We generalize the earlier works by estimating the number of hyperedges using a query oracle, known as the Generalized $d$-partite independent set oracle (GPIS), that takes $d$ (non-empty) pairwise disjoint subsets of vertices $A_1,\ldots,A_d \subseteq U({\cal H})$ as input, and answers whether there exists a hyperedge in ${\cal H}$ having (exactly) one vertex in each $A_i, i \in \{1,2,\ldots,d\}$. We give a randomized algorithm for the hyperedge estimation problem using the GPIS query oracle to output $\widehat{m}$ for $m({\cal H})$ satisfying $(1-\epsilon) \cdot m({\cal H}) \leq \widehat{m} \leq (1+\epsilon) \cdot m({\cal H})$. The number of queries made by our algorithm, assuming $d$ to be a constant, is polylogarithmic in the number of vertices of the hypergraph.
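To make the oracle definition in the abstract concrete, here is a minimal brute-force sketch of a GPIS query. The function name `gpis_query` and the representation of hyperedges as Python sets are assumptions made for illustration; the paper treats the oracle as a black box and does not prescribe an implementation.

```python
from typing import Iterable, Sequence, Set


def gpis_query(hyperedges: Iterable[Set[int]], parts: Sequence[Set[int]]) -> bool:
    """Return True iff some hyperedge has exactly one vertex in each part.

    Brute-force stand-in for the oracle (hypothetical helper): it scans the
    hyperedge list, which a real query algorithm is not allowed to see.
    """
    # The oracle is only defined for non-empty, pairwise disjoint parts.
    if any(len(p) == 0 for p in parts):
        raise ValueError("every part must be non-empty")
    seen: Set[int] = set()
    for p in parts:
        if seen & p:
            raise ValueError("parts must be pairwise disjoint")
        seen |= p
    # A hyperedge is "hit" when it meets every part in exactly one vertex.
    return any(all(len(e & p) == 1 for p in parts) for e in hyperedges)


# Example on a small 3-uniform hypergraph (toy data, not from the paper).
H = [{1, 2, 3}, {3, 4, 5}]
print(gpis_query(H, [{1, 4}, {2}, {3}]))
print(gpis_query(H, [{1, 2}, {3}, {4}]))
```

An estimation algorithm in this model only observes the Boolean answers; the cost measure is the number of such calls.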

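Motivation and Goals

Extend earlier work on sublinear estimation in graphs (edges, triangles) to hypergraphs under a generalized query model.
Determine whether hyperedge estimation can be done with polylogarithmic query complexity that does not depend on the number of hyperedges sharing vertices.
Formalize and analyze a new query oracle for d-uniform hypergraphs, the Generalized d-Partite Independent Set (GPIS) oracle.
Design a recursive, iterative estimation algorithm that combines coarse estimation with sparsification to obtain a (1±ϵ)-approximation with high probability.
Establish query complexity bounds that are polylogarithmic in n and polynomial in 1/ϵ, with the dependence on d absorbed into constants when d is fixed.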

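Proposed Method

The algorithm uses a recursive estimation framework that maintains a data structure of tuples (A1,…,Ad,w), where the Ai are pairwise disjoint vertex subsets and w is a weight.
Coarse estimation uses GPIS1 queries to estimate, with high probability, the number of hyperedges intersecting each tuple (A1,…,Ad).
Iterative sparsification reduces the number of active tuples while keeping the total estimated hyperedge count within (1±ϵ) of the true value.
The algorithm alternates between coarse estimation (GPIS1 queries) and sparsification (GPIS2 queries), and each step preserves concentration bounds via Chernoff-type inequalities.
Induction on d generalizes the BIS and TIS oracles to the GPIS oracle, capturing the transversal structure of d-uniform hyperedges.
The analysis uses probabilistic bounds to ensure that all coarse estimates succeed simultaneously with high probability, relying on a union bound over O(log^{4d} n / ϵ^2) tuples.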
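The role of the weight w in a tuple (A1,…,Ad,w) can be illustrated with a much simpler device than the paper's algorithm: color every vertex with one of d colors uniformly at random, count the hyperedges that are transversal to the resulting parts, and rescale by d^d/d!. The sketch below is an illustration under that assumption only; the helper names are hypothetical, and the exact count inside `count_transversal` stands in for the coarse estimates that the real algorithm would obtain from GPIS queries.

```python
import itertools
import math
from fractions import Fraction


def count_transversal(hyperedges, parts):
    """Count hyperedges with exactly one vertex in each part (exact, brute force)."""
    return sum(1 for e in hyperedges if all(len(e & p) == 1 for p in parts))


def weighted_estimate(hyperedges, parts):
    """Weighted count for one partition into d parts.

    A d-vertex hyperedge is transversal to a uniformly random d-coloring with
    probability d!/d^d, so the weight d^d/d! makes the estimate unbiased.
    """
    d = len(parts)
    weight = Fraction(d ** d, math.factorial(d))
    return weight * count_transversal(hyperedges, parts)


def exact_expectation(hyperedges, vertices, d):
    """Average the weighted estimate over all d^n colorings (tiny inputs only)."""
    vertices = list(vertices)
    total = Fraction(0)
    colorings = 0
    for coloring in itertools.product(range(d), repeat=len(vertices)):
        parts = [set() for _ in range(d)]
        for v, c in zip(vertices, coloring):
            parts[c].add(v)
        total += weighted_estimate(hyperedges, parts)
        colorings += 1
    return total / colorings


# Toy 3-uniform hypergraph on 5 vertices (illustrative data).
H = [{0, 1, 2}, {1, 2, 3}, {2, 3, 4}, {0, 3, 4}]
print(exact_expectation(H, range(5), 3))
```

Averaging over every coloring recovers the true hyperedge count exactly, which is the sense in which a weighted tuple "represents" the hyperedges it no longer tracks individually.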

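Results

Research Questions

RQ1: Can hyperedge estimation in d-uniform hypergraphs be solved using only polylogarithmically many queries to a generalized oracle?
RQ2: Is the dependence on the number of hyperedges sharing d−1 vertices, present in earlier models, inherent, or can it be avoided?
RQ3: Can the BIS and TIS oracle framework be generalized to the d-partite setting while keeping polylogarithmic query complexity?
RQ4: What is the query complexity of hyperedge estimation with the GPIS oracle, and how does it scale with d, n, and ϵ?
RQ5: Can a recursive strategy that combines coarse estimation with sparsification achieve a (1±ϵ)-approximation with high probability?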

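Main Findings

The algorithm achieves a (1±ϵ)-approximation of the hyperedge count m(H) with high probability.
The total number of GPIS queries is O_d(log^{5d+5} n / ϵ^4), which is polylogarithmic in n when d is constant.
The factor hidden in O_d depends only on d; the exponent of log n is O(d), and the exponent of 1/ϵ is an absolute constant.
The success probability is at least 1 − 1/n^{4d}, so all estimation steps hold with high confidence.
At any time the algorithm maintains at most O_d(log^{4d} n / ϵ^2) tuples, refining them iteratively through sparsification and coarse estimation.
The analysis shows that after i ≤ 2d log n iterations the estimation error is bounded by (1±λ)^i, where λ = ϵ/(4d log n), which yields the final (1±ϵ)-approximation.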
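The last finding compresses one step of arithmetic: why a per-iteration error of (1±λ) compounds to at most (1±ϵ). The short derivation below fills that in, assuming 0 < ϵ ≤ 1 (an assumption made here for the check, not stated in this digest).

```latex
% With \lambda = \epsilon / (4 d \log n) and i \le 2 d \log n we have i\lambda \le \epsilon/2.
\begin{align*}
(1+\lambda)^{i} &\le e^{\lambda i} \le e^{\epsilon/2} \le 1+\epsilon
  && \text{(using } 1+x \le e^{x} \text{ and } e^{x/2} \le 1+x \text{ for } 0 \le x \le 1\text{)},\\
(1-\lambda)^{i} &\ge 1 - i\lambda \ge 1 - \tfrac{\epsilon}{2} \ge 1-\epsilon
  && \text{(Bernoulli's inequality)}.
\end{align*}
```

Both directions leave slack of roughly ϵ/2, which is consistent with other steps of the analysis contributing additional error.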

This digest was generated by AI and reviewed by a human editor.