QUICK REVIEW

[Paper Review] A Simple Sublinear-Time Algorithm for Counting Arbitrary Subgraphs via Edge Sampling

Sepehr Assadi, Michael Kapralov|arXiv (Cornell University)|Nov 19, 2018

Complexity and Algorithms in Graphs|Cited by 31

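One-Sentence Summary

This paper presents a simple sublinear-time algorithm that estimates the number of copies of an arbitrary subgraph $ H $ in a large graph $ G $ using degree, neighbor, pair, and edge-sample queries. It achieves a $ (1\pm\varepsilon) $-approximation in $ O^*\left(\frac{m^{\rho(H)}}{\#H}\right) $ time, matches the optimal bounds known for triangles and cliques, extends them to all subgraphs by means of edge sampling, and resolves a conjecture that edge-sample queries make the $ n $-dependent terms avoidable.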

ABSTRACT

In the subgraph counting problem, we are given an input graph $G(V, E)$ and a target graph $H$; the goal is to estimate the number of occurrences of $H$ in $G$. Our focus here is on designing sublinear-time algorithms for approximately counting occurrences of $H$ in $G$ in the setting where the algorithm is given query access to $G$. This problem has been studied in several recent papers which primarily focused on specific families of graphs $H$ such as triangles, cliques, and stars. However, not much is known about approximate counting of arbitrary graphs $H$. This is in sharp contrast to the closely related subgraph enumeration problem that has received significant attention in the database community as the database join problem. The AGM bound shows that the maximum number of occurrences of any arbitrary subgraph $H$ in a graph $G$ with $m$ edges is $O(m^{\rho(H)})$, where $\rho(H)$ is the fractional edge-cover of $H$, and enumeration algorithms with matching runtime are known for any $H$. We bridge this gap between subgraph counting and subgraph enumeration by designing a sublinear-time algorithm that can estimate the number of any arbitrary subgraph $H$ in $G$, denoted by $\#H$, to within a $(1\pm \epsilon)$-approximation w.h.p. in $O(\frac{m^{\rho(H)}}{\#H}) \cdot poly(\log{n},1/\epsilon)$ time. Our algorithm is allowed the standard set of queries for general graphs, namely degree queries, pair queries and neighbor queries, plus an additional edge-sample query that returns an edge chosen uniformly at random. The performance of our algorithm matches those of Eden et al. [FOCS 2015, STOC 2018] for counting triangles and cliques and extends them to all choices of subgraph $H$ under the additional assumption of edge-sample queries. We further show that our algorithm works for the more general database join size estimation problem and prove a matching lower bound for this problem.
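
To make the fractional edge cover number concrete, here is a minimal Python sketch that computes $ \rho(H) $ for small pattern graphs by brute force. It is not taken from the paper: the function name `fractional_edge_cover_number` and the vertex/edge list representation are assumptions made for illustration. The search relies on the standard fact that the fractional edge cover LP has a half-integral optimum, so trying weights in $ \{0, 1/2, 1\} $ is enough. It is only practical for patterns with a handful of edges.

```python
from fractions import Fraction
from itertools import product


def fractional_edge_cover_number(vertices, edges):
    """Brute-force rho(H) over half-integral edge weights {0, 1/2, 1}.

    Hypothetical helper for illustration; exponential in len(edges).
    """
    best = None
    # Weights are stored doubled (0, 1, 2) so all arithmetic stays in integers.
    for doubled in product((0, 1, 2), repeat=len(edges)):
        load = {v: 0 for v in vertices}
        for (a, b), w in zip(edges, doubled):
            load[a] += w
            load[b] += w
        # Every vertex must be covered with total weight >= 1 (>= 2 doubled).
        if all(load[v] >= 2 for v in vertices):
            cost = sum(doubled)
            if best is None or cost < best:
                best = cost
    if best is None:
        raise ValueError("H has an isolated vertex, so no edge cover exists")
    return Fraction(best, 2)
```

For example, `fractional_edge_cover_number([0, 1, 2], [(0, 1), (1, 2), (0, 2)])` returns `Fraction(3, 2)` for the triangle, so the AGM bound caps the number of triangles at $ O(m^{3/2}) $.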

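Motivation and Objectives

Design a sublinear-time algorithm that, given standard query access, estimates the number of occurrences of an arbitrary subgraph $ H $ in a large graph $ G $.
Bridge the gap between subgraph enumeration and subgraph counting by extending sublinear algorithms beyond specific families such as triangles and cliques.
Achieve query complexity that matches the known bounds for triangles and cliques, and use edge-sample queries to eliminate the $ n $-dependent additive term.
Establish a matching lower bound for the general problem of estimating colorful subgraph counts, proving that the query complexity is optimal.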

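Proposed Method

The algorithm uses degree, neighbor, pair, and edge-sample queries: it samples edges uniformly at random and explores the local neighborhoods of vertices.
It exploits the fractional edge cover number $ \rho(H) $ of the subgraph $ H $, which determines the asymptotic complexity of subgraph counting.
The core idea is to use edge sampling to estimate the probability that a randomly chosen edge belongs to a copy of $ H $, and from that to estimate the number of copies of $ H $ efficiently.
Combining these estimates with a randomized sampling strategy yields a $ (1\pm\varepsilon) $-approximation of $ \#H $, the number of subgraphs of $ G $ isomorphic to $ H $.
It introduces a generalization to colorful subgraphs that models the database join size estimation problem, and proves a matching lower bound in that setting.
The analysis relies on probabilistic arguments and, for the query lower bounds, on carefully constructed graph distributions. It is also shown that $ \Omega(m) $ queries are required without edge sampling.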
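
The edge-sampling idea can be illustrated for the special case $ H = K_3 $ (triangles). The sketch below is a simplified estimator in that spirit, not the paper's exact procedure: the `QueryGraph` class, its method names, and `estimate_triangles` are hypothetical, $ m $ is assumed to be known, and $ G $ is assumed to be a simple graph. Each trial samples a uniform edge $ (u, v) $, draws a uniform neighbor $ w $ of the lower-degree endpoint $ u $, and scores $ m \cdot d_u / 3 $ if $ (v, w) $ is an edge. The division by 3 compensates for every triangle being reachable from each of its three edges, so the average is an unbiased estimate of the triangle count.

```python
import random


class QueryGraph:
    """Toy in-memory graph exposing the four query types (illustrative only)."""

    def __init__(self, edges):
        # Assumes a simple graph: no self-loops, no duplicate edges.
        self.edges = [tuple(e) for e in edges]
        self.adj = {}
        for u, v in self.edges:
            self.adj.setdefault(u, []).append(v)
            self.adj.setdefault(v, []).append(u)
        self.edge_set = {frozenset(e) for e in self.edges}

    def degree(self, v):
        """Degree query."""
        return len(self.adj[v])

    def neighbor(self, v, i):
        """Neighbor query: the i-th neighbor of v (0-indexed)."""
        return self.adj[v][i]

    def pair(self, u, v):
        """Pair query: is {u, v} an edge?"""
        return frozenset((u, v)) in self.edge_set

    def sample_edge(self, rng):
        """Edge-sample query: an edge chosen uniformly at random."""
        return rng.choice(self.edges)


def estimate_triangles(graph, trials, seed=0):
    """Average of `trials` independent unbiased one-sample estimates."""
    rng = random.Random(seed)
    m = len(graph.edges)  # assumption: m is known to the algorithm
    total = 0.0
    for _ in range(trials):
        u, v = graph.sample_edge(rng)
        # Explore from the lower-degree endpoint to keep the variance down.
        if graph.degree(u) > graph.degree(v):
            u, v = v, u
        d_u = graph.degree(u)
        w = graph.neighbor(u, rng.randrange(d_u))
        # (u, v, w) is a triangle iff w differs from v and {v, w} is an edge.
        if w != v and graph.pair(v, w):
            total += m * d_u / 3
    return total / trials
```

On $ K_4 $, which has 4 triangles, the average converges to 4 as `trials` grows. The number of trials needed for a $ (1\pm\varepsilon) $ guarantee is what the paper's analysis bounds, and it is not reproduced here.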

Results

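Research Questions

RQ1: Can a sublinear-time algorithm estimate the number of copies of an arbitrary subgraph $ H $ in a graph $ G $, beyond special cases such as triangles and cliques?
RQ2: Does introducing edge-sample queries yield an optimal sublinear algorithm that avoids the $ n $-dependent additive term in the query complexity?
RQ3: Is the $ O^*\left(\frac{m^{\rho(H)}}{\#H}\right) $ query complexity tight for subgraph counting even when edge sampling is available?
RQ4: Does the framework extend to the more general problem of estimating colorful subgraph counts, which corresponds to natural join size estimation in databases?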

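Key Findings

The proposed algorithm achieves a $ (1\pm\varepsilon) $-approximation of the number of copies of $ H $ in $ G $ using $ O^*\left(\min\left\{m, \frac{m^{\rho(H)}}{\#H}\right\} \right) $ queries and $ O^*\left(\frac{m^{\rho(H)}}{\#H}\right) $ time.
For $ k $-cliques it matches the best known bound of Eden et al., while edge-sample queries let it avoid their additive $ O^*\left(\frac{n}{(\#K_k)^{1/k}}\right) $ term.
A matching lower bound of $ \Omega\left(\frac{m^{\rho(H)}}{\#H}\right) $ shows that the algorithm's query complexity is optimal up to polylogarithmic factors.
The lower-bound construction uses two graph distributions $ \mathcal{G}_0 $ and $ \mathcal{G}_1 $ that differ in the presence or absence of $ m^{\rho(H)-1} $ colorful copies of $ H $, and it proves that $ \Omega(m) $ queries are required without edge sampling.
It resolves a conjecture of Eden and Rosenbaum, showing that edge-sample queries remove the need for $ n $-dependent terms in the query complexity of subgraph counting.
The framework generalizes, via colorful subgraphs, to the database join size estimation problem, and the lower bound holds in this broader setting, which establishes optimality.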
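
As a worked instance of the query bound, take $ H = K_3 $, so $ \rho(H) = 3/2 $, and a graph with $ m = 10^6 $ edges. The values of $ m $ and $ \#K_3 $ below are hypothetical, and the polylogarithmic and $ 1/\varepsilon $ factors hidden by $ O^* $ are ignored:

$$
\#K_3 = 10^6:\quad \min\left\{m, \frac{m^{3/2}}{\#K_3}\right\} = \min\left\{10^6, \frac{10^9}{10^6}\right\} = 10^3
$$

$$
\#K_3 = 10^2:\quad \min\left\{m, \frac{m^{3/2}}{\#K_3}\right\} = \min\left\{10^6, \frac{10^9}{10^2}\right\} = 10^6
$$

In this example the bound drops below $ m $ exactly when $ \#K_3 > m^{1/2} = 10^3 $: the more copies of $ H $ the graph contains, the fewer queries are needed.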

This review was generated by AI and checked by a human editor.