QUICK REVIEW

[Paper Review] A nonparametric two-sample hypothesis testing problem for random dot product graphs

Minh Tang, Avanti Athreya|arXiv (Cornell University)|Sep 8, 2014
Complex Network Analysis Techniques | 68 references | 29 citations
TL;DR

This paper proposes a nonparametric two-sample hypothesis test for random dot product graphs (RDPGs) that asks whether the latent positions of two graphs are drawn from the same distribution or from distributions related by scaling or projection. The method estimates latent positions by adjacency spectral embedding and builds a kernel-based test statistic from them. The test is consistent against a broad class of alternatives, and its theoretical guarantees rest on a new concentration inequality for the supremum of an empirical process indexed by the estimated latent positions.

ABSTRACT

We consider the problem of testing whether two finite-dimensional random dot product graphs have generating latent positions that are independently drawn from the same distribution, or distributions that are related via scaling or projection. We propose a test statistic that is a kernel-based function of the adjacency spectral embedding for each graph. We obtain a limiting distribution for our test statistic under the null and we show that our test procedure is consistent across a broad range of alternatives.

Motivation & Objective

  • To develop a nonparametric two-sample hypothesis test for random dot product graphs when the true latent positions are unobserved.
  • To test whether two independent RDPGs have generating latent positions drawn from the same distribution or related via scaling or projection.
  • To establish consistency of the test under general alternatives by leveraging estimated latent positions from adjacency spectral embedding.
  • To enable inference when vertex correspondence between the two graphs is unknown and the graphs need not share a vertex set, a setting that semiparametric tests relying on aligned vertices do not cover.

Proposed method

  • The test uses adjacency spectral embedding (ASE) to estimate latent positions from the observed adjacency matrices of two RDPGs.
  • A kernel-based test statistic is constructed as an empirical estimate of the maximum mean discrepancy (MMD) between the estimated latent position distributions.
  • Consistency is proved using a new concentration inequality for the supremum of an empirical process indexed by the estimated, rather than the true, latent positions.
  • The test statistic converges in probability to the MMD computed from the unobserved true latent positions, which is what makes the test consistent.
  • When testing for equality up to scaling, the test is robust to unknown sparsity factors; the paper establishes this in its analysis of the sparse regime.
  • The framework is adaptable to goodness-of-fit testing and can be extended to independence testing in latent position models.
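The ASE and MMD steps above can be sketched in Python with NumPy. This is a minimal illustration under our own assumptions, not the authors' implementation: the Gaussian kernel, its bandwidth `sigma=0.5`, the column-sign heuristic used to make embeddings comparable, and the helper names (`sample_rdpg`, `adjacency_spectral_embedding`, `mmd_u_statistic`) are all hypothetical choices. The statistic computed is the standard unbiased (U-statistic) estimate of squared MMD, applied to the two estimated embeddings.

```python
import numpy as np


def sample_rdpg(latent: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Sample a symmetric, hollow adjacency matrix with edge probabilities <x_i, x_j>."""
    probs = np.clip(latent @ latent.T, 0.0, 1.0)
    upper = np.triu(rng.random(probs.shape) < probs, k=1).astype(float)
    return upper + upper.T


def adjacency_spectral_embedding(adj: np.ndarray, d: int) -> np.ndarray:
    """Rank-d ASE: eigenvectors of the d largest-magnitude eigenvalues, scaled by sqrt(|eigenvalue|)."""
    vals, vecs = np.linalg.eigh(adj)
    top = np.argsort(np.abs(vals))[::-1][:d]
    x_hat = vecs[:, top] * np.sqrt(np.abs(vals[top]))
    # Hypothetical heuristic: flip each column so its sum is nonnegative.
    # This removes sign ambiguity only, not a general orthogonal rotation.
    signs = np.where(x_hat.sum(axis=0) < 0, -1.0, 1.0)
    return x_hat * signs


def gaussian_gram(a: np.ndarray, b: np.ndarray, sigma: float) -> np.ndarray:
    """Gram matrix of the Gaussian kernel exp(-||a_i - b_j||^2 / (2 sigma^2))."""
    sq_dists = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dists / (2.0 * sigma**2))


def mmd_u_statistic(x: np.ndarray, y: np.ndarray, sigma: float = 0.5) -> float:
    """Unbiased (U-statistic) estimate of squared MMD between samples x and y."""
    n, m = len(x), len(y)
    kxx = gaussian_gram(x, x, sigma)
    kyy = gaussian_gram(y, y, sigma)
    kxy = gaussian_gram(x, y, sigma)
    within_x = (kxx.sum() - np.trace(kxx)) / (n * (n - 1))  # drop i == j terms
    within_y = (kyy.sum() - np.trace(kyy)) / (m * (m - 1))
    return float(within_x + within_y - 2.0 * kxy.mean())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # One-dimensional latent positions; the graph sizes n = 300 and m = 400 differ.
    x = rng.uniform(0.2, 0.8, size=(300, 1))
    y_same = rng.uniform(0.2, 0.8, size=(400, 1))
    y_diff = rng.beta(2.0, 5.0, size=(400, 1))
    x_hat = adjacency_spectral_embedding(sample_rdpg(x, rng), d=1)
    y_same_hat = adjacency_spectral_embedding(sample_rdpg(y_same, rng), d=1)
    y_diff_hat = adjacency_spectral_embedding(sample_rdpg(y_diff, rng), d=1)
    print("same distribution:", mmd_u_statistic(x_hat, y_same_hat))
    print("different distribution:", mmd_u_statistic(x_hat, y_diff_hat))
```

Because the latent positions in this example are one-dimensional and positive, the sign fix fully resolves the embedding's ambiguity. In higher dimensions the two embeddings would generally also need to be aligned by an orthogonal transformation, which this sketch does not attempt.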

Experimental results

Research questions

  • RQ1: Can a nonparametric two-sample test be consistently applied to RDPGs when the true latent positions are unobserved and vertex correspondence is unknown?
  • RQ2: How does the performance of a test based on estimated latent positions compare to one based on true latent positions in terms of consistency?
  • RQ3: What conditions on sparsity and distributional assumptions ensure the test remains consistent in the dense and sparse graph regimes?
  • RQ4: Can the proposed test detect alternatives involving scaling or projection of latent position distributions?
  • RQ5: What is the rate of convergence of the test statistic to the true MMD under estimation error from ASE?

Key findings

  • The test statistic constructed from adjacency spectral embeddings converges in probability to the true MMD computed from the unknown true latent positions, ensuring consistency.
  • The proposed test is consistent against any alternative in which the two latent position distributions differ, provided the latent positions satisfy the model assumptions and the graphs are dense.
  • A novel concentration inequality for the supremum of an empirical process in the estimated latent positions setting is established, forming the core theoretical foundation.
  • In the sparse regime, the test remains consistent as long as the sparsity factors do not decay too quickly, specifically when $ n\alpha_n = \omega(\log^4 n) $ and $ m\beta_m = \omega(\log^4 m) $.
  • The test can be adapted to goodness-of-fit testing and is robust to unknown sparsity factors when testing for equality up to scaling.
  • In settings without known vertex alignment, where semiparametric approaches do not apply, the method remains usable; when alignment is available, however, it may have less power than those approaches.
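For the scaling variant and the sparse regime described in the findings above, one simple way to make the statistic insensitive to a global scale factor is to normalize each embedding before computing the MMD. A sparsity factor that multiplies every edge probability rescales the true latent positions by its square root, and dividing by the root-mean-square row norm cancels such a factor. The sketch below is a hypothetical illustration, not the paper's procedure: the normalization, the Gaussian kernel bandwidth, and the helper names are our assumptions, and the permutation calibration is a standard MMD technique substituted here for the limiting null distribution derived in the paper.

```python
import numpy as np


def sample_rdpg(latent: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Symmetric, hollow adjacency matrix with edge probabilities <x_i, x_j>."""
    probs = np.clip(latent @ latent.T, 0.0, 1.0)
    upper = np.triu(rng.random(probs.shape) < probs, k=1).astype(float)
    return upper + upper.T


def ase(adj: np.ndarray, d: int) -> np.ndarray:
    """Rank-d adjacency spectral embedding with a heuristic column-sign fix."""
    vals, vecs = np.linalg.eigh(adj)
    top = np.argsort(np.abs(vals))[::-1][:d]
    x_hat = vecs[:, top] * np.sqrt(np.abs(vals[top]))
    return x_hat * np.where(x_hat.sum(axis=0) < 0, -1.0, 1.0)


def rms_normalize(x_hat: np.ndarray) -> np.ndarray:
    """Divide by the root-mean-square row norm so a global scale factor cancels."""
    return x_hat / np.sqrt((x_hat**2).sum(axis=1).mean())


def mmd_u(x: np.ndarray, y: np.ndarray, sigma: float = 0.5) -> float:
    """Unbiased squared-MMD estimate with a Gaussian kernel (assumed bandwidth)."""
    def gram(a: np.ndarray, b: np.ndarray) -> np.ndarray:
        sq = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
        return np.exp(-sq / (2.0 * sigma**2))

    n, m = len(x), len(y)
    kxx, kyy = gram(x, x), gram(y, y)
    within_x = (kxx.sum() - np.trace(kxx)) / (n * (n - 1))
    within_y = (kyy.sum() - np.trace(kyy)) / (m * (m - 1))
    return float(within_x + within_y - 2.0 * gram(x, y).mean())


def permutation_pvalue(x: np.ndarray, y: np.ndarray, rng: np.random.Generator,
                       n_perm: int = 200) -> float:
    """Fraction of random relabelings of the pooled sample whose MMD reaches the observed one."""
    observed = mmd_u(x, y)
    pooled = np.vstack([x, y])
    exceed = 0
    for _ in range(n_perm):
        idx = rng.permutation(len(pooled))
        if mmd_u(pooled[idx[: len(x)]], pooled[idx[len(x):]]) >= observed:
            exceed += 1
    return (exceed + 1) / (n_perm + 1)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    x_hat = ase(sample_rdpg(rng.uniform(0.2, 0.8, size=(200, 1)), rng), d=1)
    # Same latent distribution, but every edge probability is multiplied by 0.36 (a sparser graph).
    y_scaled_hat = ase(sample_rdpg(0.6 * rng.uniform(0.2, 0.8, size=(250, 1)), rng), d=1)
    print("raw statistic:", mmd_u(x_hat, y_scaled_hat))
    print("normalized statistic:", mmd_u(rms_normalize(x_hat), rms_normalize(y_scaled_hat)))
    print("p-value:", permutation_pvalue(rms_normalize(x_hat), rms_normalize(y_scaled_hat), rng))
```

Comparing the raw and normalized statistics on the scaled pair illustrates the design choice: the raw MMD mostly reflects the scale difference, while normalization largely removes it. Normalizing does not equalize estimation noise, which is larger for the sparser graph, so in small samples the normalized test may still react to that difference.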

This review was created by AI and reviewed by human editors.