QUICK REVIEW

[Paper Review] Fast Incremental and Personalized PageRank

Bahman Bahmani, Abdur Chowdhury|arXiv (Cornell University)|Jun 15, 2010
Complex Network Analysis Techniques | 4 references | 27 citations
TL;DR

This paper proposes a fast, incremental Monte Carlo method for computing global and personalized PageRank in large-scale, dynamically evolving social networks stored in distributed shared memory. By storing random walk segments and exploiting the power-law behavior of personalized PageRank, the method needs O(n ln m / ε²) total work to maintain global PageRank and O(k / R^((1−α)/α)) expected database fetches for top-k personalized results, far less than batch re-computation or prior incremental approaches.

ABSTRACT

In this paper, we analyze the efficiency of Monte Carlo methods for incremental computation of PageRank, personalized PageRank, and similar random walk based methods (with focus on SALSA), on large-scale dynamically evolving social networks. We assume that the graph of friendships is stored in distributed shared memory, as is the case for large social networks such as Twitter. For global PageRank, we assume that the social network has $n$ nodes, and $m$ adversarially chosen edges arrive in a random order. We show that with a reset probability of $ε$, the total work needed to maintain an accurate estimate (using the Monte Carlo method) of the PageRank of every node at all times is $O(\frac{n\ln m}{ε^{2}})$. This is significantly better than all known bounds for incremental PageRank. For instance, if we naively recompute the PageRanks as each edge arrives, the simple power iteration method needs $Ω(\frac{m^2}{\ln(1/(1-ε))})$ total time and the Monte Carlo method needs $O(mn/ε)$ total time; both are prohibitively expensive. Furthermore, we also show that we can handle deletions equally efficiently. We then study the computation of the top $k$ personalized PageRanks starting from a seed node, assuming that personalized PageRanks follow a power-law with exponent $α< 1$. We show that if we store $R>q\ln n$ random walks starting from every node for large enough constant $q$ (using the approach outlined for global PageRank), then the expected number of calls made to the distributed social network database is $O(k/(R^{(1-α)/α}))$. We also present experimental results from the social networking site, Twitter, verifying our assumptions and analyses. The overall result is that this algorithm is fast enough for real-time queries over a dynamic social network.

Motivation & Objective

  • To address the inefficiency of batch re-computation for PageRank in dynamic social networks where edges arrive incrementally.
  • To design a scalable, real-time algorithm for maintaining accurate global and personalized PageRank estimates under continuous graph updates.
  • To leverage the power-law structure of personalized PageRank vectors to minimize expensive database fetches during random walk composition.
  • To validate the theoretical bounds with real-world experiments on Twitter data, confirming the method's practicality for production systems.

Proposed method

  • Uses Monte Carlo sampling with R random walk segments stored per node to enable fast incremental updates to PageRank and personalized PageRank.
  • Employs a distributed shared memory model (Social Store) to support low-latency random access to graph edges during walk simulation.
  • Applies the power-law assumption on personalized PageRank vectors (with exponent α < 1) to bound the expected number of database fetches during walk composition.
  • Uses geometrically distributed walk lengths with mean 1/ε to simulate random surfer behavior and estimate stationary distributions.
  • Derives theoretical bounds on total work for global PageRank and expected number of fetches for top-k personalized results using concentration inequalities and power-law analysis.
  • Employs a segment-based walk composition technique: when a walk is needed, fetch pre-stored segments and stitch them together to form a full walk.
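
The stored-segment idea above can be illustrated with a minimal Python sketch. It is not the paper's implementation: the function names (`sample_segment`, `build_store`, `estimate_pagerank`, `add_edge`), the in-memory dictionaries standing in for the Social Store, the handling of dangling nodes, and the normalization by total visit count are all assumptions made for illustration. A real system would also index segments by the nodes they visit instead of scanning every segment on each edge arrival.

```python
import random
from collections import defaultdict

def sample_segment(graph, start, eps, rng):
    """Walk from `start`; at each node stop with probability eps,
    otherwise step to a uniformly random out-neighbor."""
    walk = [start]
    node = start
    while rng.random() >= eps:
        nbrs = graph.get(node)
        if not nbrs:
            break  # dangling node: end the segment (assumption of this sketch)
        node = rng.choice(nbrs)
        walk.append(node)
    return walk

def build_store(graph, nodes, R, eps, rng):
    """Store R walk segments starting at every node."""
    return {v: [sample_segment(graph, v, eps, rng) for _ in range(R)]
            for v in nodes}

def estimate_pagerank(store):
    """Estimate PageRank as each node's share of all visits across segments."""
    visits = defaultdict(int)
    total = 0
    for segs in store.values():
        for seg in segs:
            for v in seg:
                visits[v] += 1
                total += 1
    return {v: c / total for v, c in visits.items()}

def add_edge(graph, store, u, w, eps, rng):
    """Insert edge (u, w) and repair only the segments that it affects."""
    graph.setdefault(u, []).append(w)
    d = len(graph[u])  # out-degree of u after the insertion
    for segs in store.values():
        for idx, seg in enumerate(segs):
            for i, v in enumerate(seg):
                if v != u:
                    continue
                if i == len(seg) - 1:
                    # Segment ended at u. It can only continue if u used to be
                    # dangling; in that case redraw the stop/continue decision.
                    reroute = (d == 1) and rng.random() >= eps
                else:
                    # A step was taken from u: it now goes to w w.p. 1/d.
                    reroute = rng.random() < 1.0 / d
                if reroute:
                    # Keep the prefix up to u, resample the rest from w.
                    segs[idx] = seg[:i + 1] + sample_segment(graph, w, eps, rng)
                    break

# Small demo on a hypothetical 4-node graph.
demo_rng = random.Random(42)
demo_graph = {0: [1], 1: [2], 2: [0], 3: [0]}
demo_store = build_store(demo_graph, [0, 1, 2, 3], R=100, eps=0.2, rng=demo_rng)
add_edge(demo_graph, demo_store, 0, 3, eps=0.2, rng=demo_rng)
print(sorted(estimate_pagerank(demo_store).items()))
```

Only segments that actually pass through `u` are candidates for repair, and each visit is rerouted with probability 1/d, which is why most stored segments survive an edge arrival untouched.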

Experimental results

Research questions

  • RQ1: Can Monte Carlo methods be made efficient for incremental PageRank computation in large-scale, dynamic social networks?
  • RQ2: What is the theoretical total work required to maintain accurate global PageRank estimates under adversarial edge arrivals in random order?
  • RQ3: How can personalized PageRank be computed efficiently for top-k recommendations with minimal database access?
  • RQ4: To what extent do personalized PageRank vectors follow a power-law distribution in real social networks?
  • RQ5: Can short random walks approximate the stationary distribution well enough for practical recommendation systems?

Key findings

  • The total work for maintaining global PageRank with reset probability ε is O(n ln m / ε²), which is significantly better than the Ω(m² / ln(1/(1−ε))) of power iteration and the O(mn / ε) of naive Monte Carlo re-computation.
  • The method handles edge deletions with the same efficiency as insertions, maintaining the same theoretical bounds.
  • For top-k personalized PageRank with power-law exponent α < 1, the expected number of database fetches is O(k / R^((1−α)/α)), where R is the number of stored walk segments per node.
  • Experiments on Twitter data confirm that personalized PageRank vectors follow a power-law with mean exponent 0.77 and standard deviation 0.08, validating the model assumptions.
  • Short random walks of 5,000 steps recover 80% of the top 100 true results within the top 100 recommendations, with precision at recall 0.8 reaching nearly 0.8.
  • Theoretical bounds on fetch counts closely match the experimental results, and performance remains robust even when R is below the theoretical threshold R > q ln n.
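
To get a feel for how far apart these bounds are, the following Python sketch evaluates the asymptotic expressions quoted above with all hidden constants set to 1. The function names and the values of n, m, ε, k, and R are hypothetical and chosen only for illustration; they are not taken from the paper's experiments. Only α = 0.77 comes from the reported mean exponent.

```python
import math

def incremental_mc_work(n, m, eps):
    """n ln m / eps^2: incremental Monte Carlo (constants ignored)."""
    return n * math.log(m) / eps ** 2

def naive_mc_work(n, m, eps):
    """m n / eps: Monte Carlo recomputed from scratch on every edge arrival."""
    return m * n / eps

def power_iteration_work(m, eps):
    """m^2 / ln(1/(1-eps)): power iteration rerun on every edge arrival."""
    return m ** 2 / math.log(1.0 / (1.0 - eps))

def fetch_bound(k, R, alpha):
    """k / R^((1-alpha)/alpha): expected fetches for top-k personalized PageRank."""
    return k / R ** ((1.0 - alpha) / alpha)

# Hypothetical graph size and reset probability.
n, m, eps = 10 ** 6, 10 ** 8, 0.2
print(f"incremental MC : {incremental_mc_work(n, m, eps):.2e}")
print(f"naive MC       : {naive_mc_work(n, m, eps):.2e}")
print(f"power iteration: {power_iteration_work(m, eps):.2e}")

# Fetch bound for k = 100 with the reported mean exponent 0.77.
for R in (1, 10, 100):
    print(f"R = {R:>3}: fetch bound ~ {fetch_bound(100, R, 0.77):.1f}")
```

Because (1−α)/α is about 0.3 when α = 0.77, the fetch bound shrinks slowly as R grows, while the gap between the incremental and re-computation bounds spans several orders of magnitude at these sizes.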

This review was created by AI and reviewed by human editors.