QUICK REVIEW

[Paper Review] Dynamic PageRank: Algorithms and Lower Bounds

Jayaram, Rajesh, Łącki, Jakub|arXiv (Cornell University)|Apr 17, 2013

Web Data Mining and Analysis · 12 references · 26 citations

TL;DR

This paper presents a novel, efficient algorithm for computing personalized PageRank to a single target node from all source nodes in a graph. It uses a priority queue to propagate updates backward from the target. Because it processes only high-impact nodes, it achieves near-optimal performance, comparable to the cost of a single-source PageRank computation. The paper supports this with theoretical bounds and with experiments on the Twitter graph that show up to a 1,700x speedup over power iteration for moderate error tolerances.

ABSTRACT

Personalized PageRank uses random walks to determine the importance or authority of nodes in a graph from the point of view of a given source node. Much past work has considered how to compute personalized PageRank from a given source node to other nodes. In this work we consider the problem of computing personalized PageRanks to a given target node from all source nodes. This problem can be interpreted as finding who supports the target or who is interested in the target. We present an efficient algorithm for computing personalized PageRank to a given target up to any given accuracy. We give a simple analysis of our algorithm's running time in both the average case and the parameterized worst-case. We show that for any graph with $n$ nodes and $m$ edges, if the target node is randomly chosen and the teleport probability $\alpha$ is given, the algorithm will compute a result with additive error $\varepsilon$ in time $O\left(\frac{1}{\alpha\varepsilon}\left(\frac{m}{n} + \log n\right)\right)$. This is much faster than the previously proposed method of computing personalized PageRank separately from every source node, and it is comparable to the cost of computing personalized PageRank from a single source. We present results from experiments on the Twitter graph which show that the constant factors in our running time analysis are small and our algorithm is efficient in practice.
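To get a sense of scale for the random-target bound, you can evaluate the expression inside the O(·) directly. The sketch below plugs in the Twitter subset reported later in this review (n = 5.3M nodes, m = 380M edges) with α = 0.1 and ε = 10⁻⁵. It assumes the natural logarithm and ignores the constant hidden by O(·), so the result is only an order of magnitude. The function name `random_target_bound` is hypothetical.

```python
import math


def random_target_bound(n: int, m: int, alpha: float, eps: float) -> float:
    """Evaluate (1/(alpha*eps)) * (m/n + ln n), the expression inside O(.).

    Assumption: natural log; the constant hidden by O(.) is ignored, so this
    is an order-of-magnitude figure, not a predicted operation count.
    """
    return (1.0 / (alpha * eps)) * (m / n + math.log(n))


# Twitter-subset size and parameters as reported in this review.
n, m = 5_300_000, 380_000_000
bound = random_target_bound(n, m, alpha=0.1, eps=1e-5)
print(f"m/n = {m / n:.1f}, ln n = {math.log(n):.1f}")
print(f"bound expression = {bound:.2e} (compare m = {m:.2e} edges)")
```

One sweep of power iteration touches all m edges, and power iteration needs many such sweeps. Because of the hidden constant, comparing the bound expression with m is only suggestive.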

Motivation & Objective

To address the problem of efficiently computing personalized PageRank to a single target node from all source nodes, rather than from a single source to all targets.
To design an algorithm that avoids running a separate PageRank computation from each of the n source nodes, especially when only a few sources have high relevance to the target.
To provide theoretical running time bounds that depend on graph structure and desired accuracy, with graceful degradation as error tolerance decreases.
To validate the algorithm’s efficiency and accuracy empirically on a large-scale social network graph (Twitter).

Proposed method

The algorithm starts at the target node v and propagates updated PageRank estimates backward along incoming edges using a priority queue.
At each step, the node with the largest unpropagated estimate change is selected for update, ensuring that the most significant contributions are processed first.
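The method uses a contraction mapping property and a priority queue to maintain additive error ϵ. Updates are governed by the recurrence π(u,v) = α·1[u=v] + (1−α)·(1/|out(u)|)·∑_{w: u→w} π(w,v), where 1[u=v] is 1 if u = v and 0 otherwise. A change in π(w,v) therefore affects only the in-neighbors u of w.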
Theoretical analysis provides two bounds: O(1/αϵ ⋅ (m/n + log n)) for random targets and O(Dv(αϵ)/α ⋅ log(1/ϵα)) for arbitrary targets. Here Dv(αϵ) is a sum over the nodes whose PageRank to v exceeds αϵ, and it measures the difficulty of the instance.
The algorithm is implemented and tested on a 5.3M-node, 380M-edge subset of the Twitter graph to evaluate real-world performance.
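The steps above can be sketched as a standard reverse-push loop. This is a minimal illustration under stated assumptions, not the authors' implementation:

- The residual r(u) stands in for the "unpropagated estimate change". The paper may scale it differently, for example by α.
- Python's `heapq` with lazy deletion serves as the max-priority queue.
- Nodes with no out-edges simply end the walk.
- The names `reverse_push_ppr` and `power_iteration_ppr` are hypothetical.

The loop preserves the invariant π(s,v) = p(s) + ∑_w π(s,w)·r(w). As a result, once every residual is at most ϵ, each estimate p(s) underestimates π(s,v) by at most ϵ.

```python
import heapq
from collections import defaultdict


def reverse_push_ppr(out_edges, target, alpha=0.1, eps=1e-4):
    """Estimate pi(u, target) for all u, each with additive error <= eps.

    out_edges maps each node to the list of its out-neighbours.
    Hypothetical sketch of backward propagation with a priority queue.
    """
    in_edges = defaultdict(list)
    for u, outs in out_edges.items():
        for w in outs:
            in_edges[w].append(u)

    estimate = defaultdict(float)   # p(u): mass already settled at u
    residual = defaultdict(float)   # r(u): change not yet propagated
    residual[target] = 1.0
    heap = [(-1.0, target)]         # negated priorities -> max-heap

    while heap:
        neg_r, u = heapq.heappop(heap)
        if -neg_r != residual[u]:
            continue                # stale entry; a fresher one exists
        r = residual[u]
        if r <= eps:
            break                   # largest residual is small enough
        estimate[u] += alpha * r
        residual[u] = 0.0
        for w in in_edges[u]:       # push backward along incoming edges
            residual[w] += (1 - alpha) * r / len(out_edges[w])
            heapq.heappush(heap, (-residual[w], w))
    return dict(estimate)


def power_iteration_ppr(out_edges, target, alpha=0.1, iters=200):
    """Reference: iterate the target-side recurrence over every node."""
    nodes = set(out_edges) | {w for outs in out_edges.values() for w in outs}
    x = dict.fromkeys(nodes, 0.0)
    for _ in range(iters):
        new_x = {}
        for u in nodes:
            outs = out_edges.get(u, [])
            walk = sum(x[w] for w in outs) / len(outs) if outs else 0.0
            new_x[u] = alpha * (u == target) + (1 - alpha) * walk
        x = new_x
    return x


# Tiny hypothetical graph; target is "c".
graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c", "a"]}
approx = reverse_push_ppr(graph, "c", alpha=0.1, eps=1e-4)
exact = power_iteration_ppr(graph, "c", alpha=0.1)
for u in sorted(exact):
    print(u, round(approx.get(u, 0.0), 5), round(exact[u], 5))
```

The power-iteration function updates every node on every pass and is only a reference for small graphs. The push loop, by contrast, processes only nodes whose residual exceeds ϵ, plus their in-neighbors.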

Experimental results

Research questions

RQ1: Can personalized PageRank to a single target node be computed more efficiently than by computing it from every source node?
RQ2: Does a backward propagation strategy using a priority queue yield better running time than standard Monte Carlo or power iteration methods?
RQ3: How does the running time scale with the desired accuracy ϵ, and does it degrade gracefully as ϵ → 0?
RQ4: Is the theoretical parameter Dv(αϵ) a good predictor of actual running time in practice?

Key findings

For a randomly chosen target node, the algorithm runs in O(1/αϵ ⋅ (m/n + log n)) time, which is comparable to the cost of computing personalized PageRank from a single source.
For any target node, the algorithm runs in O(Dv(αϵ)/α ⋅ log(1/ϵα)) time, with Dv(αϵ) = ∑_{u:π(u,v)>αϵ} (|IN(u)| + log n). Apart from the dependence on ϵ through Dv(αϵ), the bound grows only as O(log(1/ϵ)) rather than O(1/ϵ²).
On the Twitter graph, with α=0.1 and ϵ=10⁻⁵, the algorithm took 1.2 seconds on average, while power iteration took 410 seconds, demonstrating a 340x speedup.
For ϵ=10⁻⁴, the algorithm was 1,700x faster than power iteration, which required 87 iterations to achieve the same error bound.
Empirical error was often 85% of the theoretical bound (ϵ), indicating the error analysis is tight and not overly conservative.
The ratio of actual steps to Dv(αϵ) was on average less than 4, far below the theoretical upper bound of 200, showing Dv(αϵ) is an excellent predictor of performance in practice.
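The definition of Dv(αϵ) and the factor that multiplies it in the worst-case bound can both be evaluated directly. The sketch below makes several assumptions:

- The π(u,v) values and in-degrees describe a four-node example. They are made up for illustration and do not come from the paper.
- Logarithms are base 2.
- The names `difficulty` and `steps_per_unit_difficulty` are hypothetical.

Under these assumptions, the factor (1/α)·log₂(1/(αϵ)) is about 199 for α = 0.1 and ϵ = 10⁻⁵. That is one reading that reproduces the upper bound of roughly 200 quoted above, although the review does not state which settings that figure used.

```python
import math


def difficulty(pi_to_v, in_degree, n, alpha, eps):
    """Dv(alpha*eps): sum of |IN(u)| + log2(n) over u with pi(u, v) > alpha*eps.

    Assumption: base-2 logarithm for the log n term.
    """
    threshold = alpha * eps
    return sum(in_degree.get(u, 0) + math.log2(n)
               for u, p in pi_to_v.items() if p > threshold)


def steps_per_unit_difficulty(alpha, eps):
    """Factor (1/alpha) * log2(1/(alpha*eps)) multiplying Dv in the bound."""
    return math.log2(1.0 / (alpha * eps)) / alpha


# Hypothetical pi(u, v) values and in-degrees for a four-node graph.
pi_to_v = {"v": 0.12, "a": 0.03, "b": 4e-7, "c": 2e-6}
in_degree = {"v": 3, "a": 1, "b": 2, "c": 5}

d_v = difficulty(pi_to_v, in_degree, n=4, alpha=0.1, eps=1e-5)
factor = steps_per_unit_difficulty(alpha=0.1, eps=1e-5)
print(f"Dv = {d_v}, factor = {factor:.1f}, bound ~ {d_v * factor:.0f} steps")
```

Node "b" is excluded because its PageRank to v falls below the threshold αϵ = 10⁻⁶. As ϵ shrinks, more nodes cross the threshold and Dv(αϵ) grows.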


This review was created by AI and reviewed by human editors.