[Paper Review] Beyond BFS: A Comparative Study of Rooted Spanning Tree Algorithms on GPUs
The paper compares BFS, connectivity-based methods, and PR-RST for constructing rooted spanning trees on GPUs. It finds that GConn combined with an Eulerian tour often performs best, running up to 300x faster than optimized BFS on high-diameter graphs, which challenges BFS as the default approach.
Rooted spanning trees (RSTs) are a core primitive in parallel graph analytics, underpinning algorithms such as biconnected components and planarity testing. On GPUs, RST construction has traditionally relied on breadth-first search (BFS) due to its simplicity and work efficiency. However, BFS incurs an O(D) step complexity, which severely limits parallelism on high-diameter and power-law graphs. We present a comparative study of alternative RST construction strategies on modern GPUs. We introduce a GPU adaptation of the Path Reversal RST (PR-RST) algorithm, optimizing its pointer-jumping and broadcast operations for modern GPU architecture. In addition, we evaluate an integrated approach that combines a state-of-the-art connectivity framework (GConn) with Eulerian tour-based rooting. Across more than 10 real-world graphs, our results show that the GConn-based approach achieves up to 300x speedup over optimized BFS on high-diameter graphs. These findings indicate that the O(log n) step complexity of connectivity-based methods can outweigh their structural overhead on modern hardware, motivating a rethinking of RST construction in GPU graph analytics.
Research Motivation and Objectives
- Motivate a rethinking of RST construction on GPUs beyond BFS by exploiting connectivity-based methods.
- Evaluate PR-RST as a GPU-adapted, depth-efficient alternative.
- Investigate two-phase (connectivity plus Euler tour) rooting for efficiency on modern GPUs.
- Quantify performance across real-world graphs and analyze diameter effects on RST methods.
Proposed Method
- Implement GPU adaptations of BFS, connectivity-based RST (GConn) and PR-RST from Cong and Bader.
- Evaluate an integrated GConn plus Eulerian tour rooting approach.
- Adapt Eulerian Tour to handle disconnected forests and optimize with modern GPU libraries (CUDA, CUB).
- Use pointer-jumping, hooking variants, and path reversal optimizations to maintain data-parallelism.
- Measure performance on 30+ real-world graphs on an NVIDIA L40S GPU, using multiple trials and reporting median timings.
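The Eulerian tour rooting step listed above can be illustrated with a small sequential sketch. It is not the paper's implementation: the function name `euler_tour_root` and the dictionary-based arc ranking are hypothetical, the input is assumed to be a single connected tree without duplicate edges (the paper's forest handling is not reproduced), and a GPU version would compute arc ranks with parallel list ranking rather than by walking the tour.

```python
def euler_tour_root(n, edges, root):
    """Root an undirected tree on vertices 0..n-1 at `root`.

    Returns a parent list in which the root is its own parent.
    Assumes `edges` forms a connected tree with n-1 distinct edges.
    """
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)

    # pos[(v, u)] is the index of neighbor u inside adj[v].
    pos = {}
    for v in range(n):
        for i, u in enumerate(adj[v]):
            pos[(v, u)] = i

    def successor(u, v):
        # After arriving at v over arc (u, v), leave v through the
        # neighbor that follows u in adj[v] (cyclically).
        i = (pos[(v, u)] + 1) % len(adj[v])
        return (v, adj[v][i])

    parent = [-1] * n
    parent[root] = root
    if not adj[root]:
        return parent  # single-vertex tree

    # Walk the tour once to rank all 2(n-1) arcs.
    rank = {}
    arc = (root, adj[root][0])
    for step in range(2 * (n - 1)):
        rank[arc] = step
        arc = successor(*arc)

    # The arc that appears first in the tour points away from the root.
    for u, v in edges:
        if rank[(u, v)] < rank[(v, u)]:
            parent[v] = u
        else:
            parent[u] = v
    return parent


# Star-like tree 0-1, 1-2, 1-3 rooted at vertex 2.
print(euler_tour_root(4, [(0, 1), (1, 2), (1, 3)], 2))  # [1, 2, 2, 1]
```

The point of the technique is that the spanning tree can come from any connectivity algorithm, since the orientation is recovered afterwards from arc ranks alone.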
Experimental Results
Research Questions
- RQ1: Which RST construction strategy (BFS, connectivity-based, PR-RST) yields the best GPU performance across real-world graphs?
- RQ2: How does graph diameter affect the performance and depth of the resulting rooted spanning trees?
- RQ3: Does the Euler Tour rooting overhead pay off on modern GPUs when combined with connectivity-based approaches?
- RQ4: What are the structural implications (tree depth) of different RST methods for downstream graph analytics?
Key Findings
- GConn-based RST with Eulerian Tour achieves up to 300x speedup over optimized BFS on high-diameter graphs.
- Connectivity-based methods maintain near-constant performance across varying graph diameters, unlike BFS.
- GConn-based RST tends to produce deeper trees than BFS, indicating a depth–performance trade-off.
- GPU adaptations of Path Reversal RST (PR-RST) can be effective, provided pointer-jumping and on-path marking are implemented carefully.
- Euler Tour techniques, when modernized (using CUDA/CUB), remain viable and beneficial on GPUs for rooting forests.
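The diameter sensitivity noted in these findings can be made concrete by counting synchronous rounds rather than measuring time. The sketch below is only an illustration under stated assumptions: `bfs_rounds` and `pointer_jump_rounds` are hypothetical helper names, the code runs sequentially while emulating one parallel step per round, and it says nothing about the paper's measured speedups. On a path graph, level-synchronous BFS needs one round per level, whereas pointer jumping halves the remaining chain length each round.

```python
def bfs_rounds(n, edges, root):
    """Level-synchronous BFS; returns (parent list, frontier expansions)."""
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    parent = [-1] * n
    parent[root] = root
    frontier = [root]
    rounds = 0
    while frontier:
        nxt = []
        for u in frontier:
            for v in adj[u]:
                if parent[v] == -1:
                    parent[v] = u
                    nxt.append(v)
        frontier = nxt
        if frontier:
            rounds += 1  # one round per newly discovered level
    return parent, rounds


def pointer_jump_rounds(parent):
    """Pointer jumping: every round replaces p[v] with p[p[v]] for all v."""
    p = list(parent)
    rounds = 0
    while True:
        nxt = [p[p[v]] for v in range(len(p))]
        if nxt == p:
            return p, rounds  # every vertex now points at its root
        p = nxt
        rounds += 1


# Path graph 0-1-2-...-1023: the diameter is 1023.
n = 1024
path = [(i, i + 1) for i in range(n - 1)]
parent, bfs_steps = bfs_rounds(n, path, 0)
roots, jump_steps = pointer_jump_rounds(parent)
print(bfs_steps, jump_steps)  # 1023 10
```

The gap between 1023 and 10 rounds is the O(D) versus O(log n) step complexity contrast from the abstract, shown on the worst case for BFS.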
This review was generated by AI and checked by a human editor.