QUICK REVIEW

[Paper Review] A Granularity Characterization of Task Scheduling Effectiveness

Sana Taghipour Anvar, David Kaeli|arXiv (Cornell University)|Feb 24, 2026

Distributed and Parallel Computing Systems | Cited by 0

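One-sentence summary

The paper ties task-scheduling overhead to the dependency topology of the task graph and proposes a granularity measure that predicts strong-scaling limits and guides the choice between dynamic and static execution.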

ABSTRACT

Task-based runtime systems provide flexible load balancing and portability for parallel scientific applications, but their strong scaling is highly sensitive to task granularity. As parallelism increases, scheduling overhead may transition from negligible to dominant, leading to rapid drops in performance for some algorithms, while remaining negligible for others. Although such effects are widely observed empirically, there is a general lack of understanding how algorithmic structure impacts whether dynamic scheduling is always beneficial. In this work, we introduce a granularity characterization framework that directly links scheduling overhead growth to task-graph dependency topology. We show that dependency structure, rather than problem size alone, governs how overhead scales with parallelism. Based on this observation, we characterize execution behavior using a simple granularity measure that indicates when scheduling overhead can be amortized by parallel computation and when scheduling overhead dominates performance. Through experimental evaluation on representative parallel workloads with diverse dependency patterns, we demonstrate that the proposed characterization explains both gradual and abrupt strong-scaling breakdowns observed in practice. We further show that overhead models derived from dependency topology accurately predict strong-scaling limits and enable a practical runtime decision rule for selecting dynamic or static execution without requiring exhaustive strong-scaling studies or extensive offline tuning.

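Motivation and goals

Explain how scheduling overhead grows with parallelism as a function of task-graph dependency topology rather than problem size alone.
Propose a simple granularity measure that indicates when scheduling overhead is amortized by parallel computation.
Develop a topology-driven overhead model and a unified framework for predicting strong-scaling behavior across workloads.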

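Proposed method

Define dependency topology through dependency neighborhoods and classify it into global, local, and independent patterns.
Derive the granularity number G = T_kernel / ((1-ρ) k τ_s), which relates kernel work to scheduling overhead.
Model the scheduling overhead T_overhead in a topology-specific form that depends on the number of ranks P (e.g., αP^2+β for global, αP+β for local, β for independent).
Show that T_overhead grows with the number of inter-task dependency edges |E(P)|, and derive how G scales with P for each topology (G_global ~ P^-3, G_local ~ P^-2, G_independent ~ P^-1).
Give a workload-independent overhead-granularity relation, Ω% = 100/(G+1), and classify execution into favorable, marginal, and unfavorable regimes.
Calibrate and validate on FFT, stencil, sweep, GEMM, and additional workloads to predict strong-scaling limits and dynamic-static crossover points.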
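
The granularity number, the overhead fraction, and the three regimes listed above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation. The function names are hypothetical, the parameters rho, k, and tau_s simply mirror the symbols ρ, k, and τ_s in the formula (this summary does not define them), and the regime boundaries are the thresholds quoted under the key findings (G > 10, 1 < G ≤ 10, G ≤ 1).

```python
def granularity(t_kernel: float, rho: float, k: float, tau_s: float) -> float:
    """G = T_kernel / ((1 - rho) * k * tau_s), as written in the summary.

    rho, k and tau_s mirror the symbols in the formula; their precise
    meaning is not given in this summary, so they are passed through as-is.
    """
    denom = (1.0 - rho) * k * tau_s
    if denom <= 0.0:
        raise ValueError("(1 - rho) * k * tau_s must be positive")
    return t_kernel / denom


def overhead_percent(g: float) -> float:
    """Overhead share Omega% = 100 / (G + 1)."""
    if g < 0.0:
        raise ValueError("G must be non-negative")
    return 100.0 / (g + 1.0)


def regime(g: float) -> str:
    """Classify G using the thresholds quoted in the key findings."""
    if g > 10.0:
        return "favorable"
    if g > 1.0:
        return "marginal"
    return "unfavorable"


if __name__ == "__main__":
    # Illustrative inputs only; these numbers are not taken from the paper.
    g = granularity(t_kernel=1.0, rho=0.5, k=2, tau_s=0.1)
    print(g, overhead_percent(g), regime(g))
```

If G is read as the ratio of kernel time to overhead time, the relation Ω% = 100/(G+1) is just the overhead's share of the total: T_overhead / (T_kernel + T_overhead) = 1/(G+1). At G = 1 half of the time goes to scheduling, and at G = 9 one tenth does.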

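Experimental results

Research questions

RQ1: How does scheduling overhead relate to parallelism under different dependency topologies?
RQ2: Can a simple granularity metric capture, across workloads, the transition between dynamic scheduling being beneficial and not?
RQ3: How accurately do topology-based overhead models predict strong-scaling limits and the crossover point between dynamic and static execution?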

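Key findings

Scheduling overhead growth depends on dependency topology, not just on problem size or task count.
A simple granularity number G unifies scheduling behavior across diverse workloads, collapsing the data onto a single workload-independent curve.
Under strong scaling, global dependencies cause G to decay rapidly (cubically in P), while local and independent patterns decay more slowly (quadratically and linearly, respectively).
The overhead fraction Ω% = 100/(G+1) gives practical thresholds for when to use dynamic scheduling: G > 10 is favorable, 1 < G ≤ 10 is marginal (limited benefit), and G ≤ 1 is unfavorable.
Calibration on FFT, stencil, sweep, GEMM, SpMV, Conv2D, PageRank, and N-Body shows that the model predicts strong-scaling limits and crossover points (P*) across dependency classes.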
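
To make the scaling claims concrete, the sketch below combines the three overhead forms from the method section with an assumed ideal strong-scaling kernel time, T_kernel(P) = W/P. Dividing W/P by αP^2+β, αP+β, or β reproduces the P^-3, P^-2, and P^-1 trends once the α term dominates. Everything here is an illustration under stated assumptions: the function names, the 1/P kernel model, the use of T_overhead directly as the denominator of G, the power-of-two scan, and the reading of P* as the largest rank count with G above a chosen threshold are not taken from the paper, and the constants are made up rather than calibrated.

```python
def overhead(topology: str, p: int, alpha: float, beta: float) -> float:
    """Topology-specific overhead forms quoted in the summary."""
    if topology == "global":
        return alpha * p**2 + beta
    if topology == "local":
        return alpha * p + beta
    if topology == "independent":
        return beta
    raise ValueError(f"unknown topology: {topology}")


def g_at(topology: str, p: int, work: float, alpha: float, beta: float) -> float:
    """G(P) assuming T_kernel(P) = work / P (ideal strong scaling, an assumption)."""
    return (work / p) / overhead(topology, p, alpha, beta)


def crossover_ranks(topology, work, alpha, beta, g_min=1.0, p_max=1 << 20):
    """Largest power-of-two P <= p_max with G(P) > g_min, or None if none.

    With alpha >= 0 and beta > 0, G(P) decreases as P grows, so the scan
    can stop at the first P that falls to or below the threshold.
    """
    best = None
    p = 1
    while p <= p_max:
        if g_at(topology, p, work, alpha, beta) <= g_min:
            break
        best = p
        p *= 2
    return best


if __name__ == "__main__":
    # Hypothetical constants, chosen only to show the ordering of the limits.
    for topo in ("global", "local", "independent"):
        print(topo, crossover_ranks(topo, work=1000.0, alpha=1e-3, beta=1e-2))
```

With identical constants, the global pattern reaches the threshold at the smallest rank count and the independent pattern at the largest, which matches the ordering of the decay rates listed above. Raising g_min to 10 would instead locate the end of the favorable regime.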

This review was generated by AI and checked by a human editor.