QUICK REVIEW

[Paper Review] Rethinking Thread Scheduling under Oversubscription: A User-Space Framework for Coordinating Multi-runtime and Multi-process Workloads

Aleix Roca, Vicenç Beltran | arXiv (Cornell University) | Jan 28, 2026

Parallel Computing and Optimization Techniques | Cited by 0

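One-sentence summary

This paper presents USF (the User-space Scheduling Framework) and SCHED_COOP, a cooperative policy that reduces oversubscription interference by switching threads only when they block, and evaluates both on multi-runtime and multi-process HPC/AI workloads.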

ABSTRACT

The convergence of high-performance computing (HPC) and artificial intelligence (AI) is driving the emergence of increasingly complex parallel applications and workloads. These workloads often combine multiple parallel runtimes within the same application or across co-located jobs, creating scheduling demands that place significant stress on traditional OS schedulers. When oversubscribed (there are more ready threads than cores), OS schedulers rely on periodic preemptions to multiplex cores, often introducing interference that may degrade performance. In this paper, we present: (1) The User-space Scheduling Framework (USF), a novel seamless process scheduling framework completely implemented in user-space. USF enables users to implement their own process scheduling algorithms without requiring special permissions. We evaluate USF with its default cooperative policy, (2) SCHED_COOP, designed to reduce interference by switching threads only upon blocking. This approach mitigates well-known issues such as Lock-Holder Preemption (LHP), Lock-Waiter Preemption (LWP), and scalability collapse. We implement USF and SCHED_COOP by extending the GNU C library with the nOS-V runtime, enabling seamless coordination across multiple runtimes (e.g., OpenMP) without requiring invasive application changes. Evaluations show gains up to 2.4x in oversubscribed multi-process scenarios, including nested BLAS workloads, multi-process PyTorch inference with LLaMA-3, and Molecular Dynamics (MD) simulations.

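Motivation and goals

Motivate the need for better scheduling in oversubscribed HPC/AI workloads that combine multiple runtimes and co-located processes.
Propose a seamless user-space scheduling framework (USF) that requires no application changes or special permissions.
Introduce SCHED_COOP, a cooperative policy that minimizes preemption and interference.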

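Proposed method

Implements USF and SCHED_COOP by extending the GNU C Library (glibc) with the nOS-V runtime to coordinate multiple runtimes and processes.
Converts pthreads into nOS-V workers mapped to tasks with per-core affinity, enabling centralized scheduling across processes.
Defines a per-process FIFO scheduling policy for SCHED_COOP within nOS-V that selects the next task based on affinity and NUMA considerations.
Intercepts standard glibc APIs (pthread creation, blocking, affinity) and routes them through USF without kernel modifications.
Provides blocking-aware extensions (mutexes, barriers, semaphores, poll/epoll) that trigger worker switches and task resubmission.
Describes limitations related to blocking detection, busy-wait barriers, and shared-memory safety between processes.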
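
The scheduling idea in the list above (switch only when a task blocks, pick the next task in FIFO order subject to core affinity, resubmit on unblock) can be sketched in a few lines. This is a minimal illustration and not the nOS-V or glibc implementation: `CoopScheduler`, `worker`, and their methods are hypothetical names, Python generators stand in for threads, a `yield` stands in for a blocking call, a single shared FIFO replaces the per-process queues, NUMA is not modeled, and the cores are stepped one after another rather than in parallel.

```python
from collections import deque

def worker(name, steps, log):
    """Hypothetical task: does `steps` units of work, blocking after each."""
    for i in range(steps):
        log.append((name, i))
        yield   # stands in for a blocking call (mutex, barrier, poll, ...)

class CoopScheduler:
    def __init__(self, num_cores):
        self.num_cores = num_cores
        self.queue = deque()   # one shared FIFO of (task, preferred_core)

    def submit(self, task, core=None):
        if core is not None and not 0 <= core < self.num_cores:
            raise ValueError("core out of range")
        self.queue.append((task, core))

    def pick(self, core):
        """Return the oldest queued task that may run on `core`, or None."""
        for idx, (task, pref) in enumerate(self.queue):
            if pref is None or pref == core:
                del self.queue[idx]
                return task, pref
        return None

    def run(self):
        while self.queue:
            for core in range(self.num_cores):
                picked = self.pick(core)
                if picked is None:
                    continue             # nothing eligible for this core
                task, pref = picked
                try:
                    next(task)           # run until the task blocks
                except StopIteration:
                    continue             # task finished; do not requeue
                self.submit(task, pref)  # "unblocked": resubmit at the tail
```

In this sketch a task is never taken off its core by a timer; it leaves only at a `yield`, which is the property the cooperative policy relies on. Tasks submitted without a `core` may run on any core, while pinned tasks wait for their own.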

Figure 1. Glibcv architecture diagram. Application’s standard API calls are forwarded to the USF backend if enabled, which bridges with the nOS-V API. nOS-V schedules threads according to the selected policy.

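Experimental results

Research questions

RQ1: Can a user-space scheduler coordinate multi-runtime and multi-process workloads to improve performance under oversubscription?
RQ2: What performance gains does the seamless USF/SCHED_COOP approach achieve over baseline Linux scheduling?
RQ3: How does SCHED_COOP mitigate interference such as lock-holder/lock-waiter preemption and scalability collapse?
RQ4: What are the practical limitations and tuning guidelines for deploying USF across diverse HPC/AI stacks?
RQ5: How does runtime nesting affect performance, and how can it be supported with minimal application changes?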

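Key findings

USF/SCHED_COOP achieves gains of up to 2.4x in oversubscribed multi-process scenarios.
Seamless user-space scheduling improves throughput without invasive application modifications or kernel changes.
Nested-runtime and multi-process workloads show improvements over baseline Linux scheduling.
nOS-V enables seamless multi-process cooperation by coordinating tasks across processes with a centralized scheduler.
Blocking-driven task switching and per-core affinity help reduce interference while sustaining parallelism.
Manual adaptation yields speedups of up to 4x in some configurations, while the seamless USF approach delivers substantial gains across the evaluated scenarios.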
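
Why blocking-driven switching reduces interference can be illustrated with a toy model of Lock-Holder Preemption. The sketch below is an assumption-laden simplification, not the paper's evaluation setup: one core, a single shared spinlock, and threads whose entire work is a critical section of `critical_len` ticks. The function name `simulate` and its parameters are hypothetical.

```python
from collections import deque

def simulate(num_threads, critical_len, quantum=None):
    """Toy single-core model: every thread must run `critical_len` ticks while
    holding one shared spinlock. Returns (total_ticks, wasted_ticks).

    quantum=None models cooperative scheduling (a thread keeps the core until
    it finishes); an integer quantum models periodic preemption.
    """
    remaining = [critical_len] * num_threads
    ready = deque(range(num_threads))
    holder = None          # thread currently owning the lock, if any
    total = wasted = 0
    while ready:
        tid = ready.popleft()
        ran = 0
        while remaining[tid] > 0 and (quantum is None or ran < quantum):
            if holder is None:
                holder = tid
            if holder == tid:
                remaining[tid] -= 1    # useful work inside the critical section
            else:
                wasted += 1            # spinning: the lock holder is descheduled
            total += 1
            ran += 1
        if remaining[tid] == 0:
            holder = None              # finished thread releases the lock
        else:
            ready.append(tid)          # preempted: back of the run queue
    return total, wasted
```

With three threads and a four-tick critical section, `simulate(3, 4)` returns `(12, 0)`, whereas `simulate(3, 4, quantum=2)` returns `(18, 6)`: a third of the ticks in the preemptive run are spent spinning on a lock whose holder is not running. Real workloads mix critical and non-critical work across many cores, so the numbers here only show the direction of the effect.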

Figure 2. Evaluated matmul software stacks. a) Baseline with yield. b) Manual nOS-V integration. c) Seamless nOS-V integration. d) Unmodified (no yield).

This review was generated by AI and reviewed by a human editor.