QUICK REVIEW

[Paper Review] Offline Reinforcement-Learning-Based Power Control for Application-Agnostic Energy Efficiency

Akhilesh Raj, Swann Perarnau|arXiv (Cornell University)|Jan 16, 2026

Parallel Computing and Optimization Techniques|Cited by 0

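One-Sentence Summary

The paper proposes an offline reinforcement learning method that regulates CPU power through the RAPL actuator, achieving substantial energy savings across diverse benchmarks with limited impact on performance.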

ABSTRACT

Energy efficiency has become an integral aspect of modern computing infrastructure design, impacting the performance, cost, scalability, and durability of production systems. The incorporation of power actuation and sensing capabilities in CPU designs is indicative of this, enabling the deployment of system software that can actively monitor and adjust energy consumption and performance at runtime. While reinforcement learning (RL) would seem ideal for the design of such energy efficiency control systems, online training presents challenges ranging from the lack of proper models for setting up an adequate simulated environment, to perturbation (noise) and reliability issues, if training is deployed on a live system. In this paper we discuss the use of offline reinforcement learning as an alternative approach for the design of an autonomous CPU power controller, with the goal of improving the energy efficiency of parallel applications at runtime without unduly impacting their performance. Offline RL sidesteps the issues incurred by online RL training by leveraging a dataset of state transitions collected from arbitrary policies prior to training. Our methodology applies offline RL to a gray-box approach to energy efficiency, combining online application-agnostic performance data (e.g., heartbeats) and hardware performance counters to ensure that the scientific objectives are met with limited performance degradation. Evaluating our method on a variety of compute-bound and memory-bound benchmarks and controlling power on a live system through Intel's Running Average Power Limit, we demonstrate that such an offline-trained agent can substantially reduce energy consumption at a tolerable performance degradation cost.

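Research Motivation and Goals

Treat energy efficiency as a sustainability goal for HPC systems and control power at runtime without application-specific or hardware-specific tuning.
Propose an offline RL framework that learns a power-control policy from previously collected data, with no training on a live system.
Develop an application-agnostic, hardware-agnostic controller that uses lightweight online signals to reduce energy consumption while preserving performance.
Capture application behavior with heartbeat signals and hardware counters, and use it to set power caps through RAPL.
Validate the method on a variety of benchmarks, showing energy savings at a tolerable performance cost.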

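Proposed Method

Formulate the problem as minimizing the energy-delay-squared product (ED^2P), which trades off energy against performance.
Train an offline Conservative Q-Learning (CQL) agent on a dataset of state-action-reward transitions collected from arbitrary policies.
Represent the state as s(t)=[progress(t), power(t), IPC(t), STL(t), CMR(t)] and the action as a discretized PCAP value set through RAPL.
Define the reward as reward(t+1)=progress^3(t+1)/(power(t+1)+1e-3), which favors higher progress at lower power.
Use PAPI to collect hardware counters, together with a heartbeat-based progress metric, to inform the state and the reward.
During online evaluation, select the action greedily by Q-value and set PCAP through RAPL at a 1 Hz sampling rate.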
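The state, reward, and greedy action selection listed above can be sketched in a few lines of Python. This is a minimal illustration and not the authors' implementation: the PCAP grid in watts, the function names, and the toy Q-function standing in for the trained network are all assumptions introduced here. STL and CMR are treated as opaque counter-derived features because this review does not expand them.

```python
import numpy as np

# Hypothetical discretized power caps in watts; the review does not give the real grid.
PCAP_LEVELS_W = np.array([60.0, 80.0, 100.0, 120.0])


def make_state(progress, power, ipc, stl, cmr):
    """Pack the five signals into s(t) = [progress, power, IPC, STL, CMR]."""
    return np.array([progress, power, ipc, stl, cmr], dtype=np.float64)


def reward(progress_next, power_next, eps=1e-3):
    """reward(t+1) = progress(t+1)^3 / (power(t+1) + 1e-3)."""
    return progress_next ** 3 / (power_next + eps)


def greedy_pcap(q_function, state):
    """Return the power cap whose Q-value is largest for this state."""
    q_values = q_function(state)  # one Q-value per discrete PCAP level
    return float(PCAP_LEVELS_W[int(np.argmax(q_values))])


def toy_q_function(state):
    """Stand-in for a trained Q-network (illustration only).

    It scores each cap by how close it is to the currently measured power,
    which is state[1] in the layout used by make_state.
    """
    return -(PCAP_LEVELS_W - state[1]) ** 2


if __name__ == "__main__":
    s = make_state(progress=10.0, power=85.0, ipc=1.2, stl=0.3, cmr=0.05)
    print("chosen PCAP (W):", greedy_pcap(toy_q_function, s))
    print("reward:", reward(progress_next=10.0, power_next=85.0))
```

In a deployed controller this selection would run once per second, matching the 1 Hz sampling described above, and the chosen cap would then be written to the RAPL interface. That write step is omitted here because it depends on the platform. Cubing progress makes the reward much more sensitive to slowdowns than to power increases, which fits the stated aim of limiting performance loss.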

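Experimental Results

Research Questions

RQ1: Can offline RL use pre-collected data to learn an effective power-control policy for HPC nodes without exploration on a live system?
RQ2: Does the offline RL controller reduce energy consumption at an acceptable performance cost across diverse kernels and hardware settings?
RQ3: How does the proposed method compare with existing power-management approaches and vendor governors in energy savings and performance impact?
RQ4: Is the method robust to application phases and to variations in arithmetic intensity across compute-bound and memory-bandwidth-bound workloads?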

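Key Findings

The offline RL controller reduces energy consumption by about 20% on average.
Average performance degradation is 7.4%, and the worst case is 14%.
The method achieves larger energy reductions than state-of-the-art power-management systems and the on-demand frequency governor while maintaining performance.
The policy is learned from pre-collected data using a single Q-network with CQL to mitigate distribution shift.
Heartbeat signals and hardware counters enable application-agnostic yet performance-aware tracking at runtime.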
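To make the role of CQL concrete, here is a sketch of the standard discrete-action CQL objective computed with NumPy on a batch of logged transitions. It is the textbook form of the loss and not necessarily the exact variant used in the paper. The function name, the discount `gamma`, and the penalty weight `alpha` are assumptions chosen for illustration, and terminal-state handling is left out for brevity.

```python
import numpy as np


def cql_loss(q_sa_all, actions, rewards, q_next_all, gamma=0.99, alpha=1.0):
    """Discrete-action CQL loss on one batch (illustrative sketch).

    q_sa_all   : (batch, n_actions) Q-values for the logged states
    actions    : (batch,) integer indices of the logged actions
    rewards    : (batch,) logged rewards
    q_next_all : (batch, n_actions) target Q-values for the next states
    gamma, alpha are assumed hyperparameters, not values from the paper.
    """
    idx = np.arange(len(actions))
    q_data = q_sa_all[idx, actions]  # Q(s, a) for actions present in the dataset

    # Ordinary one-step TD error against a max-backup target.
    td_target = rewards + gamma * q_next_all.max(axis=1)
    td_loss = 0.5 * np.mean((q_data - td_target) ** 2)

    # Conservative term: log-sum-exp over all actions minus Q of the logged action.
    # It pushes down Q-values of actions the dataset does not support.
    m = q_sa_all.max(axis=1, keepdims=True)
    logsumexp = m[:, 0] + np.log(np.exp(q_sa_all - m).sum(axis=1))
    conservative = np.mean(logsumexp - q_data)

    return td_loss + alpha * conservative, td_loss, conservative


if __name__ == "__main__":
    q = np.zeros((2, 4))
    total, td, cons = cql_loss(q, np.array([0, 1]), np.array([1.0, 1.0]), np.zeros((2, 4)))
    print(total, td, cons)
```

The conservative term is never negative, because the log-sum-exp over all actions is at least as large as any single Q-value. Minimizing it lowers the value of power caps that rarely or never appear in the logged data, which is how CQL limits the effect of distribution shift when the learned policy is later run on a live system.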


This review was generated by AI and reviewed by a human editor.