QUICK REVIEW

[Paper Review] Offline Reinforcement-Learning-Based Power Control for Application-Agnostic Energy Efficiency

Akhilesh Raj, Swann Perarnau|arXiv (Cornell University)|Jan 16, 2026

Parallel Computing and Optimization Techniques | Citations: 0

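One-Sentence Summary

Summary: This paper proposes CPU power control based on offline reinforcement learning, using RAPL as the actuator, and reports substantial energy savings with limited performance degradation across diverse benchmarks.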

ABSTRACT

Energy efficiency has become an integral aspect of modern computing infrastructure design, impacting the performance, cost, scalability, and durability of production systems. The incorporation of power actuation and sensing capabilities in CPU designs is indicative of this, enabling the deployment of system software that can actively monitor and adjust energy consumption and performance at runtime. While reinforcement learning (RL) would seem ideal for the design of such energy efficiency control systems, online training presents challenges ranging from the lack of proper models for setting up an adequate simulated environment, to perturbation (noise) and reliability issues, if training is deployed on a live system. In this paper we discuss the use of offline reinforcement learning as an alternative approach for the design of an autonomous CPU power controller, with the goal of improving the energy efficiency of parallel applications at runtime without unduly impacting their performance. Offline RL sidesteps the issues incurred by online RL training by leveraging a dataset of state transitions collected from arbitrary policies prior to training. Our methodology applies offline RL to a gray-box approach to energy efficiency, combining online application-agnostic performance data (e.g., heartbeats) and hardware performance counters to ensure that the scientific objectives are met with limited performance degradation. Evaluating our method on a variety of compute-bound and memory-bound benchmarks and controlling power on a live system through Intel's Running Average Power Limit, we demonstrate that such an offline-trained agent can substantially reduce energy consumption at a tolerable performance degradation cost.

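Motivation and Objectives

Frame energy efficiency in HPC systems as a sustainability goal, and enable runtime power control without application-specific or hardware-specific tuning.
Propose an offline RL framework that learns from pre-collected data and therefore requires no training on a live system.
Develop an application- and hardware-agnostic controller that uses lightweight online signals to reduce energy use while preserving performance.
Use heartbeat signals and hardware counters to capture application behavior and guide power capping through RAPL.
Validate the approach on a range of benchmarks, showing energy savings at an acceptable performance degradation.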

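Proposed Method

Formulate the problem as minimizing the energy-delay-squared product (ED^2P), which balances energy against performance.
Train an offline Conservative Q-Learning (CQL) agent on a dataset of state-action-reward transitions collected under arbitrary policies.
Represent the state as s(t)=[progress(t), power(t), IPC(t), STL(t), CMR(t)] and the action as a discretized PCAP (power cap) value enforced through RAPL.
Define the reward as reward(t+1)=progress^3(t+1)/(power(t+1)+1e-3), which favors high progress at low power.
Use PAPI to collect hardware counters, and use a heartbeat-based progress metric to inform both the state and the reward.
Evaluate the trained agent online: at a 1 Hz sampling rate, it selects the PCAP with the highest (greedy) Q-value and applies it through RAPL.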
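
The state, reward, and greedy action selection described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the power-cap levels in PCAP_LEVELS and the toy_q function standing in for a trained Q-network are hypothetical, and the RAPL write and the 1 Hz control loop are omitted. Under the assumption that progress is inversely proportional to delay, maximizing progress^3/power corresponds to minimizing power x delay^3, which equals energy x delay^2 (ED^2P) because energy is power x delay.

```python
from typing import Callable, List, Sequence

EPS = 1e-3  # small constant from the reward definition; avoids division by zero

# Hypothetical discretized power caps in watts (not taken from the paper).
PCAP_LEVELS: List[float] = [60.0, 80.0, 100.0, 120.0]


def make_state(progress: float, power: float, ipc: float,
               stl: float, cmr: float) -> List[float]:
    """Assemble s(t) = [progress, power, IPC, STL, CMR]."""
    return [progress, power, ipc, stl, cmr]


def reward(progress: float, power: float) -> float:
    """reward(t+1) = progress(t+1)^3 / (power(t+1) + 1e-3)."""
    return progress ** 3 / (power + EPS)


def greedy_pcap(state: Sequence[float],
                q_function: Callable[[Sequence[float]], Sequence[float]]) -> float:
    """Return the power cap whose Q-value is largest for this state."""
    q_values = q_function(state)
    best = max(range(len(PCAP_LEVELS)), key=lambda i: q_values[i])
    return PCAP_LEVELS[best]


def toy_q(state: Sequence[float]) -> List[float]:
    """Hypothetical stand-in for a trained Q-network:
    scores each cap by how close it is to the measured power."""
    measured_power = state[1]
    return [-abs(cap - measured_power) for cap in PCAP_LEVELS]


state = make_state(progress=2.0, power=85.0, ipc=1.4, stl=0.2, cmr=0.1)
chosen_cap = greedy_pcap(state, toy_q)
```

With the toy Q-function, a measured power of 85 W selects the 80 W cap. In a deployment, the selected value would be handed to whatever RAPL interface the system exposes once per sampling period.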

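Experimental Results

Research Questions

RQ1: Can offline RL learn an effective power-control policy for HPC nodes from pre-collected data, without exploration on a live system?
RQ2: Does the offline RL controller reduce energy consumption at an acceptable performance degradation across diverse kernels and hardware configurations?
RQ3: How does the proposed method compare with existing power-management approaches and vendor governors in terms of energy savings and performance impact?
RQ4: Is the method robust to application phases and to varying arithmetic intensity across compute-bound and memory-bound workloads?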

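Key Findings

The offline RL controller reduces average energy consumption by about 20%.
Performance degradation averages 7.4% and reaches 14% in the worst case.
The method outperforms state-of-the-art power-management systems and the on-demand frequency governor in energy reduction while limiting performance loss.
The policy is learned from pre-collected data; CQL with a single Q-network mitigates distribution shift.
Heartbeats and hardware counters provide application-agnostic performance tracking at runtime.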
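
To make the distribution-shift point concrete, the following sketch computes the per-transition loss commonly used for discrete-action CQL: a standard TD error plus a conservative penalty, namely the log-sum-exp over all action Q-values minus the Q-value of the action recorded in the dataset. The review does not state which CQL variant or hyperparameters the paper uses, so alpha, gamma, and all function names here are assumptions.

```python
import math
from typing import Sequence


def cql_penalty(q_values: Sequence[float], dataset_action: int) -> float:
    """Conservative term: logsumexp_a Q(s, a) - Q(s, a_dataset)."""
    m = max(q_values)  # subtract the max for numerical stability
    lse = m + math.log(sum(math.exp(q - m) for q in q_values))
    return lse - q_values[dataset_action]


def td_error_squared(q_sa: float, reward: float,
                     next_q_values: Sequence[float],
                     gamma: float = 0.99) -> float:
    """Squared one-step TD error against a greedy bootstrap target."""
    target = reward + gamma * max(next_q_values)
    return (q_sa - target) ** 2


def cql_loss(q_values: Sequence[float], action: int, reward: float,
             next_q_values: Sequence[float],
             alpha: float = 1.0, gamma: float = 0.99) -> float:
    """Per-transition loss: alpha * penalty + 0.5 * squared TD error.
    alpha and gamma are illustrative defaults, not values from the paper."""
    td = td_error_squared(q_values[action], reward, next_q_values, gamma)
    return alpha * cql_penalty(q_values, action) + 0.5 * td
```

The penalty grows when actions absent from the dataset receive high Q-values, which discourages the greedy policy from choosing power caps that the collected data cannot support.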

This review was generated by AI and checked by a human editor.