QUICK REVIEW

[Paper Review] Offline Reinforcement-Learning-Based Power Control for Application-Agnostic Energy Efficiency

Akhilesh Raj, Swann Perarnau | arXiv (Cornell University) | 2026-01-16

Parallel Computing and Optimization Techniques | Citations: 0

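One-Line Summary

This paper presents an offline reinforcement learning approach that regulates CPU power through the RAPL actuator, achieving substantial energy savings across a range of benchmarks while limiting performance degradation.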

ABSTRACT

Energy efficiency has become an integral aspect of modern computing infrastructure design, impacting the performance, cost, scalability, and durability of production systems. The incorporation of power actuation and sensing capabilities in CPU designs is indicative of this, enabling the deployment of system software that can actively monitor and adjust energy consumption and performance at runtime. While reinforcement learning (RL) would seem ideal for the design of such energy efficiency control systems, online training presents challenges ranging from the lack of proper models for setting up an adequate simulated environment, to perturbation (noise) and reliability issues, if training is deployed on a live system. In this paper we discuss the use of offline reinforcement learning as an alternative approach for the design of an autonomous CPU power controller, with the goal of improving the energy efficiency of parallel applications at runtime without unduly impacting their performance. Offline RL sidesteps the issues incurred by online RL training by leveraging a dataset of state transitions collected from arbitrary policies prior to training. Our methodology applies offline RL to a gray-box approach to energy efficiency, combining online application-agnostic performance data (e.g., heartbeats) and hardware performance counters to ensure that the scientific objectives are met with limited performance degradation. Evaluating our method on a variety of compute-bound and memory-bound benchmarks and controlling power on a live system through Intel's Running Average Power Limit, we demonstrate that such an offline-trained agent can substantially reduce energy consumption at a tolerable performance degradation cost.

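Motivation and Objectives

Treat energy efficiency as a sustainability goal for HPC systems, and enable runtime power control without application-specific or hardware-specific tuning.
Propose an offline RL framework that learns a power-control policy from previously collected data, without training on a live system.
Develop an application- and hardware-independent controller that uses lightweight online signals to reduce energy use while limiting performance degradation.
Use heartbeats and hardware counters to capture application behavior and to guide power capping through RAPL.
Validate the approach by demonstrating energy savings with acceptable performance degradation across a variety of benchmarks.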

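Proposed Method

Formulates the task as minimizing the energy-delay-squared product (ED^2P), balancing energy against performance.
Trains an offline Conservative Q-Learning (CQL) agent on a dataset of state-action-reward transitions collected under arbitrary policies.
Represents the state as s(t) = [progress(t), power(t), IPC(t), STL(t), CMR(t)] and uses discretized PCAP (power cap) values, enforced through RAPL, as actions.
Defines the reward as reward(t+1) = progress(t+1)^3 / (power(t+1) + 1e-3), which favors high progress at low power.
Collects hardware counters with PAPI and measures progress from heartbeats, both of which are reflected in the state and the reward.
Evaluates the trained agent online: at a 1 Hz sampling rate, it selects the action with the highest Q-value (greedy selection) and sets the PCAP through RAPL.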
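
The state vector, reward, conservative penalty, and greedy action selection listed above can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the authors' implementation: the power-cap levels in PCAP_LEVELS_W, the State class, the toy Q-function in the usage section, and the log-sum-exp form of the CQL regularizer are choices made here for illustration. The review does not state the actual discretization, network architecture, or CQL variant.

```python
import math
from dataclasses import dataclass
from typing import Callable, List, Sequence

# Hypothetical discretized power caps in watts (not taken from the paper).
PCAP_LEVELS_W: List[float] = [60.0, 80.0, 100.0, 120.0]


@dataclass
class State:
    """One observation s(t), in the order given in the review."""
    progress: float  # heartbeat-based progress
    power: float     # measured power
    ipc: float       # IPC(t)
    stl: float       # STL(t), named as in the review
    cmr: float       # CMR(t), named as in the review

    def as_vector(self) -> List[float]:
        return [self.progress, self.power, self.ipc, self.stl, self.cmr]


def reward(progress_next: float, power_next: float) -> float:
    """reward(t+1) = progress(t+1)^3 / (power(t+1) + 1e-3)."""
    return progress_next ** 3 / (power_next + 1e-3)


def cql_penalty(q_values: Sequence[float], dataset_action: int) -> float:
    """Discrete-action conservative term: logsumexp_a Q(s, a) - Q(s, a_data).

    Assumed variant: it pushes down Q-values of actions absent from the
    dataset and is added (scaled by a weight) to the usual TD loss.
    """
    m = max(q_values)  # subtract the max for numerical stability
    lse = m + math.log(sum(math.exp(q - m) for q in q_values))
    return lse - q_values[dataset_action]


def greedy_pcap(q_fn: Callable[[List[float]], Sequence[float]], state: State) -> float:
    """Return the power cap whose Q-value is largest for the current state."""
    q_values = q_fn(state.as_vector())
    best = max(range(len(q_values)), key=lambda a: q_values[a])
    return PCAP_LEVELS_W[best]


if __name__ == "__main__":
    # Stand-in for a trained Q-network: fixed Q-values for every state.
    def toy_q(_state_vector: List[float]) -> List[float]:
        return [0.1, 0.7, 0.4, 0.2]

    s = State(progress=10.0, power=90.0, ipc=1.5, stl=0.2, cmr=0.05)
    print(greedy_pcap(toy_q, s))               # cap chosen for this step
    print(reward(10.0, 90.0))                  # reward for the next sample
    print(cql_penalty(toy_q(s.as_vector()), 1))
```

In a deployed controller, a loop running once per second would read the counters and heartbeats, build a State, call greedy_pcap, and write the chosen cap through RAPL. The cubic exponent on progress means that a drop in progress costs far more reward than an equal relative drop in power gains, which is consistent with the stated goal of limiting performance degradation.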

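Experimental Results

Research Questions

RQ1: Can offline RL learn an effective power-control policy for HPC nodes from previously collected data, without exploring on a live system?
RQ2: Can the offline RL controller reduce energy consumption with acceptable performance degradation across a variety of kernels and hardware configurations?
RQ3: How does the proposed method compare with existing power-management approaches and vendor-provided governors in terms of energy savings and performance impact?
RQ4: Is the approach robust to changes in application phase and computational intensity across compute-bound and memory-bound workloads?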

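Key Results

The offline RL controller reduces average energy consumption by about 20%.
Performance degradation averages 7.4%, with a worst case of 14%.
The method reduces energy more than state-of-the-art power-management systems and the on-demand frequency governor, while preserving performance.
The policy is learned from previously collected data using a single Q-network trained with CQL, which mitigates distributional shift.
Heartbeats and hardware counters enable responsive, application-agnostic performance tracking at runtime.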
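
To relate the reported averages back to the ED^2P objective, the short sketch below computes the ED^2P of a controlled run relative to a baseline. It rests on two assumptions that the review does not confirm: that "performance degradation" means a proportional increase in execution time, and that the average energy saving and the average slowdown can be combined as if they came from one run. The function names are hypothetical, and the numbers are a back-of-the-envelope illustration rather than a result from the paper.

```python
import math


def relative_ed2p(energy_saving: float, slowdown: float) -> float:
    """ED^2P of the controlled run divided by ED^2P of the baseline.

    energy_saving: fractional energy reduction (0.20 means 20% less energy).
    slowdown: fractional increase in execution time (0.074 means 7.4% longer).
    """
    return (1.0 - energy_saving) * (1.0 + slowdown) ** 2


def break_even_slowdown(energy_saving: float) -> float:
    """Largest slowdown for which relative ED^2P stays at or below 1."""
    return math.sqrt(1.0 / (1.0 - energy_saving)) - 1.0


if __name__ == "__main__":
    print(f"{relative_ed2p(0.20, 0.074):.4f}")   # prints 0.9228
    print(f"{break_even_slowdown(0.20):.4f}")    # prints 0.1180
```

Under these assumptions, a 20% energy saving with a 7.4% slowdown gives a relative ED^2P of about 0.92, and a 20% saving improves ED^2P only while the slowdown stays below roughly 11.8%. Because the delay term is squared, slowdown is penalized more heavily than energy is rewarded.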

This review was generated by AI and reviewed by a human editor.