QUICK REVIEW

[Paper Review] Characterizing MARL for Energy Control: A Multi-KPI Benchmark on the CityLearn Environment

Aymen Khouja, Imen Jendoubi | arXiv (Cornell University) | Feb 22, 2026

Smart Grid Energy Management | Citations: 0

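One-Line Summary

Summary: This paper benchmarks six MARL algorithms on CityLearn across DTDE/CTDE training schemes and feedforward/recurrent architectures, introduces new KPIs (battery depth of discharge (DoD) and Agent Importance), and shows that DTDE is often more robust while temporal models improve ramping and storage efficiency.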

ABSTRACT

The optimization of urban energy systems is crucial for the advancement of sustainable and resilient smart cities, which are becoming increasingly complex with multiple decision-making units. To address scalability and coordination concerns, Multi-Agent Reinforcement Learning (MARL) is a promising solution. This paper addresses the imperative need for comprehensive and reliable benchmarking of MARL algorithms on energy management tasks. CityLearn is used as a case study environment because it realistically simulates urban energy systems, incorporates multiple storage systems, and utilizes renewable energy sources. By doing so, our work sets a new standard for evaluation, conducting a comparative study across multiple key performance indicators (KPIs). This approach illuminates the key strengths and weaknesses of various algorithms, moving beyond traditional KPI averaging which often masks critical insights. Our experiments utilize widely accepted baselines such as Proximal Policy Optimization (PPO) and Soft Actor Critic (SAC), and encompass diverse training schemes including Decentralized Training with Decentralized Execution (DTDE) and Centralized Training with Decentralized Execution (CTDE) approaches and different neural network architectures. Our work also proposes novel KPIs that tackle real world implementation challenges such as individual building contribution and battery storage lifetime. Our findings show that DTDE consistently outperforms CTDE in both average and worst-case performance. Additionally, temporal dependency learning improved control on memory dependent KPIs such as ramping and battery usage, contributing to more sustainable battery operation. Results also reveal robustness to agent or resource removal, highlighting both the resilience and decentralizability of the learned policies.

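Motivation and Objectives

Advance rigorous benchmarking of MARL for urban energy management, evaluating scalability, coordination, and robustness.
Evaluate six MARL algorithms under the DTDE and CTDE training paradigms in the CityLearn environment.
Assess how temporal dependency learning via recurrent encoders affects key energy control KPIs.
Introduce and validate novel KPIs that reflect realistic deployment concerns (e.g., battery depth of discharge (DoD), Agent Importance).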
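
A minimal sketch of how a depth-of-discharge statistic could be derived from a battery state-of-charge trace, assuming the common convention DoD = 1 - SoC with SoC normalized to [0, 1]. This review does not give the paper's exact KPI definition, so the function names and the choice of summary statistics here are illustrative assumptions.

```python
def depth_of_discharge(soc_trace):
    """Per-step depth of discharge, assuming DoD = 1 - SoC with SoC in [0, 1]."""
    for soc in soc_trace:
        if not 0.0 <= soc <= 1.0:
            raise ValueError(f"state of charge must lie in [0, 1], got {soc}")
    return [1.0 - soc for soc in soc_trace]


def dod_summary(soc_trace):
    """Hypothetical summary: mean and deepest discharge over an episode."""
    dod = depth_of_discharge(soc_trace)
    return {"mean_dod": sum(dod) / len(dod), "max_dod": max(dod)}


# Toy state-of-charge trace for one battery (fractions of usable capacity).
summary = dod_summary([1.0, 0.5, 0.25, 0.75])
print(summary)
```

A lower mean or maximum DoD generally indicates gentler cycling, which is why a statistic of this kind can serve as a proxy for battery lifetime.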

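Proposed Method

Implement six MARL algorithms using the Sebulba architecture for scalable multi-agent training and execution.
Evaluate decentralized training with decentralized execution (DTDE) and centralized training with decentralized execution (CTDE), each with feedforward and recurrent observation encoders.
Use PPO (on-policy) and SAC (off-policy) as representative learning algorithms, including recurrent variants with GRU-based observation encoders.
Train under identical rollout conditions and a shared reward signal to encourage coordination among agents.
Benchmark on the CityLearn 2023 dataset with a comprehensive evaluation protocol that includes the interquartile mean (IQM), conditional value at risk (CVaR), bootstrap confidence intervals, and robustness metrics.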
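
The aggregate statistics named in the evaluation protocol can be sketched as follows. This is a stdlib-only illustration, not the authors' evaluation code. It assumes that higher scores are better, that CVaR is taken as the mean of the worst `alpha` fraction of runs, and that confidence intervals come from a percentile bootstrap. The function names, the `alpha` default, and the sample scores are hypothetical.

```python
import math
import random


def iqm(scores):
    """Interquartile mean: average of the middle 50% of the sorted scores."""
    s = sorted(scores)
    k = int(len(s) * 0.25)  # number of runs trimmed from each end
    middle = s[k:len(s) - k]
    return sum(middle) / len(middle)


def cvar(scores, alpha=0.25):
    """Mean of the worst alpha fraction of runs (assumes higher is better)."""
    s = sorted(scores)
    k = max(1, math.ceil(alpha * len(s)))
    tail = s[:k]
    return sum(tail) / len(tail)


def bootstrap_ci(scores, statistic, n_resamples=2000, level=0.95, seed=0):
    """Percentile bootstrap confidence interval for any statistic."""
    rng = random.Random(seed)
    n = len(scores)
    stats = sorted(
        statistic([rng.choice(scores) for _ in range(n)])
        for _ in range(n_resamples)
    )
    lo_idx = int((1 - level) / 2 * n_resamples)
    hi_idx = int((1 + level) / 2 * n_resamples) - 1
    return stats[lo_idx], stats[hi_idx]


# Hypothetical per-seed scores containing one outlier run.
runs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 100.0]
print("IQM:", iqm(runs))
print("CVaR(0.25):", cvar(runs))
print("95% CI for IQM:", bootstrap_ci(runs, iqm))
```

The IQM discards the outlier run that would dominate a plain mean, while CVaR summarizes only the worst runs, which is one way to report worst-case behaviour alongside average performance.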

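Experimental Results

Research Questions

RQ1: How do the DTDE and CTDE training paradigms compare in multi-agent energy management in terms of average performance, robustness, and worst-case outcomes?
RQ2: What effect does temporal dependency learning (recurrent encoders) have on key KPIs such as ramping, battery usage, and discomfort?
RQ3: Are independent (decentralized) learners more stable and resilient than centralized-coordination approaches?
RQ4: How do novel KPIs such as battery DoD and Agent Importance inform credit assignment and long-term sustainability in the CityLearn setting?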
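
To show why ramping is a memory-dependent KPI, the sketch below computes it as the sum of absolute step-to-step changes in net electricity consumption. This is an unnormalized form chosen for illustration, and the function name and sample profiles are hypothetical. Each term depends on the previous time step, so a policy with no access to history cannot directly see the quantity it is being scored on.

```python
def ramping(net_consumption):
    """Sum of absolute changes between consecutive time steps."""
    return sum(
        abs(curr - prev)
        for prev, curr in zip(net_consumption, net_consumption[1:])
    )


# Two hypothetical load profiles with the same total energy (8.0 units).
spiky = [1.0, 3.0, 2.0, 2.0]
flat = [2.0, 2.0, 2.0, 2.0]
print(ramping(spiky), ramping(flat))
```

The two profiles draw the same total energy but differ in ramping, which is the kind of difference a single averaged score would hide.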

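Key Findings

DTDE consistently outperforms CTDE in both average and worst-case performance across the evaluated scenarios.
Learning temporal dependencies improves control of memory-dependent KPIs such as ramping and battery usage.
Recurrent architectures substantially improve ramping performance, while their effect on carbon emissions and discomfort is mixed.
Independent learners (e.g., IPPO, SAC) show lower performance variance and more robust behavior than centralized variants such as MAPPO.
Agent Importance scores are near zero, suggesting that contributions are evenly distributed across agents and that performance is robust to removing individual agents or resources.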
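
One way to probe per-agent contribution is a leave-one-out comparison: evaluate the team return with all agents active, then again with one agent removed, and take the difference. The sketch below illustrates that idea only. It is an assumption, not the paper's Agent Importance definition, which this review does not spell out, and `toy_return` is a hypothetical stand-in for a real environment rollout.

```python
def leave_one_out_importance(evaluate, agents):
    """Drop in team return when each agent is removed, one at a time.

    `evaluate` is any callable mapping a frozenset of active agents to a
    scalar team return (here a stand-in for a full environment rollout).
    """
    everyone = frozenset(agents)
    full_return = evaluate(everyone)
    return {a: full_return - evaluate(everyone - {a}) for a in agents}


# Hypothetical additive contributions of three buildings to the team return.
CONTRIBUTION = {"building_1": 1.0, "building_2": 1.0, "building_3": 1.0}


def toy_return(active_agents):
    """Toy team return: sum of the active agents' contributions."""
    return sum(CONTRIBUTION[a] for a in active_agents)


scores = leave_one_out_importance(toy_return, list(CONTRIBUTION))
print(scores)
```

In this toy setup every agent receives the same score because the contributions are equal by construction. With a real environment, the evaluation function would need to define what removing an agent means, for example replacing its actions with a no-op.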

This review was generated by AI and checked by a human editor.