QUICK REVIEW

[Paper Review] Data Center Cooling System Optimization Using Offline Reinforcement Learning

Xianyuan Zhan, Xiangyu Zhu | arXiv.org | Jan 25, 2025

Heat Transfer and Optimization | Cited by 3

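One-line summary

The paper optimizes data center cooling with a physics-informed offline reinforcement learning framework built on TTDM, a graph neural network dynamics model constrained by time-reversal symmetry, and achieves 14–21% energy savings in a production DC without any safety violations.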

ABSTRACT

The recent advances in information technology and artificial intelligence have fueled a rapid expansion of the data center (DC) industry worldwide, accompanied by an immense appetite for electricity to power the DCs. In a typical DC, around 30~40% of the energy is spent on the cooling system rather than on computer servers, posing a pressing need for developing new energy-saving optimization technologies for DC cooling systems. However, optimizing such real-world industrial systems faces numerous challenges, including but not limited to a lack of reliable simulation environments, limited historical data, and stringent safety and control robustness requirements. In this work, we present a novel physics-informed offline reinforcement learning (RL) framework for energy efficiency optimization of DC cooling systems. The proposed framework models the complex dynamical patterns and physical dependencies inside a server room using a purposely designed graph neural network architecture that is compliant with the fundamental time-reversal symmetry. Because of its well-behaved and generalizable state-action representations, the model enables sample-efficient and robust latent space offline policy learning using limited real-world operational data. Our framework has been successfully deployed and verified in a large-scale production DC for closed-loop control of its air-cooling units (ACUs). We conducted a total of 2000 hours of short and long-term experiments in the production DC environment. The results show that our method achieves 14~21% energy savings in the DC cooling system, without any violation of the safety or operational constraints. Our results have demonstrated the significant potential of offline RL in solving a broad range of data-limited, safety-critical real-world industrial control problems.

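Motivation and objectives

Develop a data-efficient and safe optimization method to address the high energy cost of data center cooling.
Learn a policy from offline historical data, without a high-fidelity simulator or online exploration.
Improve generalization by incorporating physical knowledge through time-reversal symmetry and a graph neural network.
Deploy and validate the method in a real-world production data center and on a dedicated testbed.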

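Proposed method

Formulate DC cooling control as an offline MDP with a safety-aware reward that balances energy use against thermal safety.
Develop TTDM, a thermal dynamics model that enforces T-symmetry, built on a graph neural network to capture spatial and control dependencies.
Enforce T-symmetry in the latent space by imposing forward and reverse latent ODE dynamics.
Train the latent-space Q-function and policy with a TD3+BC-style objective, adding a T-symmetry regularization loss.
Learn robust latent representations using reconstruction and physics-consistency losses (L_rec, L_fwd, L_rvs, L_ds, L_T-sym).
Evaluate in a production DC (2000 hours) and on a 22-server testbed, comparing against PID and other offline/RL baselines.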
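
The method steps above can be made more concrete with a minimal NumPy sketch. It is an illustration under stated assumptions, not the authors' implementation. The reward shape, the temperature limit, the penalty weight, and the linear latent dynamics matrices `A_fwd` and `A_rvs` are all hypothetical, and actions are left out of the dynamics for brevity. The actor loss follows the published TD3+BC form (a Q-term scaled by `alpha / mean|Q|` plus a behavior-cloning mean squared error). The time-reversal term penalizes disagreement between a forward model that predicts the latent derivative and a reverse model that predicts its negative.

```python
import numpy as np


def safety_aware_reward(energy_kw, cold_aisle_temps, temp_limit=25.0, penalty=10.0):
    """Hypothetical reward: negative energy use minus a penalty on
    every degree by which a temperature reading exceeds temp_limit."""
    excess = np.maximum(np.asarray(cold_aisle_temps, dtype=float) - temp_limit, 0.0)
    return -float(energy_kw) - penalty * float(excess.sum())


def t_symmetry_loss(z, z_next, A_fwd, A_rvs):
    """Illustrative time-reversal consistency loss for linear latent dynamics.

    The forward model estimates dz/dt at z, the reverse model estimates
    -dz/dt at z_next. If both describe the same trajectory, their sum
    should be close to zero, so the squared norm of the sum is penalized.
    """
    fwd = np.asarray(z, dtype=float) @ A_fwd.T        # forward derivative estimate
    rvs = np.asarray(z_next, dtype=float) @ A_rvs.T   # reverse derivative estimate
    return float(np.mean(np.sum((fwd + rvs) ** 2, axis=-1)))


def td3_bc_actor_loss(q_values, policy_actions, data_actions, alpha=2.5):
    """TD3+BC-style actor loss: maximize normalized Q while staying
    close to the actions recorded in the offline dataset."""
    q = np.asarray(q_values, dtype=float)
    lam = alpha / (np.mean(np.abs(q)) + 1e-8)         # normalizes the scale of Q
    diff = np.asarray(policy_actions, dtype=float) - np.asarray(data_actions, dtype=float)
    bc = np.mean(diff ** 2)                           # behavior-cloning MSE
    return float(-lam * np.mean(q) + bc)


if __name__ == "__main__":
    # One reading is 1 degree above the limit: reward = -10 - 10 * 1
    print(safety_aware_reward(10.0, [24.0, 26.0]))
    eye = np.eye(2)
    z = np.array([[1.0, 0.0]])
    # Reverse model is the exact negative of the forward model: zero loss
    print(t_symmetry_loss(z, z, eye, -eye))
    print(td3_bc_actor_loss([2.0, -2.0], [[1.0], [1.0]], [[0.0], [0.0]]))
```

In a full pipeline these terms would be computed on learned encoder outputs and minimized by gradient descent together with the reconstruction and dynamics losses. The sketch only shows how each quantity is formed.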

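Experimental results

Research questions

RQ1: Can physics-informed offline RL learn effective and safe DC cooling control policies from limited historical data without a simulator?
RQ2: Do T-symmetry enforcement and the GNN-based dynamics model improve the data efficiency and OOD generalization of offline RL for DC cooling?
RQ3: What energy savings and thermal safety performance are achievable when an offline RL policy is deployed in a production DC?
RQ4: How does increasing the number of ACUs affect energy efficiency under various server loads?
RQ5: How do ablating the GNN structure and T-symmetry enforcement affect prediction error and policy performance?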

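Key findings

Achieved 14–21% energy savings in DC cooling in a production DC compared with a standard PID controller.
Maintained thermal safety throughout the 2000-hour deployment, with no CAT violations.
Long-term experiments showed sustained ACLF reductions and improved temperature-field uniformity under load fluctuations.
Testbed and ablation results show that introducing the GNN structure and enforcing T-symmetry reduce multi-step prediction error and improve policy performance.
Even in data-limited settings, the method outperforms baseline methods (PID, MPC, off-policy RL, safe offline RL) in energy efficiency while maintaining safety.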

This review was generated by AI and checked by a human editor.