QUICK REVIEW

[Paper Review] Transforming Cooling Optimization for Green Data Center via Deep Reinforcement Learning

Yuanlong Li, Yonggang Wen|arXiv (Cornell University)|Sep 15, 2017

Reinforcement Learning in Robotics|References: 28|Cited by: 37

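One-Sentence Summary

This paper proposes the cooling control algorithm (CCA), an end-to-end deep reinforcement learning (DRL) framework built on the actor-critic architecture and an off-policy, offline version of DDPG, which optimizes data center cooling by learning directly from monitoring data. It achieves about 11% cooling cost saving in simulation and about 15% cooling energy saving in an evaluation on a real data trace, where a de-underestimation (DUE) validation mechanism keeps the performance estimates conservative and reliable.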

ABSTRACT

Cooling system plays a critical role in a modern data center (DC). Developing an optimal control policy for DC cooling system is a challenging task. The prevailing approaches often rely on approximating system models that are built upon the knowledge of mechanical cooling, electrical and thermal management, which is difficult to design and may lead to sub-optimal or unstable performances. In this paper, we propose utilizing the large amount of monitoring data in DC to optimize the control policy. To do so, we cast the cooling control policy design into an energy cost minimization problem with temperature constraints, and tap it into the emerging deep reinforcement learning (DRL) framework. Specifically, we propose an end-to-end cooling control algorithm (CCA) that is based on the actor-critic framework and an off-policy offline version of the deep deterministic policy gradient (DDPG) algorithm. In the proposed CCA, an evaluation network is trained to predict an energy cost counter penalized by the cooling status of the DC room, and a policy network is trained to predict optimized control settings when gave the current load and weather information. The proposed algorithm is evaluated on the EnergyPlus simulation platform and on a real data trace collected from the National Super Computing Centre (NSCC) of Singapore. Our results show that the proposed CCA can achieve about 11% cooling cost saving on the simulation platform compared with a manually configured baseline control algorithm. In the trace-based study, we propose a de-underestimation validation mechanism as we cannot directly test the algorithm on a real DC. Even though with DUE the results are conservative, we can still achieve about 15% cooling energy saving on the NSCC data trace if we set the inlet temperature threshold at 26.6 degree Celsius.
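The formulation the abstract describes, an energy cost minimization with temperature constraints that is then folded into a penalized cost, can be written out as a rough sketch. The notation is an assumption made for illustration and is not taken from the paper: $s_t$ is the observed state (load, weather), $\pi$ the control policy, $E_t$ the cooling energy cost, $T_t$ the rack inlet temperature, $T_{\max}$ the inlet threshold, and $\lambda \ge 0$ a penalty weight. The hinge-style penalty is one plausible choice; the paper's exact penalty term may differ.

```latex
\begin{aligned}
\min_{\pi}\quad & \mathbb{E}\!\left[\sum_{t} E_t\bigl(s_t, \pi(s_t)\bigr)\right] \\
\text{s.t.}\quad & T_t\bigl(s_t, \pi(s_t)\bigr) \le T_{\max} \quad \text{for all } t
\end{aligned}
\qquad\Longrightarrow\qquad
c_t = E_t + \lambda \,\max\bigl(0,\; T_t - T_{\max}\bigr)
```

Under this reading, the evaluation network would be trained to predict $c_t$ and the policy network to choose settings that make the predicted $c_t$ small.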

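Motivation and Objectives

Address sub-optimal or unstable data center cooling control caused by thermal and mechanical dynamics that are complex and hard to model.
Develop a data-driven, end-to-end control policy that does not depend on a detailed system model.
Reduce cooling energy consumption while maintaining temperature constraints in a real data center environment.
Validate the proposed method on both simulation and a real data trace to establish its practical feasibility.
Introduce the de-underestimation (DUE) validation mechanism to prevent overly optimistic estimates of energy savings ahead of real deployment.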

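Proposed Method

The CCA framework uses an actor-critic DRL architecture and learns the control policy directly from historical monitoring data.
It adopts an off-policy, offline version of the deep deterministic policy gradient (DDPG) algorithm to improve sample efficiency and training stability.
The evaluation network (critic) predicts an energy cost penalized by the cooling status of the DC room; the policy network (actor) outputs optimized control settings given the current load and weather inputs.
The reward function minimizes energy cost while enforcing the temperature constraint, with the penalty term weighted by the hyperparameter λ.
A de-underestimation (DUE) validation method replaces the standard squared error with a loss that penalizes only underestimated temperature predictions, so that the reported results are not overly optimistic.
The method is evaluated on the EnergyPlus simulation platform and on a real data trace from the National Super Computing Centre (NSCC) of Singapore, using state features that include rack power, flow rate, and supply air temperature.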
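The one-sided loss behind DUE validation can be illustrated with a minimal sketch. It assumes the simplest reading of the description above: squared error is kept when the predicted temperature falls below the measured one and is set to zero otherwise. The function names `squared_error` and `underestimation_loss` and the sample temperatures are hypothetical, and the paper's exact loss may differ.

```python
from typing import Sequence


def squared_error(pred: Sequence[float], actual: Sequence[float]) -> float:
    """Standard mean squared error: over- and underestimates cost the same."""
    return sum((p - a) ** 2 for p, a in zip(pred, actual)) / len(actual)


def underestimation_loss(pred: Sequence[float], actual: Sequence[float]) -> float:
    """One-sided mean squared error (assumed form, not the paper's exact loss).

    Only predictions below the measured temperature contribute.
    """
    return sum(max(0.0, a - p) ** 2 for p, a in zip(pred, actual)) / len(actual)


# Made-up inlet temperatures in degrees Celsius, for illustration only.
measured = [25.0, 26.0, 27.0]
too_low = [24.0, 25.0, 26.0]    # predictions 1 degree below every measurement
too_high = [26.0, 27.0, 28.0]   # predictions 1 degree above every measurement

print(squared_error(too_low, measured), squared_error(too_high, measured))
print(underestimation_loss(too_low, measured), underestimation_loss(too_high, measured))
```

The symmetric loss cannot tell the two prediction sets apart, while the one-sided loss is nonzero only for the set that predicts the room cooler than it is. That is the error direction that would make an estimated energy saving look safer than it really is.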

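Experimental Results

Research Questions

RQ1: Can an end-to-end DRL approach outperform the traditional two-stage approach of modeling first and optimizing second for data center cooling?
RQ2: How well does a DRL-based policy generalize to real data when it cannot be deployed directly?
RQ3: How does the choice of the penalty hyperparameter λ affect the trade-off between energy savings and temperature compliance?
RQ4: Does the de-underestimation (DUE) validation method give more reliable and more conservative performance estimates than standard validation?
RQ5: How much can the learned policy reduce cooling energy while keeping rack inlet temperatures safe?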

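Key Findings

On the EnergyPlus simulation platform, CCA achieves about 11% cooling cost saving compared with a manually configured baseline.
On the NSCC real data trace, CCA achieves about 15% cooling energy saving under DUE validation when the inlet temperature threshold is set to 26.6°C.
DUE validation reduces the underestimation bias in temperature prediction, which yields more conservative and more trustworthy performance estimates.
As λ increases, the energy saving shrinks but the maximum rack temperature drops, showing a tunable trade-off between efficiency and thermal safety.
The DRL model captures the system dynamics, with a mean absolute error (MAE) in temperature prediction below 0.1°C on noisy real data.
Trace-based testing indicates that the policy network robustly predicts optimized airflow rates across varying load and weather conditions.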
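The two quantities these findings trade off against each other, relative energy saving and compliance with the inlet temperature threshold, can be computed from a trace as in the sketch below. The helper names `saving_percent` and `violation_rate` and all numbers are made up for illustration; they are not the paper's evaluation code or data.

```python
from typing import Sequence


def saving_percent(baseline_kwh: Sequence[float], policy_kwh: Sequence[float]) -> float:
    """Relative cooling-energy saving of the policy versus the baseline, in percent."""
    base = sum(baseline_kwh)
    return 100.0 * (base - sum(policy_kwh)) / base


def violation_rate(inlet_temps_c: Sequence[float], threshold_c: float) -> float:
    """Fraction of time steps whose inlet temperature exceeds the threshold."""
    return sum(t > threshold_c for t in inlet_temps_c) / len(inlet_temps_c)


# Hypothetical per-interval cooling energy (kWh) and inlet temperatures (Celsius).
baseline = [100.0, 120.0, 80.0]
policy = [95.0, 110.0, 71.0]
temps = [25.9, 26.4, 26.8, 26.1]

print(saving_percent(baseline, policy))  # 8.0
print(violation_rate(temps, 26.6))       # 0.25
```

Sweeping λ and recording both numbers for each trained policy is one way to trace out the trade-off described above.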


This review was generated by AI and reviewed by a human editor.