QUICK REVIEW

[Paper Review] Energy-Efficient Thermal Comfort Control in Smart Buildings via Deep Reinforcement Learning

Guanyu Gao, Jie Li | arXiv (Cornell University) | Jan 15, 2019

Building Energy and Comfort Optimization | References: 38 | Cited by: 70

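One-Sentence Summary

This paper proposes a deep reinforcement learning framework that uses Deep Deterministic Policy Gradients (DDPG) for continuous HVAC control, paired with a Bayesian-regularized neural network that predicts thermal comfort, and evaluates it in a TRNSYS-based building simulator with the goal of reducing energy use while maintaining occupant comfort.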

ABSTRACT

Heating, Ventilation, and Air Conditioning (HVAC) is extremely energy-consuming, accounting for 40% of total building energy consumption. Therefore, it is crucial to design some energy-efficient building thermal control policies which can reduce the energy consumption of HVAC while maintaining the comfort of the occupants. However, implementing such a policy is challenging, because it involves various influencing factors in a building environment, which are usually hard to model and may be different from case to case. To address this challenge, we propose a deep reinforcement learning based framework for energy optimization and thermal comfort control in smart buildings. We formulate the building thermal control as a cost-minimization problem which jointly considers the energy consumption of HVAC and the thermal comfort of the occupants. To solve the problem, we first adopt a deep neural network based approach for predicting the occupants' thermal comfort, and then adopt Deep Deterministic Policy Gradients (DDPG) for learning the thermal control policy. To evaluate the performance, we implement a building thermal control simulation system and evaluate the performance under various settings. The experiment results show that our method can improve the thermal comfort prediction accuracy, and reduce the energy consumption of HVAC while improving the occupants' thermal comfort.

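Research Motivation and Goals

Reduce HVAC energy consumption in smart buildings while maintaining occupants' thermal comfort.
Develop an occupant thermal comfort prediction model that accounts for multiple influencing factors.
Use deep reinforcement learning with continuous actions to achieve precise HVAC setpoint control.
Validate the method in a building simulation environment under varied conditions.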

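Proposed Method

Develop a Bayesian-regularized feedforward neural network that predicts occupants' thermal comfort from indoor state variables.
Formulate energy optimization and thermal comfort as a Markov decision process whose cost (reward) function combines energy use with a comfort penalty.
Apply DDPG with an actor-critic architecture for continuous setpoint control of temperature and humidity.
Train the DDPG agent in a TRNSYS-based building simulation, using a replay buffer and Ornstein-Uhlenbeck exploration noise.
Use a reward function that penalizes HVAC energy consumption and discomfort outside the acceptable thermal comfort threshold (M within [-D, D]).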
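
The exploration step in the method above can be sketched as follows. This is a minimal illustration and not the paper's implementation: it assumes the actor outputs values in [-1, 1], and the setpoint bounds, the `OUNoise` class, the `to_setpoints` helper, and the `theta`/`sigma` values are all hypothetical choices made for this example.

```python
import random


class OUNoise:
    """Ornstein-Uhlenbeck process: temporally correlated exploration noise."""

    def __init__(self, size, theta=0.15, sigma=0.2, dt=1.0, seed=0):
        # theta (mean-reversion rate) and sigma (noise scale) are assumed values.
        self.theta, self.sigma, self.dt = theta, sigma, dt
        self.rng = random.Random(seed)
        self.state = [0.0] * size

    def sample(self):
        # Euler-Maruyama step: pull each component back toward zero,
        # then add scaled Gaussian noise.
        self.state = [
            x - self.theta * x * self.dt
            + self.sigma * (self.dt ** 0.5) * self.rng.gauss(0.0, 1.0)
            for x in self.state
        ]
        return list(self.state)


# Hypothetical bounds: temperature setpoint (deg C), humidity setpoint (% RH).
BOUNDS = [(20.0, 28.0), (30.0, 70.0)]


def to_setpoints(actor_output, noise):
    """Add noise to an actor output in [-1, 1], clip, and rescale to setpoints."""
    setpoints = []
    for a, n, (low, high) in zip(actor_output, noise, BOUNDS):
        noisy = max(-1.0, min(1.0, a + n))
        setpoints.append(low + (noisy + 1.0) * 0.5 * (high - low))
    return setpoints
```

During training, a call such as `to_setpoints(actor_output, noise.sample())` would yield the noisy temperature and humidity setpoints sent to the simulator; at evaluation time the noise would typically be dropped.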

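Experimental Results

Research Questions

RQ1: Compared with baselines, can a continuous-action DDPG control policy reduce HVAC energy consumption while maintaining occupant comfort?
RQ2: How accurately can a Bayesian-regularized neural network predict thermal comfort from indoor environmental variables?
RQ3: How does the energy-comfort trade-off parameter affect the learned policy and overall performance?
RQ4: Does using the learned thermal comfort predictor as feedback improve control decisions compared with model-based approaches?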

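Key Findings

The proposed method combines a DNN-based thermal comfort predictor with DDPG to jointly optimize energy use and comfort control.
The neural network predictor uses Bayesian regularization to improve the generalization of its comfort estimates.
Evaluation in a TRNSYS-based simulation indicates that the method can reduce HVAC energy consumption while maintaining or improving occupant comfort.
The action space is kept continuous to allow precise control of HVAC setpoints, avoiding the discretization limits of discrete-action DRL methods.
A configurable trade-off parameter balances energy cost against the comfort penalty in the reward, so the policy can be tuned to occupants' needs.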
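
To make the trade-off concrete, here is a small sketch of a reward of this kind. The summary does not give the paper's exact functional form, so the linear penalty, the function name `step_reward`, and the parameters `d` (comfort threshold D) and `beta` (trade-off weight) are assumptions made for illustration.

```python
def step_reward(energy, comfort_m, d=0.5, beta=1.0):
    """Negative cost: energy use plus a weighted penalty for comfort outside [-d, d].

    energy    -- HVAC energy consumed in this time step (any consistent unit)
    comfort_m -- predicted thermal comfort value M for this time step
    d         -- half-width of the acceptable comfort band (assumed value)
    beta      -- trade-off weight on the comfort penalty (assumed value)
    """
    # Distance of M from the acceptable band; zero while M stays inside it.
    discomfort = max(0.0, abs(comfort_m) - d)
    return -(energy + beta * discomfort)
```

With these assumptions, raising `beta` makes comfort violations more costly relative to energy, which pushes a learned policy toward comfort, while lowering it favors energy savings.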

This review was generated by AI and reviewed by human editors.