QUICK REVIEW

[Paper Review] Reinforcement Learning-based Home Energy Management with Heterogeneous Batteries and Stochastic EV Behaviour

Meng Yuan, Ye Emma Wang | arXiv (Cornell University) | Feb 4, 2026

Electric Vehicles and Infrastructure | Cited by: 0

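ONE-SENTENCE SUMMARY

This paper presents a constrained deep reinforcement learning framework based on a Lagrangian soft actor-critic (SAC) algorithm that optimises home energy management for households with heterogeneous stationary and EV batteries and stochastic EV usage, improving operating cost and battery degradation while maintaining occupant comfort.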

ABSTRACT

The widespread adoption of photovoltaic (PV), electric vehicles (EVs), and stationary energy storage systems (ESS) in households increases system complexity while simultaneously offering new opportunities for energy regulation. However, effectively coordinating these resources under uncertainties remains challenging. This paper proposes a novel home energy management framework based on deep reinforcement learning (DRL) that can jointly minimise energy expenditure and battery degradation while guaranteeing occupant comfort and EV charging requirements. Distinct from existing studies, we explicitly account for the heterogeneous degradation characteristics of stationary and EV batteries in the optimisation, alongside stochastic user behaviour regarding arrival time, departure time, and driving distance. The energy scheduling problem is formulated as a constrained Markov decision process (CMDP) and solved using a Lagrangian soft actor-critic (SAC) algorithm. This approach enables the agent to learn optimal control policies that enforce physical constraints, including indoor temperature bounds and target EV state of charge upon departure, despite stochastic uncertainties. Numerical simulations over a one-year horizon demonstrate the effectiveness of the proposed framework in satisfying physical constraints while eliminating thermal oscillations and achieving significant economic benefits. Specifically, the method reduces the cumulative operating cost substantially compared to two standard rule-based baselines while simultaneously decreasing battery degradation costs by 8.44%.

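RESEARCH MOTIVATION AND OBJECTIVES

Minimise net grid electricity cost and battery degradation.
Keep occupant thermal comfort within the specified bounds.
Meet EV charging requirements while coordinating storage, PV, and HVAC.
Model heterogeneous battery degradation and stochastic EV behaviour.
Provide a learning framework validated in a Swedish household scenario.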
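
The objectives above can be read as a constrained optimisation problem. As a sketch only, the generic CMDP form and its Lagrangian relaxation are given below. The symbols (reward r_t, constraint costs c_{i,t}, limits d_i, multipliers \lambda_i, discount \gamma, horizon T) are standard textbook notation chosen for illustration; they are assumptions, not the paper's own definitions, which this review does not reproduce.

```latex
\begin{align}
\max_{\pi}\; & \mathbb{E}_{\pi}\Big[\sum_{t=0}^{T} \gamma^{t} r_t\Big]
\quad \text{s.t.} \quad
\mathbb{E}_{\pi}\Big[\sum_{t=0}^{T} \gamma^{t} c_{i,t}\Big] \le d_i,
\quad i = 1,\dots,m, \\
\min_{\lambda \ge 0}\,\max_{\pi}\; & \mathcal{L}(\pi,\lambda)
= \mathbb{E}_{\pi}\Big[\sum_{t=0}^{T} \gamma^{t} r_t\Big]
- \sum_{i=1}^{m} \lambda_i
\Big(\mathbb{E}_{\pi}\Big[\sum_{t=0}^{T} \gamma^{t} c_{i,t}\Big] - d_i\Big).
\end{align}
```

Under this reading, r_t would plausibly combine the negative electricity cost and the negative degradation cost, while each c_{i,t} would measure a violation such as indoor temperature leaving its bounds or the EV departing below its target state of charge. That mapping is an assumption made here for illustration.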

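PROPOSED METHOD

Formulate the scheduling problem as a constrained Markov decision process (CMDP).
Solve it with a Lagrangian soft actor-critic (SAC) algorithm that uses dual variables to handle the constraints.
Incorporate semi-empirical degradation models that distinguish LFP (stationary) from NMC (EV) batteries.
Model stochastic EV behaviour with distributions of arrival time, departure time, and daily driving distance fitted to Swedish travel survey data.
Validate learning performance and constraint satisfaction in a high-fidelity Swedish household environment.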
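
To make the role of the dual variables more concrete, the following is a minimal Python sketch of how Lagrange multipliers are commonly maintained alongside an actor-critic learner: each multiplier grows when its constraint is violated on average and shrinks (never below zero) otherwise, and the learner trains on a reward penalised by the weighted constraint costs. The class name `LagrangeMultipliers`, the step size, and the example numbers are all hypothetical; the paper's actual update rule and hyperparameters are not given in this review.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class LagrangeMultipliers:
    """Dual variables for a CMDP, one per constraint (hypothetical helper)."""

    limits: List[float]  # allowed average cost per constraint (assumed values)
    lr: float = 0.05     # dual step size (assumed, not from the paper)
    values: List[float] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Start every multiplier at zero unless initial values were supplied.
        if not self.values:
            self.values = [0.0] * len(self.limits)

    def update(self, avg_costs: List[float]) -> None:
        """Projected gradient ascent on the dual variables.

        A multiplier rises when its average cost exceeds the limit and
        falls otherwise; the max() keeps it non-negative.
        """
        self.values = [
            max(0.0, lam + self.lr * (cost - limit))
            for lam, cost, limit in zip(self.values, avg_costs, self.limits)
        ]

    def penalised_reward(self, reward: float, costs: List[float]) -> float:
        """Reward passed to the actor-critic: r - sum_i lambda_i * c_i."""
        return reward - sum(lam * c for lam, c in zip(self.values, costs))


if __name__ == "__main__":
    # Two illustrative constraints: thermal comfort and EV state of charge.
    duals = LagrangeMultipliers(limits=[0.0, 0.0])
    duals.update(avg_costs=[2.0, -1.0])  # first violated, second satisfied
    print(duals.values)
    print(duals.penalised_reward(reward=-1.0, costs=[2.0, 0.0]))
```

In a full training loop, `update` would typically be called once per batch or episode with the measured average constraint costs, and the SAC agent would be trained on `penalised_reward` in place of the raw reward. The SAC learner itself is omitted here.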

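EXPERIMENTAL RESULTS

Research Questions

RQ1: Under stochastic EV usage, can a CMDP solved with Lagrangian SAC effectively co-optimise HVAC, ESS, EV, and household appliances?
RQ2: How do heterogeneous battery degradation models affect the HEMS control policy and cost savings?
RQ3: Does the framework maintain occupant comfort and meet EV charging requirements under uncertainty?
RQ4: How large are the economic gains and the reduction in degradation relative to rule-based baselines?

Key Findings

Compared with two standard rule-based baselines, the method substantially reduces cumulative operating cost.
Battery degradation cost falls by 8.44%.
Under stochastic uncertainty, the framework enforces the indoor temperature bounds and reaches the target EV state of charge at departure.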

This review was generated by AI and checked by human editors.