[Paper Review] Model-Free DRL Control for Power Inverters: From Policy Learning to Real-Time Implementation via Knowledge Distillation
The paper presents a model-free DRL framework for three-phase voltage source inverters (VSIs). Policy distillation compresses a high-capacity teacher policy into a lightweight student suitable for microsecond-scale real-time control, while Lyapunov-based and auxiliary reward terms promote stability and safety. The approach is validated on hardware.
In response to the trade-off between control performance and computational burden hindering the deployment of Deep Reinforcement Learning (DRL) in power inverters, this paper presents a novel model-free control framework leveraging policy distillation. To handle the convergence instability and steady-state errors inherent in model-free agents, an error energy-guided hybrid reward mechanism is established to theoretically constrain the exploration space. More specifically, an adaptive importance weighting mechanism is integrated into the distillation architecture to amplify the significance of fluctuation regions, ensuring high-quality transfer of transient control logic by mitigating the observational bias dominated by steady-state data. This approach efficiently compresses the heavy DRL policy into a lightweight neural network, retaining the desired control performance while overcoming the computational bottleneck during deployment. The proposed method is validated through a hardware-based kilowatt-level experimental platform. Experimental comparison results with traditional methods demonstrate that the proposed technique reduces inference time to the microsecond level and achieves superior transient response speed and parameter robustness.
Research Motivation and Objectives
- Address the performance-robustness trade-off and high computational burden of DRL for power inverter control.
- Develop a model-free DRL framework based on SAC to handle nonlinear, strongly coupled VSI dynamics.
- Introduce policy distillation with adaptive importance weighting to achieve lightweight real-time deployment without sacrificing performance.
- Incorporate Lyapunov-based stability and safety constraints in the reward design to ensure safe exploration and asymptotic stability.
- Validate the approach through hardware-level experiments and compare with traditional PI and FCS-MPC methods.
Proposed Method
- Formulate VSI voltage control as a continuous-action MDP using SAC with a discrete Lyapunov candidate function to constrain exploration.
- Define state as [e_ud, e_uq, u_bus,d, u_bus,q, i_Ld, i_Lq] and action as [u_inv,d, u_inv,q], with normalization.
- Design a detailed reward comprising stability (ΔV penalty) and auxiliary terms for tracking (r2), current safety (r3), and THD limits (r4).
- Develop a teacher-student distillation framework where a large teacher network transfers knowledge to a lightweight student network under adaptive importance weighting (to emphasize transient regions).
- Construct a trajectory-based expert dataset from the teacher for supervised training of the student, ensuring train/test trajectory separation.
- Assess hardware latency (teacher ~33 μs, distilled S2 ~1.1 μs) and quantify dynamic performance (SSE, THD, overshoot) under various loading scenarios.
- Provide a Lyapunov-consistent distillation loss J_phy combining action matching with stability regularization and an adaptive weight W(s) to mitigate observational bias.
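The reward structure and the weighted distillation objective listed above can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the paper's formulation: the review does not give the exact formulas, so the quadratic error-energy candidate, the limits `i_max` and `thd_max`, the term weights `w`, and the `tanh`-based form of W(s) are all hypothetical choices. The stability regularizer of J_phy is omitted, so only the importance-weighted action-matching part is shown.

```python
import math


def lyapunov(e_d, e_q):
    """Assumed quadratic error-energy candidate V = e_d^2 + e_q^2."""
    return e_d * e_d + e_q * e_q


def hybrid_reward(e_prev, e_next, i_L, thd, *, i_max=20.0, thd_max=0.05,
                  w=(1.0, 1.0, 1.0, 1.0)):
    """Illustrative hybrid reward; limits and weights are hypothetical."""
    delta_v = lyapunov(*e_next) - lyapunov(*e_prev)
    r1 = -max(delta_v, 0.0)                   # stability: penalize growth of V
    r2 = -math.hypot(*e_next)                 # tracking: voltage error magnitude
    r3 = -max(math.hypot(*i_L) - i_max, 0.0)  # current safety: overcurrent only
    r4 = -max(thd - thd_max, 0.0)             # THD: excess over the limit only
    return w[0] * r1 + w[1] * r2 + w[2] * r3 + w[3] * r4


def importance_weight(e_d, e_q, *, alpha=5.0, scale=1.0):
    """Assumed W(s): close to 1 in steady state, up to 1 + alpha in transients."""
    return 1.0 + alpha * math.tanh(math.hypot(e_d, e_q) / scale)


def weighted_distillation_loss(states, teacher_actions, student_actions):
    """Importance-weighted mean squared error between teacher and student actions.

    Each state follows the ordering given in the review, so s[0] and s[1]
    are the d- and q-axis voltage errors e_ud and e_uq.
    """
    num = 0.0
    den = 0.0
    for s, a_t, a_s in zip(states, teacher_actions, student_actions):
        weight = importance_weight(s[0], s[1])
        sq_err = sum((x - y) ** 2 for x, y in zip(a_t, a_s))
        num += weight * sq_err
        den += weight
    return num / den


if __name__ == "__main__":
    # One steady-state sample (matched) and one transient sample (mismatched).
    states = [(0.0, 0.0, 1.0, 0.0, 0.2, 0.0), (10.0, 0.0, 0.6, 0.0, 0.9, 0.0)]
    teacher = [(0.0, 0.0), (1.0, 0.0)]
    student = [(0.0, 0.0), (0.0, 0.0)]
    print(weighted_distillation_loss(states, teacher, student))
```

With uniform weights the two samples above would give a loss of 0.5. The weighted version is larger because the mismatch falls on the transient sample, which is the intended effect of counteracting a dataset dominated by steady-state points.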
Experimental Results
Research Questions
- RQ1: Can a model-free DRL controller achieve high transient performance and robustness for a VSI under load disturbances?
- RQ2: Does policy distillation enable real-time, microsecond-scale control on resource-constrained hardware without sacrificing teacher-level performance?
- RQ3: How do Lyapunov-based stability constraints and adaptive weighting affect exploration, safety, and transfer learning in the distillation process?
- RQ4: What are the comparative benefits of the proposed DRL framework against PI and FCS-MPC in terms of SSE, THD, and overshoot under various disturbances and parameter drift?
Key Findings
- The proposed DRL controller achieves a steady-state error (SSE) of around 0.05 V, THD of about 1.15–1.33%, and overshoot bounded by 1.0–1.33% depending on the case, outperforming PI and FCS-MPC on several metrics.
- Policy distillation compresses the teacher from 13,442 parameters to compact S1/S2 models with compression ratios of 5 and 26.7, and inference times of 33 μs (teacher) and 1.1 μs (S2) per control cycle.
- The lightweight student (S2) achieves microsecond-level latency (1.1 μs), a small fraction of the 100 μs period of a 10 kHz control cycle, and maintains strong transient performance.
- Across Case 1–3 scenarios (severe load step, complex load switching, and parameter uncertainty), the proposed DRL retains robustness and reduces overshoot relative to PI and FCS-MPC; DRL with distillation and Lyapunov constraints shows favorable trade-offs between accuracy and safety.
- The experiments validate hardware feasibility on a kilowatt-level platform, indicating the method’s practicality for real-time inverter control.
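The compression and latency figures above can be put in perspective with a short calculation. The sketch below uses only numbers quoted in this review. The student parameter counts are not stated here, so the values it derives are implied by the compression ratios rather than reported, and the budget shares assume one inference per 10 kHz control period.

```python
TEACHER_PARAMS = 13_442
COMPRESSION_RATIOS = {"S1": 5.0, "S2": 26.7}
CONTROL_FREQ_HZ = 10_000
LATENCY_US = {"teacher": 33.0, "S2": 1.1}

# One control period at 10 kHz, in microseconds.
PERIOD_US = 1e6 / CONTROL_FREQ_HZ


def implied_student_params(ratio):
    """Parameter count implied by a compression ratio (derived, not reported)."""
    return round(TEACHER_PARAMS / ratio)


def budget_share(latency_us):
    """Fraction of one control period consumed by a single inference."""
    return latency_us / PERIOD_US


if __name__ == "__main__":
    for name, ratio in COMPRESSION_RATIOS.items():
        print(f"{name}: about {implied_student_params(ratio)} parameters")
    for name, latency in LATENCY_US.items():
        print(f"{name}: {budget_share(latency):.1%} of the control period")
    print(f"speed-up: {LATENCY_US['teacher'] / LATENCY_US['S2']:.1f}x")
```

Under these assumptions the teacher would use roughly a third of each control period for inference alone, while S2 would use about one percent, leaving most of the period for sampling, coordinate transforms, and modulation.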
This review was generated by AI and checked by a human editor.