QUICK REVIEW

[Paper Review] Evolving LLM-Derived Control Policies for Residential EV Charging and Vehicle-to-Grid Energy Optimization

Vishesh Purnananda, Benjamin John Wruck|arXiv (Cornell University)|2026. 02. 06.

Electric Vehicles and Infrastructure | Citations: 0

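One-Line Summary

This paper presents a six-stage evolutionary pipeline in which an LLM writes executable Python policies for residential EV charging and V2G; evaluated in EV2Gym-Residential, the policies balance profit, comfort, and safety in transparent, auditable code.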

ABSTRACT

This research presents a novel application of Evolutionary Computation to the domain of residential electric vehicle (EV) energy management. While reinforcement learning (RL) achieves high performance in vehicle-to-grid (V2G) optimization, it typically produces opaque "black-box" neural networks that are difficult for consumers and regulators to audit. Addressing this interpretability gap, we propose a program search framework that leverages Large Language Models (LLMs) as intelligent mutation operators within an iterative prompt-evaluation-repair loop. Utilizing the high-fidelity EV2Gym simulation environment as a fitness function, the system undergoes successive refinement cycles to synthesize executable Python policies that balance profit maximization, user comfort, and physical safety constraints. We benchmark four prompting strategies: Imitation, Reasoning, Hybrid and Runtime, evaluating their ability to discover adaptive control logic. Results demonstrate that the Hybrid strategy produces concise, human-readable heuristics that achieve 118% of the baseline profit, effectively discovering complex behaviors like anticipatory arbitrage and hysteresis without explicit programming. This work establishes LLM-driven Evolutionary Computation as a practical approach for generating EV charging control policies that are transparent, inspectable, and suitable for real residential deployment.

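Motivation and Objectives

Close the interpretability gap in RL-based V2G control by producing explicit, auditable control policies.
Develop a six-stage, LLM-in-the-loop pipeline that generates Python decision functions evaluated in a realistic simulator.
Benchmark the Reasoning, Imitation, Hybrid, and Runtime prompting strategies against a baseline heuristic.
Show that LLM-derived policies can achieve competitive profit as interpretable code.
Assess regulatory and practical deployment considerations for code-as-policies in residential energy systems.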

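Proposed Method

Build a compact input dataset from state-action examples drawn from the Ledger and a 24-hour price forecast.
Instruct a guardrailed LLM to generate a Python function decide_power(...).
Run the generated policy in EV2Gym-Residential as multi-day rollouts.
Collect quantitative rewards and counterexamples to drive the iterative repair loop.
Compare policies on profit, fidelity to the baseline, and safety constraints.
Analyze the effectiveness of the four prompting strategies (Imitation, Reasoning, Hybrid, Runtime).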
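
The generate-evaluate-repair cycle described above can be illustrated with a small, self-contained sketch. Everything here is an assumption made for illustration: `toy_fitness` is a toy stand-in for an EV2Gym-Residential rollout, `mutate` perturbs two numeric thresholds where the paper uses an LLM to rewrite policy code, and the `decide_power` signature is hypothetical. It shows only the shape of the loop (propose a candidate, score it, keep it if it improves), not the authors' implementation.

```python
import random
from typing import Callable, List, Tuple

Policy = Callable[[float, float, List[float]], float]


def make_policy(low_q: float, high_q: float) -> Policy:
    """Build a threshold policy from two price quantiles (hypothetical form)."""
    def decide_power(soc: float, price: float, forecast: List[float]) -> float:
        ranked = sorted(forecast)
        low = ranked[int(low_q * (len(ranked) - 1))]
        high = ranked[int(high_q * (len(ranked) - 1))]
        if price <= low and soc < 1.0:
            return 1.0   # charge at full rate when the price is low
        if price >= high and soc > 0.5:
            return -1.0  # discharge (V2G) when the price is high
        return 0.0       # otherwise idle
    return decide_power


def toy_fitness(policy: Policy, prices: List[float],
                soc0: float = 0.5, step: float = 0.1) -> float:
    """Toy stand-in for a simulator rollout: profit minus a comfort penalty."""
    soc, profit = soc0, 0.0
    for price in prices:
        action = max(-1.0, min(1.0, policy(soc, price, prices)))
        delta = max(-soc, min(1.0 - soc, action * step))  # respect battery limits
        soc += delta
        profit -= delta * price  # charging costs money, discharging earns it
    if soc < 0.8:                # assumed comfort target at departure
        profit -= 10.0 * (0.8 - soc)
    return profit


def mutate(params: Tuple[float, float], rng: random.Random) -> Tuple[float, float]:
    """Stand-in for the LLM mutation operator: jitter the two quantiles."""
    low_q, high_q = params
    low_q = min(1.0, max(0.0, low_q + rng.uniform(-0.2, 0.2)))
    high_q = min(1.0, max(low_q, high_q + rng.uniform(-0.2, 0.2)))
    return low_q, high_q


def evolve(prices: List[float], generations: int = 20,
           seed: int = 0) -> Tuple[Tuple[float, float], float]:
    """Keep the best candidate seen so far (simple hill climbing)."""
    rng = random.Random(seed)
    best = (0.5, 0.5)
    best_score = toy_fitness(make_policy(*best), prices)
    for _ in range(generations):
        candidate = mutate(best, rng)
        score = toy_fitness(make_policy(*candidate), prices)
        if score > best_score:
            best, best_score = candidate, score
    return best, best_score


prices = [0.10, 0.08, 0.07, 0.09, 0.15, 0.30, 0.35, 0.20]
print(evolve(prices))
```

In the pipeline summarized above, the feedback passed to the next iteration also includes counterexamples, which this sketch omits; a fuller version would pass failing states back into the mutation step.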

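Experimental Results

Research Questions

RQ1: Can large language models generate and iteratively refine explicit, interpretable residential V2G control policies that adapt to dynamic conditions while respecting physical and user constraints?
RQ2: Does LLM-based evolutionary policy synthesis provide transparent, auditable solutions that match or exceed baseline heuristic performance in a high-fidelity V2G simulator?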

Key Results

Strategy	Baseline Reward	Achieved Reward	Relative to Baseline	API Cost
Pure Reasoning	8.865	6.210	70.1%	Low
Pure Imitation	8.865	6.790	76.6%	Low
Hybrid Iterative	2.660	3.150	118.0%	Moderate
Runtime LLM	8.865	16.843	190.0%	High
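
The relative-performance column appears to be the achieved reward divided by the baseline reward, expressed as a percentage. A minimal sketch that recomputes it from the table values follows; the function name `relative_to_baseline` is introduced here for illustration. Recomputing from the rounded rewards gives about 118.4% for Hybrid Iterative rather than the listed 118.0%, which may reflect rounding of the underlying values in the source.

```python
# (baseline reward, achieved reward) copied from the results table
rows = {
    "Pure Reasoning": (8.865, 6.210),
    "Pure Imitation": (8.865, 6.790),
    "Hybrid Iterative": (2.660, 3.150),
    "Runtime LLM": (8.865, 16.843),
}


def relative_to_baseline(baseline: float, achieved: float) -> float:
    """Return achieved reward as a percentage of the baseline reward."""
    return 100.0 * achieved / baseline


for name, (baseline, achieved) in rows.items():
    print(f"{name}: {relative_to_baseline(baseline, achieved):.1f}%")
```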

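The Hybrid strategy achieved 118% of the baseline profit with concise, human-readable heuristics.
The Hybrid Iterative policy discovered anticipatory arbitrage using the 24-hour price forecast, improving profit by 18% over the baseline.
Pure Reasoning struggled with calibration and missed arbitrage opportunities (a 29.9% shortfall relative to the baseline).
Pure Imitation matched baseline behavior but showed limited innovation (76.6% of the baseline reward).
The Runtime LLM policy had the highest relative performance (190% of the baseline), but at a higher API cost.
The evolved policies remained interpretable, and the evolved code was often concise (e.g., 15 lines).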
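
To make "concise, human-readable heuristic" concrete, here is a hypothetical policy of roughly that size combining the two behaviors named in the abstract, anticipatory arbitrage and hysteresis. It is not a policy from the paper: the argument list, the 7.4 kW limit, the 10% band, and the SoC thresholds are all assumptions chosen for illustration.

```python
from typing import Sequence


def decide_power(soc: float, price: float, forecast: Sequence[float],
                 hours_to_departure: float, last_action: float,
                 max_kw: float = 7.4, target_soc: float = 0.8,
                 min_soc: float = 0.3) -> float:
    """Return charging power in kW (positive = charge, negative = V2G discharge)."""
    # Comfort first: close to departure and below target, always charge.
    if soc < target_soc and hours_to_departure <= 2:
        return max_kw
    # Anticipatory arbitrage: judge the current price against the forecast mean.
    mean_price = sum(forecast) / len(forecast)
    band = 0.10 * mean_price
    # Hysteresis: starting an action needs a wider margin than continuing it.
    charge_below = mean_price if last_action > 0 else mean_price - band
    discharge_above = mean_price if last_action < 0 else mean_price + band
    if price < charge_below and soc < 1.0:
        return max_kw
    if price > discharge_above and soc > min_soc:
        return -max_kw
    return 0.0
```

With a forecast of `[0.10, 0.20, 0.30]`, a price of 0.19 leaves an idle vehicle idle but keeps an already-charging vehicle charging, which is the switching damping that hysteresis is meant to provide.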

This review was generated by AI and reviewed by a human editor.