QUICK REVIEW

[Paper Review] Kinematics-Aware Latent World Models for Data-Efficient Autonomous Driving

Jiazhuo Li, Lei Cao|arXiv (Cornell University)|2026. 03. 07.

Autonomous Vehicle Technology and Safety | Citations: 0

One-Line Summary

TL;DR: Introduces a kinematics-grounded latent world model built on the RSSM that combines vehicle physics, multimodal inputs, and geometry-aware supervision to improve data-efficient policy learning for autonomous driving.

ABSTRACT

Data-efficient learning remains a central challenge in autonomous driving due to the high cost and safety risks of large-scale real-world interaction. Although world-model-based reinforcement learning enables policy optimization through latent imagination, existing approaches often lack explicit mechanisms to encode spatial and kinematic structure essential for driving tasks. In this work, we build upon the Recurrent State-Space Model (RSSM) and propose a kinematics-aware latent world model framework for autonomous driving. Vehicle kinematic information is incorporated into the observation encoder to ground latent transitions in physically meaningful motion dynamics, while geometry-aware supervision regularizes the RSSM latent state to capture task-relevant spatial structure beyond pixel reconstruction. The resulting structured latent dynamics improve long-horizon imagination fidelity and stabilize policy optimization. Experiments in a driving simulation benchmark demonstrate consistent gains over both model-free and pixel-based world-model baselines in terms of sample efficiency and driving performance. Ablation studies further verify that the proposed design enhances spatial representation quality within the latent space. These results suggest that integrating kinematic grounding into RSSM-based world models provides a scalable and physically grounded paradigm for autonomous driving policy learning.

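Research Motivation and Goals

Motivate data-efficient autonomous driving with a world model that respects driving physics and geometry.
Incorporate vehicle kinematics into the observation encoder to ground latent transitions in physically meaningful motion dynamics.
Strengthen geometry-aware supervision so that the latent space captures lane geometry and the states of surrounding vehicles.
Improve long-horizon imagination fidelity and the stability of policy optimization through structured latent dynamics.
Demonstrate gains in data efficiency and driving performance over model-free and pixel-based baselines in simulation.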

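Proposed Method

Extends the Recurrent State-Space Model (RSSM) to driving tasks.
Fuses image observations and a 5D vehicle physics vector into a single latent embedding.
Introduces kinematic information into the encoder and the RSSM dynamics to ground latent transitions.
Adds driving-specific supervision heads for lane geometry and neighboring vehicles to regularize the latent state.
Trains an actor-critic on trajectories imagined in latent space, using lambda-returns and continuation probabilities.
Guides learning with a reward design that combines progress, speed, lane centering, and safety penalties.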
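
The lambda-return step in the list above can be sketched as follows. This is a minimal illustration that assumes the recursion commonly used in Dreamer-style world-model agents, where the predicted continuation probability scales the discount. The review does not give the paper's exact formula, so the function name `lambda_returns` and the defaults `gamma=0.99` and `lam=0.95` are hypothetical.

```python
import numpy as np


def lambda_returns(rewards, values, continues, gamma=0.99, lam=0.95):
    """Compute lambda-returns over one imagined trajectory.

    rewards:   shape (H,), predicted reward at each imagined step
    values:    shape (H + 1,), critic estimates; the last entry is the bootstrap
    continues: shape (H,), predicted probability that the episode continues
    gamma, lam: assumed hyperparameters, not values taken from the paper
    """
    rewards = np.asarray(rewards, dtype=np.float64)
    values = np.asarray(values, dtype=np.float64)
    continues = np.asarray(continues, dtype=np.float64)
    horizon = rewards.shape[0]
    if values.shape[0] != horizon + 1 or continues.shape[0] != horizon:
        raise ValueError("expected values of length H + 1 and continues of length H")

    returns = np.empty(horizon, dtype=np.float64)
    next_return = values[-1]  # bootstrap from the critic at the final imagined state
    for t in reversed(range(horizon)):
        discount = gamma * continues[t]
        # Blend the one-step critic target with the longer-horizon return.
        next_return = rewards[t] + discount * (
            (1.0 - lam) * values[t + 1] + lam * next_return
        )
        returns[t] = next_return
    return returns
```

With `lam=0` each target reduces to the one-step bootstrap `r_t + gamma * c_t * v_{t+1}`, and with `lam=1` it becomes the discounted sum of imagined rewards plus the final bootstrap value. A continuation probability near zero cuts off everything after that step, which is how a predicted crash or episode end would limit the return in this sketch.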

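Experimental Results

Research Questions

RQ1: Can kinematics-grounded latent dynamics help improve the data efficiency of driving policies?
RQ2: Does geometry-aware supervision improve the quality of spatial representations in the latent space for driving tasks?
RQ3: How do multimodal inputs (image + vehicle physics) affect world-model learning and imagination fidelity?
RQ4: What advantages does the approach offer in simulation compared with model-free methods and image-only world models?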

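Key Results

The kinematics-grounded framework converges faster and reaches higher performance than a model-free PPO baseline in the MetaDrive simulator.
In ablations, adding lane and neighbor supervision to the image input improves mean reward and success rate, and including vehicle physics information yields further gains.
The full Img+Head+Phys model shows substantial improvements over the other variants in both mean reward and success rate.
In the imagination-quality analysis, the full model maintains physically plausible trajectories and appropriate lane semantics, whereas the image-only model does not.
The additional supervision heads and multimodal inputs jointly improve spatial representation and driving performance.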

This review was generated by AI and checked by a human editor.