QUICK REVIEW

[Paper Review] Kinematics-Aware Latent World Models for Data-Efficient Autonomous Driving

Jiazhuo Li, Lei Cao|arXiv (Cornell University)|Mar 7, 2026

Autonomous Vehicle Technology and Safety|Citations: 0

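One-Line Summary

Proposes a kinematics-aware latent world model built on the RSSM and grounded in driving physics. It combines multimodal inputs with geometry-aware supervision to achieve data-efficient learning of autonomous driving policies.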

ABSTRACT

Data-efficient learning remains a central challenge in autonomous driving due to the high cost and safety risks of large-scale real-world interaction. Although world-model-based reinforcement learning enables policy optimization through latent imagination, existing approaches often lack explicit mechanisms to encode spatial and kinematic structure essential for driving tasks. In this work, we build upon the Recurrent State-Space Model (RSSM) and propose a kinematics-aware latent world model framework for autonomous driving. Vehicle kinematic information is incorporated into the observation encoder to ground latent transitions in physically meaningful motion dynamics, while geometry-aware supervision regularizes the RSSM latent state to capture task-relevant spatial structure beyond pixel reconstruction. The resulting structured latent dynamics improve long-horizon imagination fidelity and stabilize policy optimization. Experiments in a driving simulation benchmark demonstrate consistent gains over both model-free and pixel-based world-model baselines in terms of sample efficiency and driving performance. Ablation studies further verify that the proposed design enhances spatial representation quality within the latent space. These results suggest that integrating kinematic grounding into RSSM-based world models provides a scalable and physically grounded paradigm for autonomous driving policy learning.

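Motivation and Objectives

Motivate data-efficient autonomous driving through a world model that respects driving physics and geometry.
Incorporate vehicle kinematics into the observation encoder so that latent transitions are grounded in physically meaningful motion dynamics.
Impose geometry-aware supervision so that the latent space captures lane geometry and the states of surrounding vehicles.
Improve long-horizon imagination fidelity and stabilize policy optimization through structured latent dynamics.
Demonstrate gains in data efficiency and driving performance over model-free and pixel-based baselines in simulation.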

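Proposed Method

Extend the Recurrent State-Space Model (RSSM) to driving tasks.
Fuse image observations and a 5-dimensional vehicle physics vector into a unified latent embedding.
Incorporate kinematic information into the encoder and the RSSM dynamics to ground latent transitions.
Add supervision heads for lane geometry and surrounding vehicles to regularize the latent state.
Train an actor-critic on trajectories imagined in latent space, using lambda-returns and continuation probabilities.
Guide learning with a reward design that combines progress, speed, lane centering, and safety penalties.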
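
The lambda-return computation over imagined trajectories can be illustrated with a minimal sketch. This review does not give the paper's exact formulation, so the code below assumes the common Dreamer-style recursion R_t = r_t + gamma * c_t * ((1 - lambda) * V(s_{t+1}) + lambda * R_{t+1}), where c_t is the predicted continuation probability. The function name, argument names, and default values for gamma and lambda are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence


def lambda_returns(
    rewards: Sequence[float],
    next_values: Sequence[float],
    continues: Sequence[float],
    gamma: float = 0.99,  # assumed discount factor, not taken from the paper
    lam: float = 0.95,    # assumed lambda, not taken from the paper
) -> List[float]:
    """Compute lambda-returns along one imagined trajectory.

    rewards[t]     : predicted reward at imagined step t
    next_values[t] : critic estimate V(s_{t+1}) for the state reached after step t
    continues[t]   : predicted probability that the episode continues after step t
    """
    if not (len(rewards) == len(next_values) == len(continues)):
        raise ValueError("all inputs must have the same length")
    if len(rewards) == 0:
        return []

    returns = [0.0] * len(rewards)
    # Bootstrap from the critic's estimate at the final imagined state.
    running = next_values[-1]
    # Work backwards so each step can reuse the return of the step after it.
    for t in reversed(range(len(rewards))):
        # Blend the one-step critic estimate with the longer-horizon return.
        target = (1.0 - lam) * next_values[t] + lam * running
        # The continuation probability scales the discount, so a likely
        # termination (for example a crash) cuts off future value.
        running = rewards[t] + gamma * continues[t] * target
        returns[t] = running
    return returns
```

For example, `lambda_returns([1.0, 1.0, 1.0], [0.0, 0.0, 0.0], [1.0, 1.0, 1.0], gamma=0.5, lam=1.0)` reduces to plain discounted sums and yields 1.75, 1.5, and 1.0. With `lam=0.0` each return becomes a one-step target that relies entirely on the critic, and a continuation probability of zero reduces a step's return to its immediate reward.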

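Experimental Results

Research Questions

RQ1: Can kinematics-grounded latent dynamics improve the data efficiency of driving policies?
RQ2: Does geometry-aware supervision improve the quality of spatial representations in the latent space for driving tasks?
RQ3: How do multimodal inputs (image + vehicle physics) affect world-model training and imagination fidelity?
RQ4: What are the relative gains over model-free baselines and image-only world models in simulation?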

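Key Findings

The kinematics-grounded framework converges faster and reaches higher performance than the model-free PPO baseline in the MetaDrive simulator.
In the ablations, merely adding lane and surrounding-vehicle supervision to the image input improves mean return and success rate, and including vehicle physics improves them further.
The full Img+Head+Phys model shows a marked improvement over the other variants in both mean return and success rate.
In the imagination-quality analysis, the full model maintains physically plausible trajectories and appropriate lane semantics, whereas the image-only model does not.
The auxiliary supervision heads and multimodal inputs jointly improve spatial representation and driving performance.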

This review was generated by AI and checked by a human editor.