QUICK REVIEW

[Paper Review] Driving Policy Transfer via Modularity and Abstraction

Matthias Müller, Alexey Dosovitskiy|arXiv (Cornell University)|2018. 04. 25.

Autonomous Vehicle Technology and Safety | Citations: 122

One-Line Summary

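This paper shows that a modular architecture (perception, policy, low-level control) lets a driving policy trained in simulation transfer directly to a real 1/5-scale truck without fine-tuning, by having the policy work with semantic maps and waypoints instead of raw images and motor commands.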

ABSTRACT

End-to-end approaches to autonomous driving have high sample complexity and are difficult to scale to realistic urban driving. Simulation can help end-to-end driving systems by providing a cheap, safe, and diverse training environment. Yet training driving policies in simulation brings up the problem of transferring such policies to the real world. We present an approach to transferring driving policies from simulation to reality via modularity and abstraction. Our approach is inspired by classic driving systems and aims to combine the benefits of modular architectures and end-to-end deep learning approaches. The key idea is to encapsulate the driving policy such that it is not directly exposed to raw perceptual input or low-level vehicle dynamics. We evaluate the presented approach in simulated urban environments and in the real world. In particular, we transfer a driving policy trained in simulation to a 1/5-scale robotic truck that is deployed in a variety of conditions, with no finetuning, on two continents. The supplementary video can be viewed at https://youtu.be/BrMDJqI6H5U

Motivation and Objectives

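Address the reality gap in autonomous driving through modularity and abstraction.
Propose a three-stage architecture that separates perception, policy, and control to ease transfer from simulation to reality.
Train perception on real-world segmentation data, and train the driving policy entirely in simulation on realistic perception outputs.
Demonstrate transfer from simulation to a physical vehicle across a variety of environments and conditions.
Investigate how semantic representations and waypoint outputs contribute to robust transfer across domains.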

Proposed Method

Three-module architecture: perception (encoder-decoder producing per-pixel road/non-road segmentation), driving policy (maps segmentation to local waypoint plan), and low-level controller (PID-based to follow waypoints).
Perception is trained on Cityscapes for binary road segmentation using ERFNet and cross-entropy loss.
Driving policy is trained in CARLA with conditional imitation learning (CIL) to output two waypoints encoded by distance and relative angle, conditioned on high-level commands (left/straight/right).
Policy is trained on the perception network's own segmentation outputs, noise included, rather than on ground-truth segmentation, so that it sees realistic perception errors during training.
Training uses simulation data (28 hours) with an expert planner and a PID follower; data augmentation and weather variability are applied.
Control uses separate PID controllers for throttle and steering based on waypoint angles (φ1) and target speeds.
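
The policy-to-controller interface described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `PID` class is a textbook controller, and the gains, the `control_step` helper, the output ranges, and the sign convention (positive angle means left) are all assumptions made for the example. The review only states that waypoints are encoded by distance and relative angle and that separate PID controllers handle steering and throttle.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class PID:
    """Textbook PID controller. Gains are illustrative, not taken from the paper."""
    kp: float
    ki: float
    kd: float
    integral: float = 0.0
    prev_error: float = 0.0

    def step(self, error: float, dt: float) -> float:
        self.integral += error * dt
        derivative = (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


def waypoint_to_xy(distance_m: float, angle_rad: float) -> Tuple[float, float]:
    """Convert a (distance, relative angle) waypoint to vehicle-frame coordinates.

    Assumed convention: x points forward, y points left.
    """
    return distance_m * math.cos(angle_rad), distance_m * math.sin(angle_rad)


def clamp(value: float, low: float, high: float) -> float:
    return max(low, min(high, value))


def control_step(
    waypoints: List[Tuple[float, float]],
    speed_mps: float,
    target_speed_mps: float,
    steer_pid: PID,
    throttle_pid: PID,
    dt: float = 0.1,
) -> Tuple[float, float]:
    """Map predicted waypoints to (steer, throttle) commands.

    waypoints: (distance_m, angle_rad) pairs, nearest first.
    Steering tracks the angle of the first waypoint (phi_1);
    throttle tracks the speed error. Output ranges are assumed.
    """
    phi_1 = waypoints[0][1]
    steer = clamp(steer_pid.step(phi_1, dt), -1.0, 1.0)
    throttle = clamp(throttle_pid.step(target_speed_mps - speed_mps, dt), 0.0, 1.0)
    return steer, throttle


if __name__ == "__main__":
    steer_pid = PID(kp=1.0, ki=0.0, kd=0.0)
    throttle_pid = PID(kp=0.5, ki=0.0, kd=0.0)
    predicted = [(5.0, 0.2), (10.0, 0.3)]  # two waypoints, as in the review
    print(control_step(predicted, 1.0, 2.0, steer_pid, throttle_pid))
```

Because the policy only emits waypoints, the controller gains can be retuned for a different vehicle without retraining the policy, which is the point of the abstraction.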

Experimental Results

Research Questions

RQ1. Can a modular perception-policy-control architecture enable direct sim-to-real transfer of driving policies without fine-tuning?
RQ2. Does abstracting perception to semantic segmentation and driving to waypoint outputs improve generalization across environments and weather conditions?
RQ3. How does training with noisy segmentation outputs affect real-world transfer performance?
RQ4. What is the comparative performance of modular, waypoint-based policies versus end-to-end, image-based policies under domain shift?

Key Results

The modular approach outperforms monolithic end-to-end baselines in simulation under unseen towns and weather conditions.
In simulation, waypoint-based predictions from segmentation generalize better to new towns and weather than image-to-control or image-to-waypoint baselines.
In the real world, the modular policy achieves 82% success without data augmentation and 100% with augmentation across three routes on a 1/5-scale truck.
End-to-end image-based policies trained on color images fail to generalize well to the real world even with augmentation or domain randomization.
The real-robot experiments demonstrate transfer from simulation to reality without finetuning, with only Cityscapes data used for perception training.
The physical vehicle completed all three long routes with only a few infractions, including one severe infraction requiring intervention.

Route	Length	Time (min:s)	Missed turns	Severe infractions	Mild infractions
1	1.0 km	4:12	1/7	0	2
2	0.7 km	3:05	1/8	0	3
3	1.1 km	5:08	2/8	1	5
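
The per-route numbers above can be totalled with a short script. This is only a sketch over the three rows listed in this review. It assumes that "Missed turns" reads as missed / total turns on the route and that "Time" is minutes:seconds; the `summarize` helper and the field names are invented for the example.

```python
# Rows copied from the table above; field names are illustrative.
ROUTES = [
    {"route": 1, "length_km": 1.0, "time": "4:12", "missed": 1, "turns": 7, "severe": 0, "mild": 2},
    {"route": 2, "length_km": 0.7, "time": "3:05", "missed": 1, "turns": 8, "severe": 0, "mild": 3},
    {"route": 3, "length_km": 1.1, "time": "5:08", "missed": 2, "turns": 8, "severe": 1, "mild": 5},
]


def to_seconds(mmss: str) -> int:
    """Parse a 'minutes:seconds' string into seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


def summarize(rows):
    """Aggregate length, time, missed turns, and infractions over all routes."""
    # Sum in whole metres to avoid floating-point drift when adding kilometres.
    total_m = sum(round(r["length_km"] * 1000) for r in rows)
    total_s = sum(to_seconds(r["time"]) for r in rows)
    missed = sum(r["missed"] for r in rows)
    turns = sum(r["turns"] for r in rows)
    return {
        "length_km": total_m / 1000,
        "time": f"{total_s // 60}:{total_s % 60:02d}",
        "missed_turns": f"{missed}/{turns}",
        "severe": sum(r["severe"] for r in rows),
        "mild": sum(r["mild"] for r in rows),
        "mean_speed_mps": total_m / total_s,
    }


if __name__ == "__main__":
    for key, value in summarize(ROUTES).items():
        print(f"{key}: {value}")
```

Under these assumptions the three routes total 2.8 km in 12:25, with 4 of 23 turns missed, 1 severe infraction, and 10 mild infractions.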

This review was generated by AI and reviewed by a human editor.