QUICK REVIEW

[Paper Review] Single-Eye View: Monocular Real-time Perception Package for Autonomous Driving

Haixi Zhang, Aiyinsi Zuo|arXiv (Cornell University)|2026. 03. 22.

Advanced Vision and Imaging | Citations: 0

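One-line summary

LRHPerception is a real-time monocular perception package that runs at 29 FPS from a single camera. It combines the efficiency of end-to-end learning with the detail of local mapping to provide RGB, road segmentation, depth, object tracking, and trajectory prediction.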

ABSTRACT

Amidst the rapid advancement of camera-based autonomous driving technology, effectiveness is often prioritized with limited attention to computational efficiency. To address this issue, this paper introduces LRHPerception, a real-time monocular perception package for autonomous driving that uses single-view camera video to interpret the surrounding environment. The proposed system combines the computational efficiency of end-to-end learning with the rich representational detail of local mapping methodologies. With significant improvements in object tracking and prediction, road segmentation, and depth estimation integrated into a unified framework, LRHPerception processes monocular image data into a five-channel tensor consisting of RGB, road segmentation, and pixel-level depth estimation, augmented with object detection and trajectory prediction. Experimental results demonstrate strong performance, achieving real-time processing at 29 FPS on a single GPU, representing a 555% speedup over the fastest mapping-based approach.
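The five-channel tensor described in the abstract can be pictured as three RGB channels plus one road segmentation channel and one depth channel. The following is a minimal sketch of that packing, not the authors' implementation: the function name `pack_five_channel`, the channel order, the float32 cast, and the placeholder frame are all assumptions made for illustration.

```python
import numpy as np


def pack_five_channel(rgb, road_mask, depth):
    """Stack RGB (H, W, 3), road mask (H, W), and depth (H, W) into (H, W, 5)."""
    if rgb.ndim != 3 or rgb.shape[2] != 3:
        raise ValueError("rgb must have shape (H, W, 3)")
    if road_mask.shape != rgb.shape[:2] or depth.shape != rgb.shape[:2]:
        raise ValueError("road_mask and depth must match the RGB height and width")
    # Casting everything to float32 is an assumption so all channels share one dtype.
    return np.concatenate(
        [
            rgb.astype(np.float32),
            road_mask.astype(np.float32)[..., None],  # add a trailing channel axis
            depth.astype(np.float32)[..., None],
        ],
        axis=-1,
    )


# Hypothetical 4x6 frame filled with placeholder values.
rgb = np.zeros((4, 6, 3), dtype=np.uint8)
road_mask = np.ones((4, 6), dtype=bool)
depth = np.full((4, 6), 12.5, dtype=np.float32)
tensor = pack_five_channel(rgb, road_mask, depth)
```

In this sketch the object detection and trajectory prediction outputs are left out, since the abstract describes them as augmenting the tensor rather than as extra channels.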

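Research motivation and goals

Promote cost-effective monocular perception for autonomous driving on standard hardware.
Propose LRHPerception to unify object tracking, trajectory prediction, road segmentation, and depth estimation from a single camera.
Enable information sharing between modules to reduce redundant processing.
Demonstrate improved real-time performance over state-of-the-art local mapping approaches.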

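Proposed method

Uses a Swin Transformer backbone to extract multi-scale features from the RGB input.
Shares backbone features across modules so that a single backbone serves all four tasks.
Introduces C-BYTE, a camera-motion-aware object tracker, to improve data association.
Employs a CVAE-based trajectory predictor with a GRU encoder/decoder to model multi-modal futures.
Implements a lightweight road segmentation block based on a simplified U-Net that uses the Phi_8 features.
Adopts a coarse-to-fine depth estimator composed of a coarse depth former and a refine depth former.
Trains the modules on multiple datasets with cross-dataset training, combining the per-module losses as L = λ_det L_det + λ_seg L_seg + λ_depth L_depth + λ_traj L_traj.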
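The combined loss above is a weighted sum of four per-task terms. As a minimal sketch, the helper below evaluates that sum for scalar loss values. The function name `combined_loss`, the dictionary layout, and every number are hypothetical; the review does not report the λ weights.

```python
def combined_loss(losses, weights):
    """Return the sum of weights[t] * losses[t] over the four task names."""
    tasks = ("det", "seg", "depth", "traj")
    missing = [t for t in tasks if t not in losses or t not in weights]
    if missing:
        raise KeyError(f"missing entries for: {missing}")
    return sum(weights[t] * losses[t] for t in tasks)


# Placeholder values only; the source does not give the lambda weights.
example_losses = {"det": 0.5, "seg": 0.25, "depth": 1.0, "traj": 2.0}
example_weights = {"det": 1.0, "seg": 2.0, "depth": 0.5, "traj": 0.25}
total = combined_loss(example_losses, example_weights)
```

With these placeholder values each weighted term equals 0.5, so `total` is 2.0. In a real training loop the same sum would be applied to framework tensors so that gradients flow into the shared backbone.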

Figure 1 : Innovation and architecture blueprint a) Paradigm of end-to-end solution b) Paradigm of camera-fusion for local map solution c) Paradigm of our LRHPerception package, extracts essences from monocular camera for cost-info trade-off.

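Experimental results

Research questions

RQ1: Can monocular LRHPerception reach real-time (FPS) performance on standard hardware while keeping competitive perception accuracy on each task (tracking, trajectory, segmentation, depth)?
RQ2: Does sharing a single backbone in a unified architecture reduce redundant computation compared with a serial task pipeline?
RQ3: How do camera motion compensation (C-BYTE) and multi-task integration affect tracking robustness and trajectory prediction accuracy?
RQ4: What speed and accuracy gains do the proposed lightweight block and coarse-to-fine depth design bring to road segmentation and depth estimation?
RQ5: Is cross-dataset training effective for jointly optimizing a task-agnostic backbone for monocular perception?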

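Key results

LRHPerception achieves 29 FPS for monocular perception on a single RTX 3090 GPU.
The method shows a 555% speedup over the fastest local mapping method.
C-BYTE compensates for camera motion during association, which improves object tracking robustness and raises MOTA/IDF1/IDP with almost no added latency (roughly <4 ms).
The CVAE-based trajectory predictor with a GRU encoder/decoder outperforms recent methods in both speed and accuracy on the JAAD and PIE datasets.
With a lightweight U-Net-style block on the Phi_8 features, road segmentation achieves high mIoU and runs faster than general-purpose segmentation models.
Depth estimation with the coarse-to-fine design and modified C2f layers delivers a substantial speedup while maintaining accuracy (e.g., 577% faster than the leading alternative).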
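To illustrate why compensating for camera motion helps association, the sketch below warps a track's previous box by an assumed camera-motion transform before computing IoU against a new detection. This is a generic illustration of the idea and not the C-BYTE algorithm, whose details are not given in this review. The helper names `warp_box` and `iou`, the 2x3 affine form, and all coordinates are assumptions.

```python
def warp_box(box, affine):
    """Apply a 2x3 affine ((a, b, tx), (c, d, ty)) to an (x1, y1, x2, y2) box."""
    (a, b, tx), (c, d, ty) = affine
    x1, y1, x2, y2 = box
    corners = [(x1, y1), (x2, y1), (x1, y2), (x2, y2)]
    xs = [a * x + b * y + tx for x, y in corners]
    ys = [c * x + d * y + ty for x, y in corners]
    # Return the axis-aligned box that encloses the warped corners.
    return (min(xs), min(ys), max(xs), max(ys))


def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    iw = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    ih = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    inter = iw * ih
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# Hypothetical case: the camera pans so image content shifts 20 px to the right.
track_box = (100.0, 50.0, 140.0, 130.0)   # box from the previous frame
detection = (120.0, 50.0, 160.0, 130.0)   # same object in the current frame
camera_motion = ((1.0, 0.0, 20.0), (0.0, 1.0, 0.0))

iou_raw = iou(track_box, detection)
iou_compensated = iou(warp_box(track_box, camera_motion), detection)
```

In this toy case the uncompensated IoU is 1/3 while the compensated IoU is 1.0, which shows how an IoU-based matcher could lose a track under ego-motion unless the motion is accounted for.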

Figure 2 : Granular Model Structure. Design of convolution decoder, object tracking, trajectory prediction, and depth estimation; magnify for details. BTAE mechanism in Algorithm 1. Remaining components are shown in Fig. 3.

This review was generated by AI and reviewed by a human editor.