QUICK REVIEW

[Paper Review] Transformer-based models and hardware acceleration analysis in autonomous driving: A survey

Zhong Juan, Zheng Liu | arXiv (Cornell University) | 2023. 04. 21.

Advanced Neural Network Applications | Citations: 9

One-Line Summary

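A comprehensive survey of Transformer-based models for autonomous driving, focusing on model architectures, tasks (3D/2D perception, prediction, end-to-end planning), and operator-level hardware acceleration on portable devices.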

ABSTRACT

Transformer architectures have exhibited promising performance in various autonomous driving applications in recent years. On the other hand, its dedicated hardware acceleration on portable computational platforms has become the next critical step for practical deployment in real autonomous vehicles. This survey paper provides a comprehensive overview, benchmark, and analysis of Transformer-based models specifically tailored for autonomous driving tasks such as lane detection, segmentation, tracking, planning, and decision-making. We review different architectures for organizing Transformer inputs and outputs, such as encoder-decoder and encoder-only structures, and explore their respective advantages and disadvantages. Furthermore, we discuss Transformer-related operators and their hardware acceleration schemes in depth, taking into account key factors such as quantization and runtime. We specifically illustrate the operator level comparison between layers from convolutional neural network, Swin-Transformer, and Transformer with 4D encoder. The paper also highlights the challenges, trends, and current insights in Transformer-based models, addressing their hardware deployment and acceleration issues within the context of long-term autonomous driving applications.

Motivation and Objectives

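Evaluate how Transformer architectures are applied to autonomous driving tasks (perception, mapping, prediction, planning).
Analyze the trade-offs between encoder-decoder and encoder-only designs for real-time deployment.
Examine operator-level hardware acceleration, including quantization, fixed-point arithmetic, and architecture-specific optimizations.
Benchmark Transformer models on standard datasets to show how accuracy, speed, and resource usage relate to deployability.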

Proposed Method

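Review and categorize Transformer-based models by driving task (3D general perception, 2D/planar perception, prediction, and end-to-end driving).
Compare encoder-decoder and encoder-only structures, and the differences in their input/output representations (BEV, 2D/3D queries, etc.).
Summarize operator-level components such as softmax, layer normalization, and matrix multiplication, together with their hardware acceleration strategies.
Provide benchmark tables of model size, FLOPs, FPS, and accuracy per dataset, measured on an NVIDIA RTX 3090.
Discuss the challenges and trends in hardware deployment for long-term autonomous driving applications.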
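To make the operator-level discussion concrete, the following is a minimal pure-Python sketch of the three operators named above (softmax, layer normalization, matrix multiplication), plus a symmetric per-tensor int8 quantization of the matrix multiply. It is an illustration only and is not taken from the surveyed paper: the function names, the per-tensor scaling scheme, and the omission of learned layer-norm parameters are all assumptions made for brevity.

```python
import math


def softmax(row):
    """Numerically stable softmax over one row of attention scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def layer_norm(row, eps=1e-5):
    """Layer normalization; learned scale and shift are omitted (assumption)."""
    mean = sum(row) / len(row)
    var = sum((x - mean) ** 2 for x in row) / len(row)
    return [(x - mean) / math.sqrt(var + eps) for x in row]


def matmul(a, b):
    """Plain matrix multiplication on nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def quantize_int8(matrix):
    """Symmetric per-tensor quantization to the range [-127, 127].

    Returns the integer matrix and the scale needed to dequantize it.
    """
    max_abs = max(abs(x) for row in matrix for x in row)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = [[max(-127, min(127, round(x / scale))) for x in row]
         for row in matrix]
    return q, scale


def quantized_matmul(a, b):
    """Multiply in the integer domain, then rescale back to floats."""
    qa, scale_a = quantize_int8(a)
    qb, scale_b = quantize_int8(b)
    acc = matmul(qa, qb)  # integer accumulation, as on fixed-point hardware
    return [[v * scale_a * scale_b for v in row] for row in acc]


# Hypothetical example inputs, chosen only for illustration.
A = [[0.5, -1.0], [2.0, 0.25]]
B = [[1.0, 0.0], [-0.5, 0.75]]
exact = matmul(A, B)
approx = quantized_matmul(A, B)
```

Comparing `exact` with `approx` shows the small error introduced by int8 quantization. Softmax and layer normalization are harder to move to integer arithmetic because they need an exponential, a division, and a square root, which is one reason they are often treated as separate acceleration targets.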

Experimental Results

Research Questions

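RQ1: Which Transformer-based architectures are most effective for 3D perception, lane/HD-map tasks, and end-to-end autonomous driving pipelines?
RQ2: How do encoder-decoder and encoder-only configurations differ in accuracy, latency, and resource usage in automotive scenarios?
RQ3: What are the main operator-level bottlenecks (e.g., softmax, LN, FFN, MatMul), and how can hardware acceleration address them?
RQ4: What is the current level of benchmark performance on portable hardware for representative datasets such as NuScenes, OpenLane, TuSimple, and CARLA?
RQ5: What trends and challenges shape the deployment of Transformer-based models in long-term autonomous driving?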

Key Findings

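Transformer-based models often match or outperform CNN baselines in object detection, lane detection, and HD map generation.
BEV-based queries and multi-view fusion improve 3D perception, with BEVFormer and PETR-style approaches showing strong results on NuScenes.
End-to-end Transformer designs (e.g., TransFuser, InterFuser, UniAD) show potential for integrating sensing, planning, and control, but real-world applicability and data requirements remain open challenges.
The hardware benchmarks on the NVIDIA RTX 3090 show trade-offs among model size, FLOPs, and frame rate, underscoring the need for efficient encoder/decoder designs and accelerated operators.
Looking ahead, hierarchical/shifted-window Transformers (Swin-Transformer) and multi-modal fusion are expected to see growing use as a way to balance efficiency and accuracy on edge devices.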
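The efficiency argument for shifted-window Transformers can be illustrated with the complexity formulas given in the Swin Transformer paper: global self-attention over an h x w feature map with C channels costs roughly 4hwC^2 + 2(hw)^2 C, while window attention with window size M costs roughly 4hwC^2 + 2M^2 hwC. The sketch below simply evaluates those two expressions. The function names and the example sizes (a 56 x 56 map, 96 channels, 7 x 7 windows) are assumptions for illustration, and the counts ignore softmax and other non-matmul operators.

```python
def global_attention_cost(h, w, c):
    """Approximate cost of global multi-head self-attention.

    The first term covers the Q/K/V and output projections; the second
    covers the attention matrix, which grows quadratically in h * w.
    """
    n = h * w
    return 4 * n * c * c + 2 * n * n * c


def window_attention_cost(h, w, c, m):
    """Approximate cost of attention restricted to m x m windows.

    Each token attends to only m * m tokens, so the second term grows
    linearly in h * w.
    """
    n = h * w
    return 4 * n * c * c + 2 * m * m * n * c


# Hypothetical example sizes, chosen only for illustration.
full = global_attention_cost(56, 56, 96)
windowed = window_attention_cost(56, 56, 96, 7)
ratio = full / windowed
```

For these example sizes the windowed variant is more than ten times cheaper, and the gap widens as the feature map grows, which is why window-based designs are attractive for high-resolution inputs on constrained hardware.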

This review was generated by AI and reviewed by a human editor.