QUICK REVIEW

[Paper Review] Real-Time Motion Prediction via Heterogeneous Polyline Transformer with Relative Pose Encoding

Zhejun Zhang, Alexander Liniger | arXiv (Cornell University) | 2023-10-19

Human Pose and Action Recognition | Citations: 8

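One-Line Summary

HPTR introduces KNARPE attention and a heterogeneous polyline representation to enable real-time, scalable motion prediction in which context is shared among agents and static context is reused, achieving competitive accuracy while substantially reducing latency and memory usage.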

ABSTRACT

The real-world deployment of an autonomous driving system requires its components to run on-board and in real-time, including the motion prediction module that predicts the future trajectories of surrounding traffic participants. Existing agent-centric methods have demonstrated outstanding performance on public benchmarks. However, they suffer from high computational overhead and poor scalability as the number of agents to be predicted increases. To address this problem, we introduce the K-nearest neighbor attention with relative pose encoding (KNARPE), a novel attention mechanism allowing the pairwise-relative representation to be used by Transformers. Then, based on KNARPE we present the Heterogeneous Polyline Transformer with Relative pose encoding (HPTR), a hierarchical framework enabling asynchronous token update during the online inference. By sharing contexts among agents and reusing the unchanged contexts, our approach is as efficient as scene-centric methods, while performing on par with state-of-the-art agent-centric methods. Experiments on Waymo and Argoverse-2 datasets show that HPTR achieves superior performance among end-to-end methods that do not apply expensive post-processing or model ensembling. The code is available at https://github.com/zhejz/HPTR.

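Motivation and Objectives

Establishes the need for real-time motion prediction in on-board autonomous driving systems.
Proposes a scalable, agent-independent representation that shares context among agents.
Develops a Transformer-based architecture (HPTR) with asynchronous, hierarchical updates to minimize online computation.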

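Proposed Method

Represents all inputs as heterogeneous polylines with a global pose and local attributes.
Introduces KNARPE: K-nearest-neighbor attention with relative pose encoding, which lets Transformers use the pairwise-relative representation.
Builds HPTR, a hierarchical Transformer framework that reuses static context through intra-class and inter-class attention and asynchronous token updates.
Decodes outputs as Gaussian mixtures for multi-modal trajectories and trains with a combination of loss terms covering confidence, position, yaw, and speed.
Represents the map, traffic lights, and agents as polylines and reuses static map features during online inference to improve efficiency.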
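The core idea behind KNARPE can be illustrated with a small pure-Python sketch. It is not the authors' implementation (that is in the linked repository). It assumes identity query/key/value projections, a hand-written relative-pose feature `[x, y, cos(yaw), sin(yaw)]`, values equal to keys, and a neighbor set that includes the token itself. The function names `relative_pose` and `knn_relative_attention` are hypothetical. What it does show is the property the method relies on: because attention only sees poses expressed relative to the query token, the output does not change when the whole scene is rotated and translated.

```python
import math

def relative_pose(query_pose, key_pose):
    """Express key_pose (x, y, yaw) in the local frame of query_pose."""
    qx, qy, qyaw = query_pose
    kx, ky, kyaw = key_pose
    dx, dy = kx - qx, ky - qy
    c, s = math.cos(qyaw), math.sin(qyaw)
    # Rotate the global offset by -qyaw to get the offset in the query frame.
    rx = c * dx + s * dy
    ry = -s * dx + c * dy
    # Wrap the heading difference into [-pi, pi].
    ryaw = math.atan2(math.sin(kyaw - qyaw), math.cos(kyaw - qyaw))
    return rx, ry, ryaw

def knn_relative_attention(poses, feats, k):
    """Toy single-head attention over the k nearest tokens (assumed design).

    poses: list of global (x, y, yaw); feats: list of 4-dim local features.
    """
    n, d = len(poses), len(feats[0])
    out = []
    for i in range(n):
        # k nearest tokens by Euclidean distance (the token itself is included).
        by_dist = sorted(range(n), key=lambda j: math.hypot(
            poses[j][0] - poses[i][0], poses[j][1] - poses[i][1]))
        keys = []
        for j in by_dist[:k]:
            rx, ry, ryaw = relative_pose(poses[i], poses[j])
            rpe = [rx, ry, math.cos(ryaw), math.sin(ryaw)]  # assumed encoding
            # Assumption: the relative-pose feature is added to the key,
            # and the same vector is reused as the value.
            keys.append([f + r for f, r in zip(feats[j], rpe)])
        scores = [sum(q * kk for q, kk in zip(feats[i], key)) / math.sqrt(d)
                  for key in keys]
        m = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(sc - m) for sc in scores]
        total = sum(weights)
        out.append([sum(w / total * key[c] for w, key in zip(weights, keys))
                    for c in range(d)])
    return out

def rigid_transform(poses, theta, tx, ty):
    """Rotate every pose by theta about the origin, then translate by (tx, ty)."""
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y + tx, s * x + c * y + ty, yaw + theta)
            for x, y, yaw in poses]

poses = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.5), (0.0, 2.0, -1.0), (5.0, 5.0, 2.0)]
feats = [[0.1, 0.2, 0.3, 0.4], [0.5, -0.1, 0.0, 0.2],
         [-0.3, 0.3, 0.1, 0.0], [0.2, 0.2, 0.2, 0.2]]
out = knn_relative_attention(poses, feats, k=3)
out_moved = knn_relative_attention(rigid_transform(poses, 0.7, 10.0, -3.0), feats, k=3)
```

In this sketch `out` and `out_moved` agree up to floating-point error, which is why the token features do not have to be recomputed per agent or per viewpoint.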

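Experimental Results

Research Questions

RQ1: Can KNARPE enable effective Transformer-based processing of pairwise-relative polylines for motion prediction?
RQ2: Can HPTR match or exceed the efficiency of scene-centric methods while achieving competitive accuracy end to end?
RQ3: What memory and latency benefits do context sharing and asynchronous token updates provide during online inference?
RQ4: How does HPTR compare with state-of-the-art methods on large-scale datasets such as Waymo Open Motion and Argoverse-2, without post-processing or model ensembling?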

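Key Results

HPTR achieves performance competitive with state-of-the-art agent-centric methods while substantially reducing memory and latency relative to baselines (by up to about 80%).
Caching static map features during online inference enables real-time prediction for 64 agents at 40 fps on a single GPU.
HPTR outperforms scene-centric baselines and nearly matches agent-centric approaches, most notably when using the lower-triangular attention arrangement.
On the Waymo and Argoverse-2 benchmarks, HPTR ranks among the top end-to-end methods that do not use expensive post-processing or model ensembling.
The proposed KNARPE attention enables efficient context sharing among heterogeneous polylines, improving scalability in dense traffic.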
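The caching result can be pictured with a short sketch of the reuse pattern. This is only an illustration under stated assumptions: `ContextCache` and `toy_encode` are hypothetical names, and the "encoder" is a stand-in that sums coordinates rather than a Transformer. The point is the control flow: static map tokens are encoded once and reused while the map is unchanged, whereas agent tokens are refreshed every frame. The last line works out the per-frame time budget implied by 40 fps.

```python
def toy_encode(polyline):
    """Stand-in for a learned polyline encoder (hypothetical): sums x and y."""
    return [sum(x for x, _ in polyline), sum(y for _, y in polyline)]

class ContextCache:
    """Encode static map polylines once; re-encode agent polylines each frame."""

    def __init__(self, encode_fn):
        self.encode_fn = encode_fn
        self._cached_map = None   # copy of the map polylines last encoded
        self._map_tokens = None   # their encoded tokens
        self.map_encodes = 0      # how many times the map was actually encoded

    def step(self, map_polylines, agent_polylines):
        # Re-encode the map only when its polylines differ from the cached ones.
        if map_polylines != self._cached_map:
            self._map_tokens = [self.encode_fn(p) for p in map_polylines]
            self._cached_map = list(map_polylines)
            self.map_encodes += 1
        # Agents move every frame, so their tokens are always refreshed.
        agent_tokens = [self.encode_fn(p) for p in agent_polylines]
        return self._map_tokens, agent_tokens

static_map = [((0.0, 0.0), (1.0, 0.0)), ((1.0, 0.0), (2.0, 0.5))]
cache = ContextCache(toy_encode)
for t in range(3):
    agents = [((float(t), 0.0), (float(t) + 0.5, 0.0))]  # one moving agent
    map_tokens, agent_tokens = cache.step(static_map, agents)

frame_budget_ms = 1000 / 40  # 40 fps leaves 25 ms per frame
```

After three frames the map has been encoded once, while the agent tokens have been recomputed three times. This reuse is valid only because the map tokens do not depend on any particular agent's viewpoint.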

This review was generated by AI and reviewed by a human editor.