QUICK REVIEW

[Paper Review] MapTR: Structured Modeling and Learning for Online Vectorized HD Map Construction

Bencheng Liao, Shaoyu Chen|arXiv (Cornell University)|Aug 30, 2022

Advanced Image and Video Retrieval Techniques | Cited by 69

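One-line summary

MapTR introduces permutation-equivalent modeling and a hierarchical Transformer framework to build online vectorized HD maps from camera input, reaching real-time speed and state-of-the-art accuracy on nuScenes.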

ABSTRACT

High-definition (HD) map provides abundant and precise environmental information of the driving scene, serving as a fundamental and indispensable component for planning in autonomous driving system. We present MapTR, a structured end-to-end Transformer for efficient online vectorized HD map construction. We propose a unified permutation-equivalent modeling approach, i.e., modeling map element as a point set with a group of equivalent permutations, which accurately describes the shape of map element and stabilizes the learning process. We design a hierarchical query embedding scheme to flexibly encode structured map information and perform hierarchical bipartite matching for map element learning. MapTR achieves the best performance and efficiency with only camera input among existing vectorized map construction approaches on nuScenes dataset. In particular, MapTR-nano runs at real-time inference speed ($25.1$ FPS) on RTX 3090, $8\times$ faster than the existing state-of-the-art camera-based method while achieving $5.0$ higher mAP. Even compared with the existing state-of-the-art multi-modality method, MapTR-nano achieves $0.7$ higher mAP, and MapTR-tiny achieves $13.5$ higher mAP and $3\times$ faster inference speed. Abundant qualitative results show that MapTR maintains stable and robust map construction quality in complex and various driving scenes. MapTR is of great application value in autonomous driving. Code and more demos are available at https://github.com/hustvl/MapTR.

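Motivation and objectives

Motivate online vectorized HD map construction for autonomous driving.
Propose permutation-equivalent modeling to resolve the ambiguity in how the points of a map element's shape are ordered.
Develop a hierarchical query and bipartite matching framework for end-to-end learning.
Achieve real-time or near-real-time performance with an encoder-decoder architecture.
Demonstrate robustness across diverse nuScenes driving scenes.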

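Proposed method

Model each map element as a point set with a group of equivalent permutations, removing the ambiguity of committing to a single point order.
Use hierarchical query embeddings to encode instance-level and point-level information in a single Transformer framework.
Apply hierarchical bipartite matching: instance-level matching followed by point-level matching.
Train with a multi-term loss covering classification (Focal loss), point-to-point distance (Manhattan distance), and edge direction (cosine similarity).
Integrate a BEV-based map encoder with a flexible 2D-to-BEV transformation (e.g., GKT, IPM, LSS, Deformable Attention) into the end-to-end pipeline.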
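
The permutation-equivalent modeling and the two-level matching described above can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation. The function names (`equivalent_orderings`, `point_level_match`, `instance_level_match`) are hypothetical. The instance-level step uses a brute-force search as a stand-in for the Hungarian algorithm, and it scores candidates by position cost only, leaving out the classification cost. The sketch assumes an open polyline has 2 equivalent orderings and a closed polygon with N points has 2N.

```python
from itertools import permutations
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def equivalent_orderings(points: Sequence[Point], closed: bool) -> List[List[Point]]:
    """List the point orderings that describe the same map element.

    Open polyline: forward and reversed (2 orderings).
    Closed polygon: any start point, either direction (2 * N orderings).
    """
    pts = list(points)
    if not closed:
        return [pts, pts[::-1]]
    n = len(pts)
    orderings = []
    for start in range(n):
        orderings.append([pts[(start + k) % n] for k in range(n)])
        orderings.append([pts[(start - k) % n] for k in range(n)])
    return orderings


def manhattan(a: Sequence[Point], b: Sequence[Point]) -> float:
    """Sum of point-to-point L1 distances between two equal-length point lists."""
    return sum(abs(ax - bx) + abs(ay - by) for (ax, ay), (bx, by) in zip(a, b))


def point_level_match(pred, gt, closed):
    """Return (lowest cost, ground-truth ordering that achieves it)."""
    return min(
        ((manhattan(pred, o), o) for o in equivalent_orderings(gt, closed)),
        key=lambda pair: pair[0],
    )


def instance_level_match(preds, gts, closed_flags):
    """Brute-force one-to-one assignment; result[g] is the prediction matched to gts[g].

    Hypothetical stand-in for the Hungarian algorithm, usable only for tiny inputs.
    Assumes len(preds) >= len(gts) and uses position cost only (no class cost).
    """
    def total_cost(assignment):
        return sum(
            point_level_match(preds[p], gts[g], closed_flags[g])[0]
            for g, p in enumerate(assignment)
        )
    return min(permutations(range(len(preds)), len(gts)), key=total_cost)


square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]   # closed polygon
lane = [(0.0, 5.0), (1.0, 5.0), (2.0, 5.0), (3.0, 5.0)]     # open polyline
preds = [lane[::-1], square[2:] + square[:2]]               # same shapes, other orderings

fixed_order_cost = manhattan(preds[1], square)
best_cost, _ = point_level_match(preds[1], square, closed=True)
assignment = instance_level_match(preds, [square, lane], [True, False])
print(fixed_order_cost, best_cost, assignment)
```

In this toy case the second prediction traces the square exactly but starts at a different corner. Compared against one fixed ground-truth order it incurs a cost of 8, while the permutation-aware match gives 0. This is the kind of spurious penalty the modeling is meant to remove.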

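Experimental results

Research questions

RQ1: Can a DETR-like end-to-end framework be applied to vectorized HD map construction with arbitrarily shaped elements?
RQ2: Does permutation-equivalent modeling stabilize learning and improve accuracy for polygon-shaped and polyline-shaped map elements?
RQ3: How do hierarchical matching and hierarchical queries affect the learning of map elements at the instance level and the point level?
RQ4: What is the trade-off between accuracy and speed for camera-only online HD map construction on nuScenes?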

Key findings

Method	Modality	Backbone	Epochs	AP_ped	AP_divider	AP_boundary	mAP	FPS
MapTR-nano	C	R18	110	39.6	49.9	48.2	45.9	25.1
MapTR-tiny	C	R50	24	46.3	51.5	53.1	50.3	11.2
MapTR-tiny	C	R50	110	56.2	59.8	60.1	58.7	11.2
HDMapNet	C	Effi-B0	30	14.4	21.7	33.0	23.0	0.8
HDMapNet	L	PointPillars	30	10.4	24.1	37.9	24.1	1.0
HDMapNet	C&L	Effi-B0 & PointPillars	30	16.3	29.6	46.7	31.0	0.5
VectorMapNet	C	R50	110	36.1	47.3	39.3	40.9	2.9
VectorMapNet	C&L	R50 & PointPillars	110	37.6	50.5	47.5	45.2	-
VectorMapNet	L	PointPillars	110	25.7	37.6	38.6	34.0	-
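
As a quick sanity check on the table, the sketch below assumes that mAP is the unweighted mean of the three per-class APs (pedestrian crossing, divider, boundary). The review does not state this, but the reported values are consistent with it. The row labels are shorthand introduced here. The speed ratio compares MapTR-nano with the camera-only VectorMapNet FPS.

```python
# (AP_ped, AP_divider, AP_boundary, reported mAP), copied from the table above.
rows = {
    "MapTR-nano, R18, 110 epochs": (39.6, 49.9, 48.2, 45.9),
    "MapTR-tiny, R50, 24 epochs": (46.3, 51.5, 53.1, 50.3),
    "MapTR-tiny, R50, 110 epochs": (56.2, 59.8, 60.1, 58.7),
    "HDMapNet, camera": (14.4, 21.7, 33.0, 23.0),
    "HDMapNet, LiDAR": (10.4, 24.1, 37.9, 24.1),
    "HDMapNet, camera and LiDAR": (16.3, 29.6, 46.7, 31.0),
}


def mean_ap(ap_ped: float, ap_divider: float, ap_boundary: float) -> float:
    """Unweighted mean of the three per-class APs (assumed definition of mAP)."""
    return (ap_ped + ap_divider + ap_boundary) / 3


gaps = {}
for name, (ped, divider, boundary, reported) in rows.items():
    computed = mean_ap(ped, divider, boundary)
    gaps[name] = abs(computed - reported)
    print(f"{name}: computed {computed:.2f}, reported {reported}")

# MapTR-nano FPS divided by the camera-only VectorMapNet FPS from the table.
speedup = 25.1 / 2.9
print(f"MapTR-nano speed-up: {speedup:.1f}x")
```

Every computed mean lands within about 0.1 of the reported mAP, a gap that rounding of the per-class values can explain. The FPS ratio falls between 8 and 9, in line with the abstract's "$8\times$ faster" claim.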

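MapTR-nano runs at 25.1 FPS on an RTX 3090, which is real-time speed, with mAP 5.0 points higher than the state-of-the-art camera-based method.
MapTR-tiny achieves mAP 13.5 points higher than the state-of-the-art multi-modality method on nuScenes, with 3× faster inference.
Permutation-equivalent modeling improves mAP by 5.9 points over fixed-order point set modeling, with gains of up to 11.9 AP points on pedestrian crossings.
The edge direction loss and the hierarchical matching scheme stabilize training and improve the geometric fidelity of map elements.
MapTR remains robust across diverse driving scenes, producing stable vectorized HD maps from camera-only input.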
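
To make the role of the edge direction term concrete, here is a minimal sketch under stated assumptions. The loss is written as the mean of `1 - cosine similarity` over consecutive edges, so that perfectly aligned edges give 0. This normalization is an illustrative choice, and the paper's exact formulation may differ. The function names are hypothetical, and the point order is assumed to be already matched.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def edges(points: Sequence[Point]) -> List[Point]:
    """Vectors between consecutive points of a polyline."""
    return [(x2 - x1, y2 - y1) for (x1, y1), (x2, y2) in zip(points, points[1:])]


def cosine_similarity(u: Point, v: Point) -> float:
    """Cosine of the angle between two 2D vectors (0.0 if either has zero length)."""
    norm = math.hypot(*u) * math.hypot(*v)
    return (u[0] * v[0] + u[1] * v[1]) / norm if norm > 0 else 0.0


def edge_direction_loss(pred: Sequence[Point], gt: Sequence[Point]) -> float:
    """Mean of (1 - cosine similarity) over paired edges; an assumed normalization."""
    pairs = list(zip(edges(pred), edges(gt)))
    return sum(1.0 - cosine_similarity(u, v) for u, v in pairs) / len(pairs)


def point_to_point_loss(pred: Sequence[Point], gt: Sequence[Point]) -> float:
    """Sum of Manhattan distances between paired points."""
    return sum(abs(px - gx) + abs(py - gy) for (px, py), (gx, gy) in zip(pred, gt))


gt = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]        # straight divider
shifted = [(0.0, 0.5), (1.0, 0.5), (2.0, 0.5)]   # parallel offset, same direction
zigzag = [(0.0, 0.0), (1.0, 0.5), (2.0, 0.0)]    # smaller point error, wrong direction

for name, pred in (("shifted", shifted), ("zigzag", zigzag)):
    print(name, point_to_point_loss(pred, gt), round(edge_direction_loss(pred, gt), 4))
```

The shifted prediction has the larger point-to-point error but zero direction loss. The zigzag has the smaller point error but a nonzero direction loss. The direction term therefore penalizes jagged geometry that a point distance alone can underweight.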


This review was generated by AI and checked by a human editor.