QUICK REVIEW

[Paper Review] VectorNet: Encoding HD Maps and Agent Dynamics from Vectorized Representation

Jiyang Gao, Chen Sun|arXiv (Cornell University)|May 8, 2020

Autonomous Vehicle Technology and Safety|References: 29|Citations: 44

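One-line summary

VectorNet uses a hierarchical graph neural network to encode vectorized HD maps and agent trajectories for behavior prediction. It matches or outperforms rasterized ConvNet baselines with far fewer parameters and FLOPs, and achieves state-of-the-art results on Argoverse.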

ABSTRACT

Behavior prediction in dynamic, multi-agent systems is an important problem in the context of self-driving cars, due to the complex representations and interactions of road components, including moving agents (e.g. pedestrians and vehicles) and road context information (e.g. lanes, traffic lights). This paper introduces VectorNet, a hierarchical graph neural network that first exploits the spatial locality of individual road components represented by vectors and then models the high-order interactions among all components. In contrast to most recent approaches, which render trajectories of moving agents and road context information as bird-eye images and encode them with convolutional neural networks (ConvNets), our approach operates on a vector representation. By operating on the vectorized high definition (HD) maps and agent trajectories, we avoid lossy rendering and computationally intensive ConvNet encoding steps. To further boost VectorNet's capability in learning context features, we propose a novel auxiliary task to recover the randomly masked out map entities and agent trajectories based on their context. We evaluate VectorNet on our in-house behavior prediction benchmark and the recently released Argoverse forecasting dataset. Our method achieves on par or better performance than the competitive rendering approach on both benchmarks while saving over 70% of the model parameters with an order of magnitude reduction in FLOPs. It also outperforms the state of the art on the Argoverse dataset.

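Motivation and objectives

Integrate agent dynamics with the structured context of HD maps in a unified vector representation.
Develop a hierarchical graph architecture that captures local interactions within each polyline and global relations among polyline nodes.
Introduce a self-supervised graph completion objective to improve context learning.
Evaluate on an in-house dataset and the Argoverse dataset, comparing against rasterized rendering baselines and the state of the art.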

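Proposed method

Represent the map and agent trajectories as sequences of vectors (polylines), treating each vector as a graph node whose features include its start and end coordinates and attributes.
Build a polyline subgraph by connecting the vectors within the same polyline and aggregating them into a polyline feature with a local GNN (MLP-based) and max pooling.
Model high-order interactions with a global graph that applies a self-attention GNN over the polyline features, and decode the target agent's future from its corresponding polyline node feature.
Introduce an auxiliary graph completion task that masks node features and asks the network to reconstruct them, encouraging context-aware representations.
Optimize a multi-task loss: L = L_traj (negative Gaussian log-likelihood of the future trajectory) + α L_node (Huber loss for reconstructing the masked node features).
For stable prediction, normalize the polyline node features to unit scale and rotate the coordinates to align with the target vehicle's heading.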
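The pipeline above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the weights are random rather than trained, each node encoder is a single linear layer with ReLU (no layer normalization), the Gaussian likelihood uses a fixed unit variance, and the feature layout, the hidden width of 8, and all function names are hypothetical choices made for this example. The trajectory decoder and the masking step are not shown.

```python
import numpy as np

rng = np.random.default_rng(0)

def polyline_to_vectors(points, polyline_id):
    """Turn an (n, 2) array of points into n-1 vector nodes.

    Assumed feature layout: [x_start, y_start, x_end, y_end, polyline_id].
    """
    starts, ends = points[:-1], points[1:]
    ids = np.full((len(starts), 1), float(polyline_id))
    return np.hstack([starts, ends, ids])

def subgraph_layer(nodes, weight):
    """One polyline-subgraph layer: encode each node, max-pool, concatenate."""
    encoded = np.maximum(nodes @ weight, 0.0)       # linear layer + ReLU (assumed encoder)
    pooled = encoded.max(axis=0, keepdims=True)     # aggregate over the whole polyline
    return np.hstack([encoded, np.repeat(pooled, len(nodes), axis=0)])

def encode_polyline(nodes, weights):
    """Stack subgraph layers, then max-pool to a single polyline feature."""
    for weight in weights:
        nodes = subgraph_layer(nodes, weight)
    feature = nodes.max(axis=0)
    return feature / (np.linalg.norm(feature) + 1e-8)   # scale to unit L2 norm

def global_attention(features, w_q, w_k, w_v):
    """Single self-attention layer over all polyline features."""
    q, k, v = features @ w_q, features @ w_k, features @ w_v
    scores = q @ k.T
    scores -= scores.max(axis=1, keepdims=True)     # for numerical stability
    attn = np.exp(scores)
    attn /= attn.sum(axis=1, keepdims=True)         # row-wise softmax
    return attn @ v

def gaussian_nll(pred, target, sigma=1.0):
    """Mean negative log-likelihood of target under N(pred, sigma^2)."""
    var = sigma ** 2
    return np.mean(0.5 * (target - pred) ** 2 / var + 0.5 * np.log(2 * np.pi * var))

def huber(pred, target, delta=1.0):
    """Mean Huber loss: quadratic for small errors, linear for large ones."""
    err = np.abs(pred - target)
    return np.mean(np.where(err <= delta, 0.5 * err ** 2, delta * (err - 0.5 * delta)))

def multitask_loss(traj_pred, traj_true, node_pred, node_true, alpha=1.0):
    """L = L_traj + alpha * L_node."""
    return gaussian_nll(traj_pred, traj_true) + alpha * huber(node_pred, node_true)

hidden = 8
in_dims = [5, 2 * hidden, 2 * hidden]               # three subgraph layers
weights = [rng.normal(scale=0.5, size=(d, hidden)) for d in in_dims]

# Toy scene: one agent trajectory and one lane boundary (made-up coordinates).
agent = np.array([[0.0, 0.0], [1.0, 0.1], [2.0, 0.3], [3.0, 0.6]])
lane = np.array([[0.0, -1.5], [2.0, -1.5], [4.0, -1.4]])
polylines = [polyline_to_vectors(agent, 0), polyline_to_vectors(lane, 1)]

features = np.stack([encode_polyline(p, weights) for p in polylines])
w_q, w_k, w_v = (rng.normal(scale=0.5, size=(2 * hidden, 2 * hidden)) for _ in range(3))
context = global_attention(features, w_q, w_k, w_v)
print(features.shape, context.shape)                # (2, 16) (2, 16)

# Placeholder predictions, only to show how the two loss terms combine.
loss = multitask_loss(np.zeros((3, 2)), np.ones((3, 2)), context, features)
```

Each subgraph layer doubles the encoder width because the pooled polyline summary is concatenated back onto every node, which is why the second and third layers take `2 * hidden` inputs.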

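Experimental results

Research questions

RQ1: Can a hierarchical graph effectively learn from vectorized HD maps and agent trajectories to predict future behavior?
RQ2: Do locally connected polyline subgraphs combined with a global attention-based graph improve trajectory prediction over rasterized ConvNet baselines?
RQ3: Does the auxiliary graph completion task improve modeling of interactions between agents and map context?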

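Key findings

VectorNet matches or outperforms the rasterized ConvNet baseline with far fewer parameters (about 72K versus about 246K) and an order of magnitude fewer FLOPs.
VectorNet achieves state-of-the-art DE@3s on the Argoverse test set, clearly outperforming the best ConvNet baseline on Argoverse.
On the in-house dataset, VectorNet's vectorized input matches or exceeds the rasterized baseline at a fraction of the FLOPs and parameters.
In ablations, incorporating both the map and the trajectories of other agents improves accuracy, and the auxiliary graph completion task helps consistently, especially for long-horizon prediction.
A three-layer polyline subgraph with a single-layer global graph provides the best trade-off between accuracy and efficiency.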
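As a quick sanity check, the two parameter counts quoted above can be compared against the "over 70%" saving stated in the abstract. The snippet only restates the approximate figures from this review; the variable names are illustrative.

```python
vectornet_params = 72_000    # approximate figure quoted above
convnet_params = 246_000     # approximate figure quoted above

saving = 1 - vectornet_params / convnet_params
print(f"{saving:.1%}")       # 70.7%
```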

This review was generated by AI and checked by a human editor.