QUICK REVIEW

[Paper Review] Social-BiGAT: Multimodal Trajectory Forecasting using Bicycle-GAN and Graph Attention Networks

Vineet Kosaraju, Amir Sadeghian|arXiv (Cornell University)|Jul 4, 2019

Autonomous Vehicle Technology and Safety | Citations: 277

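ONE-LINE SUMMARY

Social-BiGAT combines a graph attention network with a Bicycle-GAN-inspired latent encoding to generate multimodal, socially and physically plausible pedestrian trajectory predictions, outperforming prior methods on standard benchmarks.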

ABSTRACT

Predicting the future trajectories of multiple interacting agents in a scene has become an increasingly important problem for many different applications ranging from control of autonomous vehicles and social robots to security and surveillance. This problem is compounded by the presence of social interactions between humans and their physical interactions with the scene. While the existing literature has explored some of these cues, they mainly ignored the multimodal nature of each human's future trajectory. In this paper, we present Social-BiGAT, a graph-based generative adversarial network that generates realistic, multimodal trajectory predictions by better modelling the social interactions of pedestrians in a scene. Our method is based on a graph attention network (GAT) that learns reliable feature representations that encode the social interactions between humans in the scene, and a recurrent encoder-decoder architecture that is trained adversarially to predict, based on the features, the humans' paths. We explicitly account for the multimodal nature of the prediction problem by forming a reversible transformation between each scene and its latent noise vector, as in Bicycle-GAN. We show that our framework achieves state-of-the-art performance comparing it to several baselines on existing trajectory forecasting benchmarks.

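MOTIVATION AND OBJECTIVES

Motivate accurate, multimodal pedestrian trajectory prediction for autonomous driving systems and social robots.
Model rich social interactions and scene context to make predictions more realistic.
Introduce a bidirectional mapping between trajectories and latent noise to capture multimodality.
Incorporate physical scene cues through an attention mechanism to improve generalization.
Evaluate against existing baselines on standard trajectory datasets.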

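PROPOSED METHOD

Represent pedestrians as nodes of a fully connected graph and apply Graph Attention Networks to learn their social interactions.
Encode past pedestrian trajectories and scene context into latent features.
Use a Bicycle-GAN-inspired latent encoder to build a bijection between noise and trajectories, enabling multimodal outputs.
Generate future trajectories with a decoder LSTM conditioned on the concatenated pedestrian, social, and physical context features together with the latent noise.
Enforce realism at multiple scales with two discriminators: a local one at the pedestrian level and a global one at the scene level.
Optimize with an adversarial loss, a noise reconstruction loss (Lz), a trajectory reconstruction loss (Ltraj), and a KL divergence term that matches the latent distribution to a Gaussian.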
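
The graph attention step in the first item can be illustrated with a minimal sketch. It follows the standard single-head graph attention formulation (a shared projection, a LeakyReLU-scored attention vector, and a softmax over neighbours) on a fully connected graph with self-loops. The function name `gat_layer`, the toy features, and the parameter values `W` and `a` are all hypothetical and chosen for illustration; they are not taken from the paper, whose layer sizes, number of heads, and training details are not given in this review.

```python
import math

def matvec(W, h):
    """Multiply matrix W (a list of rows) by vector h."""
    return [sum(w * x for w, x in zip(row, h)) for row in W]

def leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x

def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def gat_layer(features, W, a):
    """Single-head graph attention over a fully connected graph.

    features: one feature vector per pedestrian (node).
    W: shared linear projection (hypothetical values; normally learned).
    a: attention vector of length 2 * out_dim (hypothetical; normally learned).
    Returns (new_features, attention), where attention[i][j] is the weight
    that pedestrian i places on pedestrian j.
    """
    projected = [matvec(W, h) for h in features]
    new_features, attention = [], []
    for hi in projected:
        # Score node i against every node j, including i itself.
        scores = [leaky_relu(sum(ak * v for ak, v in zip(a, hi + hj)))
                  for hj in projected]
        alpha = softmax(scores)
        # Attention-weighted sum of the projected features.
        agg = [sum(w * hj[d] for w, hj in zip(alpha, projected))
               for d in range(len(hi))]
        new_features.append(agg)
        attention.append(alpha)
    return new_features, attention

# Toy example: three pedestrians with 2-D encoded features (assumed values).
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
W = [[1.0, 0.0], [0.0, 1.0]]   # identity projection, for readability
a = [0.5, -0.5, 1.0, 0.25]     # arbitrary attention parameters
new_features, attention = gat_layer(features, W, a)
```

Each row of `attention` sums to 1, so every pedestrian's updated feature is a convex combination of all pedestrians' projected features. This is what lets the model weight distant but relevant agents instead of relying on a fixed local neighbourhood.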

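EXPERIMENTAL RESULTS

Research Questions

RQ1: Can Social-BiGAT learn the multimodal distribution of pedestrian trajectories while modeling global social interactions?
RQ2: Does combining graph attention over social cues with Bicycle-GAN-style latent encoding improve multimodal trajectory generation over previous GAN-based approaches?
RQ3: How do the local and global discriminators affect the realism and diversity of trajectories?
RQ4: Does incorporating scene context through soft attention improve prediction accuracy across different scenes?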

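Key Findings

Social-BiGAT achieved the best performance among the models tested, reducing final displacement error (FDE) by 0.15 m on average relative to the prior state of the art.
Introducing Graph Attention Networks (GAT) improves performance over baselines that do not model global social interactions.
Combining GAT with the latent encoder (BiGAN) gives the strongest results, underscoring the benefit of multimodal generation.
Modeling the latent space yields more robust predictions when few samples are drawn, limiting the growth in ADE/FDE variation.
Qualitative results show that Social-BiGAT generates more realistic, lower-variance trajectories in crowded and collision-avoidance scenarios.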
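
For readers unfamiliar with the metrics named above, the following sketch shows how ADE and FDE are commonly computed: ADE is the mean Euclidean distance between predicted and ground-truth positions over all predicted timesteps, and FDE is that distance at the final timestep. The `best_of_k` helper, which takes the minimum error over K sampled trajectories, reflects an evaluation convention often used for multimodal predictors. Whether the paper uses exactly this protocol is an assumption here, and all trajectories below are made-up toy data.

```python
import math

def displacement_errors(pred, truth):
    """Return (ADE, FDE) for one predicted trajectory of (x, y) points."""
    dists = [math.dist(p, t) for p, t in zip(pred, truth)]
    return sum(dists) / len(dists), dists[-1]

def best_of_k(samples, truth):
    """Minimum ADE and minimum FDE over K sampled trajectories.

    Assumption: the two minima are taken independently, so they may come
    from different samples.
    """
    errors = [displacement_errors(s, truth) for s in samples]
    return min(e[0] for e in errors), min(e[1] for e in errors)

# Toy data (metres): a pedestrian walking straight along the x-axis.
truth = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
samples = [
    [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)],   # sample that veers away
    [(0.0, 0.0), (1.0, 0.0), (2.0, 0.5)],   # sample that stays close
]
ade, fde = best_of_k(samples, truth)
```

With fewer samples, a best-of-K score depends more heavily on each individual draw, which is why robustness at small sample counts is a meaningful property to report.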

This review was generated by AI and checked by a human editor.