QUICK REVIEW

[Paper Review] Social-BiGAT: Multimodal Trajectory Forecasting using Bicycle-GAN and Graph Attention Networks

Vineet Kosaraju, Amir Sadeghian|arXiv (Cornell University)|Jul 4, 2019

Autonomous Vehicle Technology and Safety|References: 36|Citations: 113

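One-line summary

Social-BiGAT introduces a multimodal pedestrian trajectory forecasting model that uses a graph attention network to model social interactions and a Bicycle-GAN-inspired latent encoder to capture multiple plausible futures. By jointly modeling social and physical cues with dual discriminators, it reports state-of-the-art results on standard benchmarks.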

ABSTRACT

Predicting the future trajectories of multiple interacting agents in a scene has become an increasingly important problem for many different applications ranging from control of autonomous vehicles and social robots to security and surveillance. This problem is compounded by the presence of social interactions between humans and their physical interactions with the scene. While the existing literature has explored some of these cues, they mainly ignored the multimodal nature of each human's future trajectory. In this paper, we present Social-BiGAT, a graph-based generative adversarial network that generates realistic, multimodal trajectory predictions by better modelling the social interactions of pedestrians in a scene. Our method is based on a graph attention network (GAT) that learns reliable feature representations that encode the social interactions between humans in the scene, and a recurrent encoder-decoder architecture that is trained adversarially to predict, based on the features, the humans' paths. We explicitly account for the multimodal nature of the prediction problem by forming a reversible transformation between each scene and its latent noise vector, as in Bicycle-GAN. We show that our framework achieves state-of-the-art performance comparing it to several baselines on existing trajectory forecasting benchmarks.

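Motivation and objectives

Advance accurate pedestrian trajectory forecasting for autonomous driving systems and social robotics.
Model social interactions and scene context in a unified framework in order to capture multimodal futures.
Use graph attention networks to learn nuanced interactions between pedestrians.
Incorporate a latent-space encoder to enable diverse, multimodal trajectory generation.
Demonstrate state-of-the-art performance against baselines on standard benchmarks.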

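Proposed method

Build a generator consisting of a feature encoder, a graph-attention-based social encoder, and a decoder LSTM that generates future trajectories.
Use two discriminators, one operating at the local (per-pedestrian) scale and one at the global (scene) scale, to enforce realism.
Adopt a latent encoder inspired by Bicycle-GAN that establishes a reversible (bijective) mapping between latent noise and generated trajectories, encouraging multimodality.
Apply soft attention over scene features and graph attention across all pedestrians to capture physical and social cues, respectively.
Combine the adversarial loss with a reconstruction loss and KL regularization to align the latent space with the generated trajectories.
Evaluate average and final displacement error (ADE/FDE) on the ETH/UCY benchmarks and analyze variance across different numbers of samples (K).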
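
The graph-attention step over pedestrians can be illustrated with a minimal sketch. It assumes the standard single-head GAT formulation (a shared linear map `W`, an attention vector `a`, LeakyReLU scoring, softmax over neighbors) on a fully connected graph that includes self-connections. The review does not give the paper's exact layer configuration, so the dimensions, random weights, and the function name `gat_layer` are hypothetical.

```python
import math
import random


def leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x


def matvec(W, h):
    # Multiply matrix W (rows of length len(h)) by vector h.
    return [sum(w * x for w, x in zip(row, h)) for row in W]


def gat_layer(H, W, a):
    """One single-head graph attention pass over a fully connected graph.

    H: N pedestrian feature vectors of length F (e.g. encoder hidden states).
    W: F_out x F shared linear map.  a: attention vector of length 2 * F_out.
    Returns (updated features, N x N attention matrix).
    Assumption: every pedestrian attends to every pedestrian, itself included.
    """
    Wh = [matvec(W, h) for h in H]
    n = len(H)
    out, attn = [], []
    for i in range(n):
        # Score each pair from the concatenation [W h_i || W h_j].
        scores = [
            leaky_relu(sum(x * y for x, y in zip(a, Wh[i] + Wh[j])))
            for j in range(n)
        ]
        # Numerically stable softmax over the neighbors of i.
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        z = sum(exps)
        alpha = [e / z for e in exps]
        attn.append(alpha)
        # New feature of i: attention-weighted sum of transformed neighbors.
        out.append(
            [sum(alpha[j] * Wh[j][k] for j in range(n)) for k in range(len(W))]
        )
    return out, attn


random.seed(0)
N, F, F_OUT = 3, 4, 2  # hypothetical sizes: 3 pedestrians, 4-d features
H = [[random.uniform(-1, 1) for _ in range(F)] for _ in range(N)]
W = [[random.uniform(-1, 1) for _ in range(F)] for _ in range(F_OUT)]
a = [random.uniform(-1, 1) for _ in range(2 * F_OUT)]

new_H, attn = gat_layer(H, W, a)
for i, row in enumerate(attn):
    print(f"pedestrian {i} attends with weights", [round(x, 3) for x in row])
```

Each row of `attn` is a probability distribution over pedestrians, so the updated feature of a pedestrian is a weighted mix of everyone in the scene rather than a fixed pooling of nearby neighbors.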

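Experimental results

Research questions

RQ1: Can graph attention networks effectively capture global social interactions among pedestrians in a scene for trajectory forecasting?
RQ2: Does integrating a Bicycle-GAN-inspired latent encoder produce diverse, realistic multimodal future trajectories?
RQ3: Do dual local and global discriminators improve the realism and consistency of generated trajectories across datasets?
RQ4: How does Social-BiGAT perform against prior discriminative and generative baselines on standard benchmarks?
RQ5: How does the latent space affect interpolation between plausible future paths for a group of pedestrians?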

Key findings

Dataset (ADE / FDE, m)	Lin	S-LSTM	S-GAN-P	Sophie	GAT	BiGAN	Social-BiGAT
ETH	1.33 / 2.94	1.09 / 2.35	0.87 / 1.62	0.70 / 1.43	0.68 / 1.29	0.72 / 1.47	0.69 / 1.29
HOTEL	0.39 / 0.72	0.79 / 1.76	0.67 / 1.37	0.76 / 1.67	0.68 / 1.40	0.54 / 1.12	0.49 / 1.01
UNIV	0.82 / 1.59	0.67 / 1.40	0.76 / 1.52	0.54 / 1.24	0.57 / 1.29	0.55 / 1.34	0.55 / 1.32
ZARA1	0.62 / 1.21	0.47 / 1.00	0.35 / 0.68	0.30 / 0.63	0.29 / 0.60	0.32 / 0.65	0.30 / 0.62
ZARA2	0.77 / 1.48	0.56 / 1.17	0.42 / 0.84	0.38 / 0.78	0.37 / 0.75	0.49 / 0.88	0.36 / 0.75
AVG	0.79 / 1.59	0.72 / 1.54	0.61 / 1.21	0.54 / 1.15	0.52 / 1.07	0.52 / 1.09	0.48 / 1.00
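
The AVG row is consistent with a plain mean over the five scenes, which can be verified with a short sketch. The per-scene values are copied from the table above; the helper name `scene_average` is hypothetical, and rounding to two decimals is an assumption about how the row was produced.

```python
# Per-scene (ADE, FDE) pairs in the order ETH, HOTEL, UNIV, ZARA1, ZARA2,
# copied from the table above.
table = {
    "Lin": [(1.33, 2.94), (0.39, 0.72), (0.82, 1.59), (0.62, 1.21), (0.77, 1.48)],
    "S-LSTM": [(1.09, 2.35), (0.79, 1.76), (0.67, 1.40), (0.47, 1.00), (0.56, 1.17)],
    "S-GAN-P": [(0.87, 1.62), (0.67, 1.37), (0.76, 1.52), (0.35, 0.68), (0.42, 0.84)],
    "Sophie": [(0.70, 1.43), (0.76, 1.67), (0.54, 1.24), (0.30, 0.63), (0.38, 0.78)],
    "GAT": [(0.68, 1.29), (0.68, 1.40), (0.57, 1.29), (0.29, 0.60), (0.37, 0.75)],
    "BiGAN": [(0.72, 1.47), (0.54, 1.12), (0.55, 1.34), (0.32, 0.65), (0.49, 0.88)],
    "Social-BiGAT": [(0.69, 1.29), (0.49, 1.01), (0.55, 1.32), (0.30, 0.62), (0.36, 0.75)],
}


def scene_average(rows):
    """Unweighted mean of ADE and FDE over scenes, rounded to two decimals."""
    n = len(rows)
    ade = sum(r[0] for r in rows) / n
    fde = sum(r[1] for r in rows) / n
    return round(ade, 2), round(fde, 2)


averages = {name: scene_average(rows) for name, rows in table.items()}
for name, (ade, fde) in averages.items():
    print(f"{name:>12}: {ade:.2f} / {fde:.2f}")

# Gap in average FDE between Sophie and Social-BiGAT.
fde_gap = round(averages["Sophie"][1] - averages["Social-BiGAT"][1], 2)
print("FDE gap vs Sophie:", fde_gap)
```

The recomputed means match the AVG row, and the Sophie-to-Social-BiGAT gap in average FDE comes out at 0.15 m.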

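Social-BiGAT achieves the best average performance among the evaluated models, reducing average FDE across scenes by 0.15 m relative to the prior state of the art (Sophie, 1.15 m to 1.00 m).
The GAT-only variant already outperforms the previous models on average (0.52 / 1.07). The BiGAN-only variant without GAT is less effective on its own (0.52 / 1.09), and combining the two gives the strongest result.
In Table 1, Social-BiGAT reaches an ADE/FDE of 0.69/1.29 on ETH, 0.49/1.01 on HOTEL, 0.55/1.32 on UNIV, 0.30/0.62 on ZARA1, and 0.36/0.75 on ZARA2, for an average of 0.48/1.00 across datasets. This is the best average, although individual baselines remain ahead on some scenes (Lin on HOTEL, Sophie on UNIV).
Table 2 shows that as the number of samples K decreases, Social-BiGAT's performance degrades more gradually than that of S-GAN-P and Sophie, which suggests better generalization and lower output variance due to the latent encoder.
Qualitative results show that Social-BiGAT has lower prediction variance than S-GAN-P and Sophie and handles crowd interactions and collision avoidance better.
Visualizations of varying the latent z show interpretable mode changes, such as avoidance versus aggressiveness and changes in speed, supporting the model's multimodal capability.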
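
The dependence on the number of samples K is easier to see with the metrics written out. The sketch below assumes the best-of-K convention that is common on ETH/UCY (report the minimum ADE and minimum FDE over K sampled trajectories, each taken independently); the review does not state the exact protocol, and the toy trajectories and the name `best_of_k` are hypothetical.

```python
import math


def ade(pred, truth):
    """Average displacement error: mean Euclidean distance over all time steps."""
    return sum(math.dist(p, t) for p, t in zip(pred, truth)) / len(truth)


def fde(pred, truth):
    """Final displacement error: Euclidean distance at the last time step."""
    return math.dist(pred[-1], truth[-1])


def best_of_k(samples, truth):
    """Minimum ADE and minimum FDE over K sampled trajectories.

    Assumption: the two minima are taken independently, so they may come
    from different samples.
    """
    return (
        min(ade(s, truth) for s in samples),
        min(fde(s, truth) for s in samples),
    )


# Toy ground truth: a pedestrian walking along the x-axis (units: meters).
truth = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]

# Three hypothetical samples drawn from a generator.
samples = [
    [(0.0, 1.0), (1.0, 1.0), (2.0, 1.0)],  # constant 1 m offset
    [(0.0, 0.0), (1.0, 0.0), (2.0, 3.0)],  # accurate until a late swerve
    [(0.0, 0.5), (1.0, 0.5), (2.0, 0.5)],  # constant 0.5 m offset
]

print("K=1:", best_of_k(samples[:1], truth))
print("K=3:", best_of_k(samples, truth))
```

With fewer samples the minimum is taken over fewer candidates, so a generator whose samples are widely spread tends to lose more accuracy as K shrinks than one whose samples stay close to plausible paths.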

This review was generated by AI and checked by a human editor.