QUICK REVIEW

[Paper Review] Human Trajectory Prediction using Spatially aware Deep Attention Models

Daksh Varshneya, G. Srinivasaraghavan|arXiv (Cornell University)|May 26, 2017

Video Surveillance and Tracking Methods | References: 22 | Cited by: 77

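One-Sentence Summary

This paper proposes an end-to-end, spatially aware deep attention model that predicts human trajectories by jointly modeling dynamic interactions and static scene context; it comprises an SSCN for the static context and a spatio-temporal attention encoder-decoder.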

ABSTRACT

Trajectory Prediction of dynamic objects is a widely studied topic in the field of artificial intelligence. Thanks to a large number of applications like predicting abnormal events, navigation system for the blind, etc. there have been many approaches to attempt learning patterns of motion directly from data using a wide variety of techniques ranging from hand-crafted features to sophisticated deep learning models for unsupervised feature learning. All these approaches have been limited by problems like inefficient features in the case of hand crafted features, large error propagation across the predicted trajectory and no information of static artefacts around the dynamic moving objects. We propose an end to end deep learning model to learn the motion patterns of humans using different navigational modes directly from data using the much popular sequence to sequence model coupled with a soft attention mechanism. We also propose a novel approach to model the static artefacts in a scene and using these to predict the dynamic trajectories. The proposed method, tested on trajectories of pedestrians, consistently outperforms previously proposed state of the art approaches on a variety of large scale data sets. We also show how our architecture can be naturally extended to handle multiple modes of movement (say pedestrians, skaters, bikers and buses) simultaneously.

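Motivation and Objectives

Capture and predict human trajectories across multiple navigation modes in crowded scenes.
Integrate dynamic interactions with neighboring agents together with the static scene context around each agent.
Propose an end-to-end architecture that combines a spatial context network with an attention mechanism to improve long-term planning.
Extend the framework to handle multiple classes of moving agents beyond pedestrians.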

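Proposed Method

Introduces the Spatially Static Context Network (SSCN) to model the static spatial context around each agent.
Develops a pooling mechanism that combines the dynamic social context with a static context tensor.
Uses a spatio-temporal attention encoder-decoder over embedded positions, dynamic context, and static context, with Bahdanau-style attention.
Predicts the next position through a parameterized bivariate Gaussian distribution.
Trains with a negative log-likelihood loss to jointly optimize the models for all agent types.
Presents two variants: D-ATT uses dynamic pooling only, and SD-ATT adds static context pooling.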
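
The attention and output steps listed above can be illustrated with a small sketch. It is not the authors' implementation: it assumes the standard additive (Bahdanau-style) scoring form and the standard bivariate Gaussian density, and the function names (`matvec`, `additive_attention`, `bivariate_gaussian_nll`), the weight matrices `W_q`, `W_k`, and the vector `v` are hypothetical names chosen for illustration. How the paper maps decoder outputs to the five Gaussian parameters is not stated in this summary, so the sketch takes them as given.

```python
import math

def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def additive_attention(query, keys, W_q, W_k, v):
    """Additive (Bahdanau-style) attention over a list of key vectors.

    score_i = v . tanh(W_q @ query + W_k @ keys[i])
    weights = softmax(scores); context = sum_i weights[i] * keys[i]
    All parameter names are hypothetical, not taken from the paper.
    """
    q_proj = matvec(W_q, query)
    scores = []
    for h in keys:
        k_proj = matvec(W_k, h)
        hidden = [math.tanh(a + b) for a, b in zip(q_proj, k_proj)]
        scores.append(sum(vi * hi for vi, hi in zip(v, hidden)))
    # Numerically stable softmax over the scores.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(keys[0])
    context = [sum(w * h[d] for w, h in zip(weights, keys)) for d in range(dim)]
    return weights, context

def bivariate_gaussian_nll(x, y, mu_x, mu_y, sigma_x, sigma_y, rho):
    """Negative log-likelihood of the point (x, y) under a bivariate Gaussian.

    Requires sigma_x > 0, sigma_y > 0 and -1 < rho < 1.
    """
    dx = (x - mu_x) / sigma_x
    dy = (y - mu_y) / sigma_y
    one_minus_rho2 = 1.0 - rho * rho
    z = dx * dx + dy * dy - 2.0 * rho * dx * dy
    log_norm = math.log(2.0 * math.pi * sigma_x * sigma_y * math.sqrt(one_minus_rho2))
    return log_norm + z / (2.0 * one_minus_rho2)

if __name__ == "__main__":
    # Two encoder states attended from one decoder state (toy numbers).
    weights, context = additive_attention(
        query=[0.5], keys=[[1.0, 0.0], [0.0, 1.0]],
        W_q=[[1.0]], W_k=[[1.0, -1.0]], v=[1.0],
    )
    print(weights, context)
    print(bivariate_gaussian_nll(0.1, -0.2, 0.0, 0.0, 0.5, 0.5, 0.0))
```

In a training loop, the per-step loss values would be summed over the predicted time steps and over agents; that aggregation is likewise an assumption here rather than a detail confirmed by the summary.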

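Experimental Results

Research Questions

RQ1: How can static scene context be incorporated into pedestrian trajectory prediction?
RQ2: Does combining dynamic social pooling with static context improve prediction accuracy over a dynamic-only model?
RQ3: Can the model be extended to multiple classes of moving objects beyond pedestrians?
RQ4: What effect does spatio-temporal attention have on long-term trajectory planning?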

Key Findings

Metric	Dataset	O-LSTM	S-LSTM	D-ATT	SD-ATT
Avg. Disp. Error	ETH [11]	0.49	0.50	0.47	-
Avg. Disp. Error	HOTEL [11]	0.09	0.11	0.12	-
Avg. Disp. Error	ZARA1 [12]	0.22	0.22	0.18	-
Avg. Disp. Error	GATES1 [13]	0.16	0.12	0.11	0.09
Avg. Disp. Error	GATES2 [13]	0.15	0.17	0.14	0.10
Avg. Disp. Error	GATES3 [13]	0.18	0.16	0.13	0.13
Final Disp. Error	ETH [11]	1.06	1.07	0.85	-
Final Disp. Error	HOTEL [11]	0.20	0.23	0.19	-
Final Disp. Error	ZARA1 [12]	0.46	0.46	0.48	-
Final Disp. Error	GATES1 [13]	0.28	0.25	0.19	0.17
Final Disp. Error	GATES2 [13]	0.40	0.37	0.38	0.35
Final Disp. Error	GATES3 [13]	0.26	0.26	0.25	0.24
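
For readers unfamiliar with the two metrics in the table, the sketch below shows one common way to compute them for a single trajectory. It assumes the Euclidean-distance definition (average over all predicted steps, and distance at the last step); the paper's exact definition and units are not stated in this summary, and the function name `displacement_errors` is hypothetical.

```python
import math

def displacement_errors(predicted, ground_truth):
    """Return (average, final) displacement error for one trajectory.

    Both arguments are equal-length, non-empty lists of (x, y) points.
    Assumes the Euclidean-distance definition of both metrics.
    """
    if not predicted or len(predicted) != len(ground_truth):
        raise ValueError("trajectories must be non-empty and of equal length")
    dists = [
        math.hypot(px - gx, py - gy)
        for (px, py), (gx, gy) in zip(predicted, ground_truth)
    ]
    average_error = sum(dists) / len(dists)
    final_error = dists[-1]
    return average_error, final_error

if __name__ == "__main__":
    pred = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
    true = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)]
    print(displacement_errors(pred, true))
```

A dataset-level score would then average these per-trajectory values over all test trajectories, which is again an assumption about the evaluation protocol.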

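Where it is reported (GATES1, GATES2, GATES3), the proposed SD-ATT model achieves the lowest average and final displacement errors, matching or improving on D-ATT, S-LSTM, and O-LSTM. On ETH, HOTEL, and ZARA1 only D-ATT is reported; it has the lowest error in four of the six comparisons, the exceptions being the HOTEL average error and the ZARA1 final error.
SD-ATT remains effective on the Stanford Drone dataset, where its predictions outperform Social LSTM (S-LSTM).
The SSCN-based static context yields semantically meaningful reachability maps that influence trajectory planning.
Qualitative results show nonlinear trajectories, collision avoidance, and better handling of static obstacles, attributed to static context pooling and the attention mechanism.
Comparing the two variants shows that adding static context (SD-ATT) matches or improves on dynamic context alone (D-ATT).
The method extends to scenarios with multiple object classes and is not limited to pedestrians.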


This review was generated by AI and reviewed by a human editor.