QUICK REVIEW

[Paper Digest] Spatio-Temporal Graph Transformer Networks for Pedestrian Trajectory Prediction

Cunjun Yu, Xiao Ma|arXiv (Cornell University)|May 18, 2020

Anomaly Detection Techniques and Applications | References: 51 | Cited by: 35

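One-Sentence Summary

STAR interleaves spatial and temporal Transformers, combines the TGConv graph convolution with an external memory, and relies only on attention mechanisms to achieve state-of-the-art pedestrian trajectory prediction on five datasets.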

ABSTRACT

Understanding crowd motion dynamics is critical to real-world applications, e.g., surveillance systems and autonomous driving. This is challenging because it requires effectively modeling the socially aware crowd spatial interaction and complex temporal dependencies. We believe attention is the most important factor for trajectory prediction. In this paper, we present STAR, a Spatio-Temporal grAph tRansformer framework, which tackles trajectory prediction by only attention mechanisms. STAR models intra-graph crowd interaction by TGConv, a novel Transformer-based graph convolution mechanism. The inter-graph temporal dependencies are modeled by separate temporal Transformers. STAR captures complex spatio-temporal interactions by interleaving between spatial and temporal Transformers. To calibrate the temporal prediction for the long-lasting effect of disappeared pedestrians, we introduce a read-writable external memory module, consistently being updated by the temporal Transformer. We show that with only attention mechanism, STAR achieves state-of-the-art performance on 5 commonly used real-world pedestrian prediction datasets.

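Motivation and Objectives

Enable accurate pedestrian trajectory prediction in crowded scenes.
Model social interactions and temporal dependencies with attention-based mechanisms.
Propose a Transformer-based graph convolution (TGConv) for spatial modeling.
Capture spatio-temporal dynamics by interleaving spatial and temporal Transformers.
Improve consistency across time steps with a read-writable external graph memory.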

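Proposed Method

Introduce TGConv, a Transformer-based graph convolution, to model spatial interactions.
Apply a temporal Transformer to learn each pedestrian's temporal dependencies.
Interleave the spatial and temporal Transformers to capture coupled spatio-temporal dynamics.
Add a read-writable external memory that smooths the temporal embeddings across time steps.
Use two encoder blocks and a simple decoder to predict future trajectories.
Train end to end with the Adam optimizer and predefined hyperparameters, and evaluate with the ADE/FDE metrics (average and final displacement error).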
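
The interleaving idea in the steps above can be illustrated with a small sketch. This is not the paper's TGConv or its trained model: it assumes a single attention head, identity query/key/value projections instead of learned weights, a fully connected crowd graph at every time step, a spatial-then-temporal ordering in each block, and no residual connections, layer normalization, feed-forward layers, external memory, or decoder. All function names are hypothetical and chosen for illustration only.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, keys, values):
    """Scaled dot-product attention for one query vector."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]


def spatial_attention(emb):
    """emb[t][i] is the embedding of pedestrian i at time step t.

    Assumption: every pedestrian attends to every pedestrian (itself
    included) within the same time step, i.e. a fully connected graph.
    """
    return [[attend(x, frame, frame) for x in frame] for frame in emb]


def temporal_attention(emb):
    """Each pedestrian attends to its own embeddings at steps 0..t (causal)."""
    steps, peds = len(emb), len(emb[0])
    out = []
    for t in range(steps):
        row = []
        for i in range(peds):
            history = [emb[s][i] for s in range(t + 1)]
            row.append(attend(emb[t][i], history, history))
        out.append(row)
    return out


def interleaved_encoder(emb, blocks=2):
    """Alternate spatial and temporal attention for a number of blocks.

    The ordering inside a block (spatial first) is an assumption.
    """
    for _ in range(blocks):
        emb = temporal_attention(spatial_attention(emb))
    return emb


# Toy input: 3 time steps, 2 pedestrians, 2-D positions used as embeddings.
traj = [
    [[0.0, 0.0], [1.0, 0.0]],
    [[0.1, 0.0], [1.0, 0.1]],
    [[0.2, 0.0], [1.0, 0.2]],
]
encoded = interleaved_encoder(traj)
print(len(encoded), len(encoded[0]), len(encoded[0][0]))
```

The output keeps the input layout (time step, pedestrian, feature), so the two attention passes can be stacked any number of times. A real implementation would add learned projections and multiple heads, typically through a deep learning framework.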

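Experimental Results

Research Questions

RQ1: Can the attention-based STAR model outperform state-of-the-art social trajectory predictors on standard datasets?
RQ2: Do interleaved spatial and temporal Transformers provide better spatio-temporal modeling than models that handle the two separately?
RQ3: Does TGConv outperform conventional graph convolutions (GCN/GAT) at modeling spatial interactions?
RQ4: Does the external graph memory improve temporal consistency and prediction accuracy?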

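Key Findings

STAR-D (deterministic) outperforms strong baselines on multiple datasets; STAR (stochastic) achieves state-of-the-art performance under random sampling.
TGConv (the Transformer-based graph convolution) models spatial interactions better than alternatives such as GCN/GAT, especially in dense crowd scenes.
Interleaving two encoders (spatial then temporal, or the reverse) generally yields better spatio-temporal representations than a single encoder.
The temporal Transformer outperforms LSTM-based temporal modeling for trajectory prediction.
The external graph memory produces smoother temporal embeddings and improves overall performance, more noticeably on some datasets.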
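
The comparisons above are reported with the ADE and FDE metrics named in the method section. As a minimal sketch, assuming a single predicted trajectory and its ground truth given as equal-length lists of (x, y) points, the two metrics could be computed as follows. The function name `displacement_errors` and the toy coordinates are hypothetical and not taken from the paper.

```python
import math


def displacement_errors(predicted, ground_truth):
    """Return (ADE, FDE) for one trajectory.

    ADE is the mean Euclidean distance over all predicted steps;
    FDE is the Euclidean distance at the final predicted step.
    """
    if not predicted or len(predicted) != len(ground_truth):
        raise ValueError("trajectories must be non-empty and of equal length")
    dists = [
        math.hypot(px - gx, py - gy)
        for (px, py), (gx, gy) in zip(predicted, ground_truth)
    ]
    return sum(dists) / len(dists), dists[-1]


# Toy example: the prediction drifts away from the ground truth by one unit per step.
pred = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
truth = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)]
ade, fde = displacement_errors(pred, truth)
print(ade, fde)  # 1.0 2.0
```

Lower values are better for both metrics. Dataset-level scores are usually obtained by averaging these per-trajectory values over all evaluated pedestrians.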

This digest was generated by AI and reviewed by a human editor.