QUICK REVIEW

[Paper Review] SEPT: Towards Efficient Scene Representation Learning for Motion Prediction

Zhiqian Lan, Yuxuan Jiang | arXiv (Cornell University) | Sep 26, 2023

Autonomous Vehicle Technology and Safety | Cited by 10

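One-Sentence Summary

SEPT pretrains a scene encoder with three self-supervised masking tasks on the scene inputs, then fine-tunes it for motion prediction, achieving state-of-the-art results on Argoverse 1 and 2 with a compact architecture.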

ABSTRACT

Motion prediction is crucial for autonomous vehicles to operate safely in complex traffic environments. Extracting effective spatiotemporal relationships among traffic elements is key to accurate forecasting. Inspired by the successful practice of pretrained large language models, this paper presents SEPT, a modeling framework that leverages self-supervised learning to develop powerful spatiotemporal understanding for complex traffic scenes. Specifically, our approach involves three masking-reconstruction modeling tasks on scene inputs including agents' trajectories and road network, pretraining the scene encoder to capture kinematics within trajectory, spatial structure of road network, and interactions among roads and agents. The pretrained encoder is then finetuned on the downstream forecasting task. Extensive experiments demonstrate that SEPT, without elaborate architectural design or manual feature engineering, achieves state-of-the-art performance on the Argoverse 1 and Argoverse 2 motion forecasting benchmarks, outperforming previous methods on all main metrics by a large margin.

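Research Motivation and Goals

Advance efficient, accurate motion prediction by learning scene understanding from traffic environments.
Develop a self-supervised pretraining scheme that captures temporal, spatial, and interaction cues in road scenes.
Pretrain the scene encoder with three masking-reconstruction tasks, then fine-tune it for downstream forecasting.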

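Proposed Method

Represent agents and the road network as trajectory and road-vector inputs.
Pretrain with three tasks: Masked Trajectory Modeling (MTM), Masked Road Modeling (MRM), and Tail Prediction (TP).
MTM masks trajectory waypoints and reconstructs them, learning the temporal dependencies (kinematics) within each trajectory.
MRM masks road-vector attributes, learning road topology and connectivity.
TP predicts the tail of each trajectory from its head and the road context, aligning the temporal and spatial representations.
TempoNet (temporal encoder) and SpaNet (spatial encoder) form the encoder, and a Cross Attender produces the predictions, giving a unified Transformer-based pipeline.
The pretrained encoder is fine-tuned together with a downstream trajectory-prediction decoder, optimizing a joint regression and classification loss.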
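The three pretraining tasks differ mainly in what is hidden from the encoder and what it must recover. The following is a minimal plain-Python sketch of how their inputs and targets could be constructed. The mask ratios, the `start`/`end` road attributes, the use of `None` as a mask token, and all function names are illustrative assumptions, not SEPT's actual settings or code.

```python
import random

MASK = None  # hypothetical stand-in for a learned [MASK] embedding


def mask_trajectory(traj, ratio, rng):
    """MTM-style masking: hide a random subset of waypoints to be reconstructed."""
    n_mask = max(1, int(len(traj) * ratio))
    idx = set(rng.sample(range(len(traj)), n_mask))
    masked = [MASK if i in idx else p for i, p in enumerate(traj)]
    targets = {i: traj[i] for i in sorted(idx)}  # reconstruction targets
    return masked, targets


def mask_road_vectors(roads, ratio, rng, fields=("start", "end")):
    """MRM-style masking: hide chosen attributes of a random subset of road vectors.

    The attribute names in `fields` are assumptions made for illustration.
    """
    n_mask = max(1, int(len(roads) * ratio))
    idx = set(rng.sample(range(len(roads)), n_mask))
    masked = []
    for i, road in enumerate(roads):
        road = dict(road)  # copy so the original input is not modified
        if i in idx:
            for f in fields:
                road[f] = MASK
        masked.append(road)
    targets = {i: {f: roads[i][f] for f in fields} for i in sorted(idx)}
    return masked, targets


def split_head_tail(traj, head_len):
    """TP-style split: the head is model input, the tail is the prediction target."""
    return traj[:head_len], traj[head_len:]


# Toy scene: one straight agent trajectory and six collinear lane segments.
rng = random.Random(0)
traj = [(float(t), 0.5 * t) for t in range(10)]
masked_traj, traj_targets = mask_trajectory(traj, ratio=0.3, rng=rng)

roads = [{"start": (float(i), 0.0), "end": (float(i + 1), 0.0), "type": "lane"}
         for i in range(6)]
masked_roads, road_targets = mask_road_vectors(roads, ratio=0.5, rng=rng)

head, tail = split_head_tail(traj, head_len=6)
```

In a real setup, the masked entries would typically be replaced by a learnable embedding, and the encoder would be trained to regress the returned targets. The sketch only shows the data side of each task. For TP, the road vectors would be passed in unmasked as context for predicting `tail`.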

Figure 2: The overall architecture of SEPT
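The method summary says fine-tuning optimizes a joint regression and classification loss, but it does not give the exact form. The following sketch assumes the decoder outputs K candidate trajectories with one score (logit) each, which the classification term implies but the summary does not spell out. It also assumes a common winner-take-all formulation: smooth L1 regression on the candidate closest to the ground truth, plus cross-entropy that treats that candidate as the correct class. The function names and the unit loss weights are hypothetical.

```python
import math


def smooth_l1(x, beta=1.0):
    """Huber-style smooth L1 on a scalar residual."""
    ax = abs(x)
    return 0.5 * ax * ax / beta if ax < beta else ax - 0.5 * beta


def ade(pred, gt):
    """Average displacement error between two equal-length lists of (x, y) points."""
    return sum(math.dist(p, g) for p, g in zip(pred, gt)) / len(gt)


def joint_loss(pred_trajs, logits, gt, reg_weight=1.0, cls_weight=1.0):
    """Winner-take-all regression on the best mode plus cross-entropy on mode scores.

    This is an assumed formulation for illustration, not SEPT's published loss.
    """
    # Pick the candidate closest to the ground truth (lowest ADE) as the positive mode.
    best = min(range(len(pred_trajs)), key=lambda k: ade(pred_trajs[k], gt))

    # Regression: smooth L1 averaged over every coordinate of the best mode only.
    residuals = [pc - gc for p, g in zip(pred_trajs[best], gt) for pc, gc in zip(p, g)]
    reg = sum(smooth_l1(r) for r in residuals) / len(residuals)

    # Classification: cross-entropy with the best mode as target (stable log-sum-exp).
    m = max(logits)
    log_norm = m + math.log(sum(math.exp(z - m) for z in logits))
    cls = log_norm - logits[best]

    return reg_weight * reg + cls_weight * cls, best


gt = [(1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
preds = [
    [(1.0, 1.0), (2.0, 1.0), (3.0, 1.0)],  # offset by 1.0 in y
    [(1.0, 0.2), (2.0, 0.2), (3.0, 0.2)],  # offset by 0.2 in y
]
total, best = joint_loss(preds, logits=[0.0, 0.0], gt=gt)
```

Only the best candidate receives a regression gradient. The other candidates therefore remain free to cover alternative futures, while the cross-entropy term teaches the scores to pick out the candidate that matched.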

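Experimental Results

Research Questions

RQ1: Does self-supervised pretraining on temporal, spatial, and interaction cues improve motion prediction performance?
RQ2: Do the MTM, MRM, and TP tasks contribute additively to downstream prediction performance?
RQ3: Is a compact encoder with a simple, unified architecture sufficient to achieve state-of-the-art results on large-scale motion prediction benchmarks?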

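Key Findings

Pretraining yields consistent gains over training from scratch on the main motion prediction metrics.
The three pretraining tasks contribute additively, with Tail Prediction (TP) notably helping to align the spatiotemporal representations.
SEPT achieves state-of-the-art results on Argoverse 1 and Argoverse 2, ranking first on the main metrics with roughly 40% of the parameters of the strongest baseline and faster inference.
On Argoverse 1, SEPT ranks first on most metrics; on Argoverse 2, it ranks first on all metrics among the reported methods.
With a compact encoder of about 9.6M parameters, SEPT matches or exceeds the accuracy of competing methods while running faster.
Ablation studies show that TP is critical for connecting TempoNet and SpaNet, while MTM and MRM provide further cumulative improvements.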


This review was generated by AI and checked by human editors.