QUICK REVIEW

[Paper Review] TrajGPT-R: Generating Urban Mobility Trajectory with Reinforcement Learning-Enhanced Generative Pre-trained Transformer

Jiawei Wang, Chuang Yang|arXiv (Cornell University)|Feb 24, 2026

Traffic Prediction and Management Techniques | Citations: 0

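One-sentence summary

TrajGPT-R is a two-phase framework for urban trajectory generation: it pre-trains a Transformer and then fine-tunes it with a reward model built by inverse reinforcement learning, improving the reliability and diversity of the generated urban mobility trajectories.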

ABSTRACT

Mobility trajectories are essential for understanding urban dynamics and enhancing urban planning, yet access to such data is frequently hindered by privacy concerns. This research introduces a transformative framework for generating large-scale urban mobility trajectories, employing a novel application of a transformer-based model pre-trained and fine-tuned through a two-phase process. Initially, trajectory generation is conceptualized as an offline reinforcement learning (RL) problem, with a significant reduction in vocabulary space achieved during tokenization. The integration of Inverse Reinforcement Learning (IRL) allows for the capture of trajectory-wise reward signals, leveraging historical data to infer individual mobility preferences. Subsequently, the pre-trained model is fine-tuned using the constructed reward model, effectively addressing the challenges inherent in traditional RL-based autoregressive methods, such as long-term credit assignment and handling of sparse reward environments. Comprehensive evaluations on multiple datasets illustrate that our framework markedly surpasses existing models in terms of reliability and diversity. Our findings not only advance the field of urban mobility modeling but also provide a robust methodology for simulating urban data, with significant implications for traffic management and urban development planning. The implementation is publicly available at https://github.com/Wangjw6/TrajGPT_R.

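Motivation and objectives

Address privacy constraints on mobility data by generating large-scale urban mobility trajectories that resemble real urban dynamics.
Learn trajectory generation as a sequential decision-making problem with a Transformer-based model pre-trained offline.
Introduce a reward model based on inverse reinforcement learning that captures trajectory-wise preferences and guides fine-tuning.
Fine-tune the pre-trained model with a reward-model-guided objective to improve the reliability and diversity of generated trajectories.
Validate the method on multiple large-scale urban mobility datasets and show better performance than the baselines.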

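Proposed method

Model urban trajectory generation as a sequential decision-making problem within a Transformer-based autoregressive framework.
Represent each trajectory as state, action, and return-to-go tokens, and train with offline RL using a cross-entropy loss.
Build a trajectory-wise reward model via inverse reinforcement learning, using a Basic Value Estimator and a Preference Value Estimator that capture general and individual preferences.
Fine-tune the pre-trained model with a reward-model-guided objective that combines a policy-gradient signal (GAE) with a supervised learning loss, keeping the reward weight small so that updates stay conservative.
Evaluate on the Toyota, T-Drive, and Porto Taxi datasets using reliability and diversity metrics, showing improved performance.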
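
The fine-tuning step in the list above combines a GAE-based policy-gradient signal with a supervised loss under a small reward weight. A minimal Python sketch of that combination might look like the following. The function names, the default `gamma`, `lam`, and `reward_weight` values, the zero terminal value, and the choice to apply both loss terms to the same log-probabilities are all illustrative assumptions; they are not taken from the paper or its repository.

```python
def gae_advantages(rewards, values, gamma=0.99, lam=0.95):
    """Generalized Advantage Estimation for one finished trajectory.

    rewards[t] is the reward received after step t and values[t] is a critic's
    estimate at step t. Assumption: the value after the last step is 0.
    """
    advantages = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        next_value = values[t + 1] if t + 1 < len(values) else 0.0
        delta = rewards[t] + gamma * next_value - values[t]  # TD residual
        running = delta + gamma * lam * running              # discounted sum of residuals
        advantages[t] = running
    return advantages


def combined_loss(log_probs, advantages, reward_weight=0.1):
    """Supervised cross-entropy plus a down-weighted policy-gradient term.

    log_probs[t] is the model's log-probability of the action at step t.
    Assumption: a small reward_weight keeps the update close to the
    supervised objective, which is one way to read "conservative updates".
    """
    n = len(log_probs)
    supervised = -sum(log_probs) / n
    policy_gradient = -sum(lp * a for lp, a in zip(log_probs, advantages)) / n
    return supervised + reward_weight * policy_gradient


# Hypothetical three-step trajectory with a sparse terminal reward.
adv = gae_advantages([0.0, 0.0, 1.0], [0.0, 0.0, 0.0], gamma=1.0, lam=1.0)
loss = combined_loss([-0.5, -0.2, -0.1], adv, reward_weight=0.1)
```

With `lam=1.0` and a zero critic, the advantage at every step equals the remaining return, which shows how a single terminal reward is propagated back to earlier decisions.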

Figure 1: Trajectory generation as a sequential decision-making problem. The vehicle navigates in the urban city by making decisions to determine the downstream link at each link
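
The state, action, and return-to-go representation mentioned in the method summary can be sketched in a few lines of Python. Here a state is the link the vehicle is currently on and an action is the downstream link it chooses, following the figure caption. The link IDs, the reward values, and the (return-to-go, state, action) token ordering are assumptions made for illustration; the ordering follows the common Decision Transformer convention and is not confirmed by this review.

```python
def returns_to_go(rewards):
    """Return-to-go at step t is the sum of rewards from step t to the end."""
    rtg, total = [], 0.0
    for r in reversed(rewards):
        total += r
        rtg.append(total)
    return rtg[::-1]


def interleave_tokens(states, actions, rewards):
    """Flatten one trajectory into (return-to-go, state, action) triples.

    Assumption: each token is a (kind, value) pair; a real tokenizer would
    map these to integer vocabulary IDs.
    """
    tokens = []
    for g, s, a in zip(returns_to_go(rewards), states, actions):
        tokens.extend([("rtg", g), ("state", s), ("action", a)])
    return tokens


# Hypothetical three-step trip over road links identified by integer IDs.
states = [101, 205, 318]    # link the vehicle is currently on
actions = [205, 318, 402]   # downstream link chosen at each step
rewards = [0.0, 0.0, 1.0]   # sparse reward given only at the end
tokens = interleave_tokens(states, actions, rewards)
```

An autoregressive model trained with cross-entropy on such a sequence learns to predict each action token from the tokens that precede it.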

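Experimental results

Research questions

RQ1: Can a Transformer-based model generate diverse and reliable urban mobility trajectories while preserving privacy?
RQ2: Does incorporating an IRL-based reward model and RMFT improve trajectory generation quality compared with offline RL alone?
RQ3: How well does TrajGPT-R generalize across different urban contexts and datasets?

Key findings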

Method	Jac(↑)	Cos(↑)	BLEU(↑)	L-JSD(↓)	C-JSD(↓)	UE(↑)	BE(↑)
TrajGPT-R	0.524	0.575	0.383	0.016	0.042	14.85	14.82
TrajGPT-R (Toyota)	0.635	0.570	0.345	0.005	0.013	8.57	10.22
TrajGPT-R (Porto)	0.522	0.470	0.432	0.013	0.032	10.13	10.75

TrajGPT-R achieves higher reliability and diversity than the baselines across the Toyota, T-Drive, and Porto Taxi datasets.
On Toyota, it achieves Jac 0.524, Cos 0.575, BLEU 0.383, L-JSD 0.016, C-JSD 0.042, UE 14.85, and BE 14.82.
On T-Drive, it achieves Jac 0.635, Cos 0.570, BLEU 0.345, L-JSD 0.005, C-JSD 0.013, UE 8.57, and BE 10.22.
On Porto, it achieves Jac 0.522, Cos 0.470, BLEU 0.432, L-JSD 0.013, C-JSD 0.032, UE 10.13, and BE 10.75.
Fine-tuning with RMFT and explicit reward modeling substantially improves long-horizon trajectory generation and robustness in sparse regions.
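
The review does not define the metric abbreviations. As a rough illustration only, the sketch below assumes that Jac is a Jaccard similarity over sets of visited links and that L-JSD is a Jensen-Shannon divergence between link-usage distributions of real and generated trajectories. Both readings, the function names, and the toy trajectories are assumptions, and the code is not expected to reproduce the reported numbers.

```python
import math
from collections import Counter


def jaccard(real_links, generated_links):
    """Jaccard similarity between the sets of links two trajectories visit."""
    a, b = set(real_links), set(generated_links)
    return len(a & b) / len(a | b) if a | b else 1.0


def js_divergence(p_counts, q_counts):
    """Jensen-Shannon divergence (base 2) between two non-empty count tables."""
    keys = set(p_counts) | set(q_counts)
    p_total, q_total = sum(p_counts.values()), sum(q_counts.values())
    jsd = 0.0
    for k in keys:
        p = p_counts.get(k, 0) / p_total
        q = q_counts.get(k, 0) / q_total
        m = 0.5 * (p + q)  # mixture distribution
        if p > 0:
            jsd += 0.5 * p * math.log2(p / m)
        if q > 0:
            jsd += 0.5 * q * math.log2(q / m)
    return jsd


# Hypothetical real and generated trajectories as lists of link IDs.
real = [[1, 2, 3], [1, 2, 4]]
generated = [[1, 2, 3], [2, 3, 4]]
real_usage = Counter(link for traj in real for link in traj)
gen_usage = Counter(link for traj in generated for link in traj)
similarity = jaccard(real[1], generated[1])
divergence = js_divergence(real_usage, gen_usage)
```

Under these assumptions, a higher Jaccard value and a lower divergence both indicate generated trajectories that are closer to the real data, which matches the arrows in the table header.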

Figure 2 : Our proposed Two-phase framework to enhance pretrained generative model for urban mobility trajectory generation with reinforcement learning (TrajGPT-R). Phase 1: A Generative pre-trained Transformer (GPT) is developed to acquire the general knowledge for generating urban mobility traject


This review was generated by AI and checked by a human editor.