QUICK REVIEW

[Paper Review] TIGFlow-GRPO: Trajectory Forecasting via Interaction-Aware Flow Matching and Reward-Driven Optimization

Xuepeng Jing, Wenhuan Lu | arXiv (Cornell University) | 2026-03-26

Anomaly Detection Techniques and Applications | Citations: 0

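One-Line Summary

TIGFlow-GRPO is a two-stage flow-based trajectory predictor that combines interaction-aware context encoding (TIG-GAT) with a Flow-GRPO reinforcement-learning post-training stage, aligning generated futures with social norms and map constraints and improving multimodal accuracy and feasibility.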

ABSTRACT

Human trajectory forecasting is important for intelligent multimedia systems operating in visually complex environments, such as autonomous driving and crowd surveillance. Although Conditional Flow Matching (CFM) has shown strong ability in modeling trajectory distributions from spatio-temporal observations, existing approaches still focus primarily on supervised fitting, which may leave social norms and scene constraints insufficiently reflected in generated trajectories. To address this issue, we propose TIGFlow-GRPO, a two-stage generative framework that aligns flow-based trajectory generation with behavioral rules. In the first stage, we build a CFM-based predictor with a Trajectory-Interaction-Graph (TIG) module to model fine-grained visual-spatial interactions and strengthen context encoding. This stage captures both agent-agent and agent-scene relations more effectively, providing more informative conditional features for subsequent alignment. In the second stage, we perform Flow-GRPO post-training, where deterministic flow rollout is reformulated as stochastic ODE-to-SDE sampling to enable trajectory exploration, and a composite reward combines view-aware social compliance with map-aware physical feasibility. By evaluating trajectories explored through SDE rollout, GRPO progressively steers multimodal predictions toward behaviorally plausible futures. Experiments on the ETH/UCY and SDD datasets show that TIGFlow-GRPO improves forecasting accuracy and long-horizon stability while generating trajectories that are more socially compliant and physically feasible. These results suggest that the proposed framework provides an effective way to connect flow-based trajectory modeling with behavior-aware alignment in dynamic multimedia environments.

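Motivation and Objectives

Improve social-context modeling for pedestrian trajectory forecasting in crowded scenes.
Connect flow-based trajectory generation with behavior-oriented alignment under non-differentiable constraints.
Explore multimodal futures while enforcing social and map-based feasibility.
Promote socially compliant predictions while preserving the multimodal diversity learned through flow matching.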

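Proposed Method

Uses a two-stage framework that predicts future trajectories with Conditional Flow Matching (CFM) and TIG-GAT-based context encoding.
Introduces TIG-GAT, a target-centric, view-aware graph module that refines local interactions and context tokens to condition the flow backbone.
Post-trains with Flow-GRPO: the ODE rollout is reformulated as an SDE to enable stochastic trajectory exploration, and the model is optimized with a composite reward.
Defines a composite reward that combines view-aware social rules with map-aware feasibility computed from Signed Distance Fields (SDF) and obstacle penalties.
With Group Relative Policy Optimization (GRPO), holds the pre-trained policy fixed and aligns generated trajectories with environmental constraints while preserving the previously learned multimodal diversity.
Switching from ODE to SDE in post-training is what makes stochastic rollouts, and therefore GRPO updates on continuous generation, possible.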
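
The post-training steps listed above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation. `toy_velocity` is a hypothetical stand-in for the learned CFM velocity field, the state is a single 2D endpoint rather than a full 12-step trajectory, and the stochastic step is plain Euler-Maruyama noise injection without the drift correction that a marginal-preserving ODE-to-SDE conversion would need. The reward uses one neighbor and one circular obstacle with made-up radii and weights. Only the group-relative normalization (reward minus group mean, divided by group standard deviation) follows the standard GRPO advantage.

```python
import math
import random
import statistics


def toy_velocity(x, t, target):
    """Hypothetical stand-in for a learned velocity field v(x, t | context).

    Points straight at `target`; with Euler steps it lands on it at t = 1.
    """
    return [(g - xi) / (1.0 - t) for xi, g in zip(x, target)]


def rollout(x0, target, steps=10, sigma=0.0, rng=None):
    """Integrate from t=0 to t=1: Euler if sigma == 0, Euler-Maruyama otherwise."""
    rng = rng or random.Random(0)
    dt = 1.0 / steps
    x = list(x0)
    for k in range(steps):
        v = toy_velocity(x, k * dt, target)
        noise_scale = sigma * math.sqrt(dt)  # Brownian increment ~ N(0, dt)
        x = [xi + vi * dt + noise_scale * rng.gauss(0.0, 1.0)
             for xi, vi in zip(x, v)]
    return x


def circle_sdf(p, center, radius):
    """Signed distance to a circular obstacle: negative inside, positive outside."""
    return math.dist(p, center) - radius


def composite_reward(endpoint, neighbor, obstacle_center, obstacle_radius,
                     personal_space=0.5, w_social=1.0, w_map=1.0):
    """Assumed reward: penalize personal-space intrusion and obstacle penetration."""
    social_violation = max(0.0, personal_space - math.dist(endpoint, neighbor))
    map_violation = max(0.0, -circle_sdf(endpoint, obstacle_center, obstacle_radius))
    return -(w_social * social_violation + w_map * map_violation)


def group_relative_advantages(rewards, eps=1e-8):
    """GRPO-style advantage: standardize each reward within its sampled group."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


# Sample a group of 8 stochastic rollouts for one scene and score them.
rng = random.Random(0)
start, goal = [0.0, 0.0], [1.0, 1.0]
group = [rollout(start, goal, sigma=0.3, rng=rng) for _ in range(8)]
rewards = [
    composite_reward(x, neighbor=[1.2, 1.0],
                     obstacle_center=[1.0, 1.5], obstacle_radius=0.4)
    for x in group
]
advantages = group_relative_advantages(rewards)
print(advantages)
```

With `sigma=0.0` and a fixed start, the rollout is deterministic and every sample in the group is identical, so all advantages are zero. In this toy setting the injected noise is what gives the group distinct samples to rank.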

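Experimental Results

Research Questions

RQ1: How can flow-based trajectory forecasting be aligned with social norms and environmental constraints in complex scenes?
RQ2: Can a perception-aware interaction module and a reward-driven post-training stage improve social compliance and physical feasibility without sacrificing multimodal diversity?
RQ3: What mechanism effectively injects non-differentiable social and map-based constraints into flow-based trajectory generation?
RQ4: Does stochastic ODE-to-SDE rollout improve exploration and alignment compared with deterministic rollout?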

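Key Results

On ETH/UCY, TIGFlow-GRPO achieves the best overall average ADE and FDE among the listed baselines (0.20 and 0.31).
On SDD, TIGFlow-GRPO reaches an ADE of 7.37 and an FDE of 11.67 (pixel space).
Using TIG-GAT and Flow-GRPO together improves prediction in socially dense scenes and map-constrained environments.
Compared with MoFlow, TIGFlow-GRPO shows consistent improvements on the ETH/UCY subsets, especially in interaction-heavy scenes (ZARA1, UNIV).
The method integrates a view-aware social reward and a map-aware semantic reward to guide behavior-aligned trajectory generation.
The experimental setup predicts 12 future frames from 8 observed frames and evaluates with ADE/FDE and collision rate (Col).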
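
The metrics named above can be illustrated with a short sketch. It assumes the common conventions for this task: ADE is the mean Euclidean error over all predicted steps, FDE is the error at the last step, and multimodal predictors are scored best-of-K over their samples. The review does not state the value of K or how Col is defined, so the two-sample set, the `collision_rate` definition (fraction of predicted steps within a distance threshold of any neighbor), and the 0.2 threshold are all assumptions, and the trajectories are synthetic.

```python
import math


def ade(pred, gt):
    """Average displacement error: mean Euclidean distance over all future steps."""
    return sum(math.dist(p, g) for p, g in zip(pred, gt)) / len(gt)


def fde(pred, gt):
    """Final displacement error: Euclidean distance at the last future step."""
    return math.dist(pred[-1], gt[-1])


def min_ade_fde(samples, gt):
    """Best-of-K scoring: each metric takes its own minimum over the K samples."""
    return min(ade(s, gt) for s in samples), min(fde(s, gt) for s in samples)


def collision_rate(pred, neighbors, threshold=0.2):
    """Assumed definition: share of predicted steps closer than `threshold`
    to any neighbor at the same time step."""
    hits = sum(
        any(math.dist(p, n[t]) < threshold for n in neighbors)
        for t, p in enumerate(pred)
    )
    return hits / len(pred)


HORIZON = 12  # future frames, matching the 8-observed / 12-predicted setup

# Synthetic ground truth and two sampled futures (illustrative values only).
gt = [(0.4 * (t + 1), 0.0) for t in range(HORIZON)]
straight = [(0.4 * (t + 1), 0.1) for t in range(HORIZON)]             # constant lateral offset
drifting = [(0.4 * (t + 1), 0.05 * (t + 1)) for t in range(HORIZON)]  # growing lateral drift

best_ade, best_fde = min_ade_fde([straight, drifting], gt)

# One stationary neighbor standing close to the predicted path.
neighbor = [(2.4, 0.15)] * HORIZON
col = collision_rate(straight, [neighbor])
print(best_ade, best_fde, col)
```

Note that under best-of-K scoring the minimum ADE and minimum FDE may come from different samples, which is why the two are computed independently.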

This review was generated by AI and reviewed by a human editor.