QUICK REVIEW

[Paper Review] TrajLoom: Dense Future Trajectory Generation from Video

Zewei Zhang, Jia Jun Cheng Xian|arXiv (Cornell University)|2026. 03. 23.

Generative Adversarial Networks and Image Synthesis | Citations: 0

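One-line summary

TrajLoom predicts dense future point trajectories from observed history using Grid-Anchor Offset Encoding, a TrajLoom-VAE latent space, and TrajLoom-Flow. With boundary cues and on-policy fine-tuning it achieves stable long-horizon motion and outperforms the state of the art on TrajLoomBench.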

ABSTRACT

Predicting future motion is crucial in video understanding and controllable video generation. Dense point trajectories are a compact, expressive motion representation, but modeling their future evolution from observed video remains challenging. We propose a framework that predicts future trajectories and visibility from past trajectories and video context. Our method has three components: (1) Grid-Anchor Offset Encoding, which reduces location-dependent bias by representing each point as an offset from its pixel-center anchor; (2) TrajLoom-VAE, which learns a compact spatiotemporal latent space for dense trajectories with masked reconstruction and a spatiotemporal consistency regularizer; and (3) TrajLoom-Flow, which generates future trajectories in latent space via flow matching, with boundary cues and on-policy K-step fine-tuning for stable sampling. We also introduce TrajLoomBench, a unified benchmark spanning real and synthetic videos with a standardized setup aligned with video-generation benchmarks. Compared with state-of-the-art methods, our approach extends the prediction horizon from 24 to 81 frames while improving motion realism and stability across datasets. The predicted trajectories directly support downstream video generation and editing. Code, model checkpoints, and datasets are available at https://trajloom.github.io/.

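Motivation and objectives

Motivates dense trajectories as a compact motion representation for predicting the future of a video.
Develops an offset-based trajectory encoding to reduce location-dependent bias.
Learns a compact latent space for trajectories and a stable flow-based generator for extended-horizon prediction.
Introduces TrajLoomBench, a unified benchmark spanning real and synthetic videos, to enable fair evaluation.
Demonstrates improved realism and stability, and downstream applicability to motion-controlled video generation and editing.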

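Proposed method

Grid-Anchor Offset Encoding: converts absolute coordinates into offsets from each point's pixel-center anchor, reducing location-dependent bias.
TrajLoom-VAE: a VAE with masked reconstruction and a spatiotemporal consistency regularizer that learns a compact latent representation of dense trajectory fields.
TrajLoom-Flow: a rectified-flow generator that predicts future latent trajectories conditioned on observed history and video context, using boundary cues and on-policy K-step fine-tuning to keep long-horizon sampling stable.
Boundary cues and token-aligned fusion integrate the history latents into the flow, so that generated motion continues coherently from the observed history.
On-policy K-step rollouts help align the training and inference paths, mitigating drift in ODE-based sampling.
Evaluation on TrajLoomBench compares against the WHN baseline on real videos and on synthetic data.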
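The offset idea behind Grid-Anchor Offset Encoding can be illustrated with a minimal sketch. It assumes one tracked point per pixel, anchors at (column + 0.5, row + 0.5), and no coordinate normalization; none of these details are confirmed by the review, and the function names are hypothetical rather than taken from the TrajLoom code.

```python
def pixel_center_anchors(height, width):
    """Return the (x, y) pixel-center anchor of every grid cell, in row-major order.

    Assumption: the anchor of the pixel at (row, col) is (col + 0.5, row + 0.5).
    """
    return [(col + 0.5, row + 0.5) for row in range(height) for col in range(width)]


def encode_offsets(track, anchors):
    """Convert absolute positions into offsets from each point's own anchor.

    track[t][i] is the absolute (x, y) position of point i at frame t.
    """
    return [
        [(x - ax, y - ay) for (x, y), (ax, ay) in zip(frame, anchors)]
        for frame in track
    ]


def decode_offsets(offsets, anchors):
    """Invert encode_offsets by adding each anchor back to its offset."""
    return [
        [(dx + ax, dy + ay) for (dx, dy), (ax, ay) in zip(frame, anchors)]
        for frame in offsets
    ]


# Toy example: every point on a 2x3 grid drifts by (+2, -1) pixels per frame.
anchors = pixel_center_anchors(2, 3)
track = [[(ax + 2.0 * t, ay - 1.0 * t) for ax, ay in anchors] for t in range(3)]
offsets = encode_offsets(track, anchors)
```

In this toy case every point in a frame receives the same offset regardless of where it sits in the image, which is the sense in which an offset representation removes the dependence on absolute location.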

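Experimental results

Research questions

RQ1: How can dense future trajectories be represented and predicted from observed motion history and video context?
RQ2: Can Grid-Anchor Offset Encoding improve location invariance and long-horizon prediction stability?
RQ3: Do a rectified-flow generator and a VAE-based latent trajectory space yield more realistic and coherent long-horizon futures than appearance-conditioned baselines?
RQ4: How do boundary cues and on-policy fine-tuning affect long-horizon trajectory generation?
RQ5: How does TrajLoom perform on dense trajectory prediction across the real and synthetic benchmarks?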
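For readers unfamiliar with the ODE-based sampling that RQ3 and RQ4 refer to, the sketch below shows generic fixed-step Euler integration of a rectified-flow velocity field from t = 0 to t = 1. It is not TrajLoom-Flow: the velocity function is a hypothetical hand-written stand-in for a learned network, and the conditioning on history and video context, the boundary cues, and the K-step fine-tuning are all omitted.

```python
def euler_sample(velocity, x0, num_steps):
    """Integrate dx/dt = velocity(x, t) from t=0 to t=1 with fixed Euler steps."""
    x = list(x0)
    h = 1.0 / num_steps
    for k in range(num_steps):
        t = k * h  # the last evaluated time is 1 - h, so t never reaches 1
        v = velocity(x, t)
        x = [xi + h * vi for xi, vi in zip(x, v)]
    return x


def make_oracle_velocity(target):
    """Hypothetical stand-in for a learned velocity network.

    It always points from the current state straight at `target`, scaled so
    that the state arrives at t=1 (the straight-line path of rectified flow).
    """
    def velocity(x, t):
        return [(ti - xi) / (1.0 - t) for ti, xi in zip(target, x)]
    return velocity


target = [3.0, -1.0]
sample = euler_sample(make_oracle_velocity(target), x0=[0.0, 0.0], num_steps=8)
```

With a learned, imperfect velocity field, each Euler step starts from the sampler's own previous output rather than from a point on the ideal training path, so errors can accumulate over the rollout. This is the kind of drift that the review says on-policy K-step fine-tuning is meant to mitigate.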

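Key results

TrajLoom advances the state of the art in motion realism and stability while extending the prediction horizon from 24 to 81 frames.
Grid-Anchor Offset Encoding substantially reduces location-dependent variance and improves long-horizon evaluation performance.
TrajLoom-VAE improves trajectory reconstruction (VEPE) across datasets and remains stable from 24 to 81 frames.
With boundary cues and on-policy fine-tuning, TrajLoom-Flow produces smoother, more coherent motion, with lower FlowTV and DivCurlE.
On both real and synthetic benchmarks, TrajLoom outperforms WHN on quantitative metrics (FVMD, FlowTV, DivCurlE) and in motion consistency.
The predicted trajectories effectively guide motion-controlled video generation and editing (via Wan-Move integration).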

This review was generated by AI and reviewed by a human editor.