QUICK REVIEW

[Paper Review] Attention, please: A Spatio-temporal Transformer for 3D Human Motion Prediction

Emre Aksan, Peng Cao|arXiv (Cornell University)|2020. 04. 18.

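Human Pose and Action Recognition | Citations: 22

One-line summary

This paper proposes a spatio-temporal transformer architecture for long-horizon prediction by modelling pose generation as a conditional sequence synthesis task. To capture long-range dependencies, it uses a self-attention mechanism that decouples space and time, which substantially reduces error accumulation and makes it possible to generate perceptually plausible motion beyond 1 second.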

ABSTRACT

In this paper, we propose a novel architecture for the task of 3D human motion modelling. We argue that the problem can be interpreted as a generative modelling task: A network learns the conditional synthesis of human poses where the model is conditioned on a seed sequence. Our focus lies on the generation of plausible future developments over longer time horizons, whereas previous work considered shorter time frames of up to 1 second. To mitigate the issue of convergence to a static pose, we propose a novel architecture that leverages the recently proposed self-attention concept. The task of 3D motion prediction is inherently spatio-temporal and thus the proposed model learns high dimensional joint embeddings followed by a decoupled temporal and spatial self-attention mechanism. The two attention blocks operate in parallel to aggregate the most informative components of the sequence to update the joint representation. This allows the model to access past information directly and to capture spatio-temporal dependencies explicitly. We show empirically that this reduces error accumulation over time and allows for the generation of perceptually plausible motion sequences over long time horizons as well as accurate short-term predictions. Accompanying video available at https://youtu.be/yF0cdt2yCNE .

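Research motivation and goals

Go beyond typical short-term prediction and generate natural, long-horizon 3D human motion sequences.
Mitigate error accumulation and convergence to a static pose in long-horizon prediction.
Model the complex spatio-temporal dependencies of human motion with a structured attention mechanism.
Provide a generative modelling framework that produces future poses conditioned on a seed sequence.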

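Proposed method

The model frames 3D motion prediction as a conditional generative modelling task, conditioned on a seed pose sequence.
It applies a dual-branch attention mechanism: separate spatial and temporal self-attention blocks that operate in parallel on the joint embeddings.
Spatial self-attention captures the relations between body joints at each time step, while temporal self-attention models the dependencies across time steps.
The joint representation is updated through attention-based aggregation, which gives the model direct access to relevant past information.
The model is trained to predict future poses by minimising the reconstruction error over the sequence.
The model was evaluated on long-horizon generation, and qualitative results are shown in the accompanying video.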
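
The decoupled attention described above can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names (`attention`, `init_params`, `spatio_temporal_block`), the single attention head, the projection weights shared across joints, the causal mask on the temporal branch, and the residual sum used to combine the two branches are all assumptions made for illustration. The input `x` is taken to be a tensor of joint embeddings with shape (time steps, joints, embedding size).

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def attention(q, k, v, mask=None):
    """Scaled dot-product attention over the second-to-last axis."""
    scores = q @ np.swapaxes(k, -1, -2) / np.sqrt(q.shape[-1])
    if mask is not None:
        # Positions where mask is False receive (effectively) zero weight.
        scores = np.where(mask, scores, -1e9)
    return softmax(scores, axis=-1) @ v


def init_params(d, rng):
    """Hypothetical parameter set: one Q/K/V projection per branch."""
    names = ["q_t", "k_t", "v_t", "q_s", "k_s", "v_s"]
    return {n: rng.normal(scale=d ** -0.5, size=(d, d)) for n in names}


def spatio_temporal_block(x, p):
    """One decoupled attention update on joint embeddings x of shape (T, N, D)."""
    T = x.shape[0]

    # Temporal branch: each joint attends over its own history.
    # The causal mask (an assumption here) blocks access to future frames.
    xt = np.swapaxes(x, 0, 1)                      # (N, T, D)
    causal = np.tril(np.ones((T, T), dtype=bool))  # (T, T)
    temporal = attention(xt @ p["q_t"], xt @ p["k_t"], xt @ p["v_t"], mask=causal)
    temporal = np.swapaxes(temporal, 0, 1)         # back to (T, N, D)

    # Spatial branch: within each frame, every joint attends to all joints.
    spatial = attention(x @ p["q_s"], x @ p["k_s"], x @ p["v_s"])

    # The branches run in parallel on the same input; summing them with a
    # residual connection is an illustrative choice for the update step.
    return x + temporal + spatial


rng = np.random.default_rng(0)
params = init_params(8, rng)
x = rng.normal(size=(5, 4, 8))   # 5 frames, 4 joints, 8-dim embeddings
y = spatio_temporal_block(x, params)
print(y.shape)  # (5, 4, 8)
```

Because the temporal branch is masked and the spatial branch only looks within a single frame, the output at frame t in this sketch depends only on frames up to t, which is what allows such a block to be applied step by step when generating future poses from a seed sequence.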

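Experimental results

Research questions

RQ1: Can a transformer-based architecture effectively model long-horizon 3D human motion prediction while reducing error accumulation?
RQ2: How do decoupled spatial and temporal attention mechanisms improve the modelling of spatio-temporal dependencies in human motion?
RQ3: How well can the model generate perceptually plausible motion sequences over horizons longer than 1 second?
RQ4: Does the conditional generative approach outperform previous methods on long-horizon prediction?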

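Key results

The proposed model substantially reduced error accumulation over long time horizons compared with previous methods.
The model generated perceptually plausible motion sequences beyond 1 second, improving long-term coherence.
The decoupled attention mechanism effectively captured both spatial joint relations and temporal dynamics.
The model maintained high accuracy on short-term prediction while also performing strongly on long-horizon prediction.
Qualitative results in the accompanying video confirmed the realism and diversity of the generated motion sequences.
Direct access to past information through self-attention improved the precision of temporal modelling.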


This review was generated by AI and reviewed by a human editor.