QUICK REVIEW

[Paper Review] Duolando: Follower GPT with Off-Policy Reinforcement Learning for Dance Accompaniment

Siyao Li, Tianpei Gu | arXiv (Cornell University) | 2024. 03. 27.

Reinforcement Learning in Robotics | Citations: 7

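One-Line Summary

Duolando introduces a GPT-based follower model for duet dance accompaniment that uses VQ-VAE tokenization and off-policy RL fine-tuning to generate follower motion coordinated with the music and the leader's movements. It also provides DD100, a duet dance motion-capture dataset, and a new interaction benchmark.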

ABSTRACT

We introduce a novel task within the field of 3D dance generation, termed dance accompaniment, which necessitates the generation of responsive movements from a dance partner, the "follower", synchronized with the lead dancer's movements and the underlying musical rhythm. Unlike existing solo or group dance generation tasks, a duet dance scenario entails a heightened degree of interaction between the two participants, requiring delicate coordination in both pose and position. To support this task, we first build a large-scale and diverse duet interactive dance dataset, DD100, by recording about 117 minutes of professional dancers' performances. To address the challenges inherent in this task, we propose a GPT-based model, Duolando, which autoregressively predicts the subsequent tokenized motion conditioned on the coordinated information of the music, the leader's and the follower's movements. To further enhance the GPT's capabilities of generating stable results on unseen conditions (music and leader motions), we devise an off-policy reinforcement learning strategy that allows the model to explore viable trajectories from out-of-distribution samplings, guided by human-defined rewards. Based on the collected dataset and proposed method, we establish a benchmark with several carefully designed metrics.

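Motivation and Objectives

Introduce dance accompaniment, a new task of generating follower movements synchronized with the lead dancer and the music.
Build a large-scale duet dance motion-capture dataset (DD100) for training and evaluation.
Develop a GPT-based follower model (Duolando) that conditions on leader motion, music, and the follower's own history.
Apply off-policy reinforcement learning to strengthen out-of-distribution generalization to unseen music and leader motion patterns.
Establish metrics for benchmarking motion quality, interaction, and rhythm alignment.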

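Proposed Method

Quantizes motion and relative translation into discrete tokens using four motion VQ-VAEs (upper body, lower body, left hand, right hand) and a relative-translation VQ-VAE.
Trains an interaction-coordinated GPT that autoregressively predicts follower motion tokens conditioned on the music, the leader tokens, and the previous follower tokens, with a look-ahead mechanism (LAT) for conditioning on future inputs.
Introduces a look-ahead attention mechanism with a 10x10 block-wise lower-triangular attention mask to fuse the 10 input streams (music, leader z, follower z, tr).
Introduces off-policy reinforcement learning that aligns token probabilities with the expected future reward through a sigmoid mapping of a learned Q-like value, sigma(Q(s,a)).
Defines step-wise rewards to reduce skating artifacts, and includes a velocity-based lower-body decoding branch used to compute the synchronization error and guide the RL reward.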
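
The block-wise attention mask described in the list above can be illustrated with a minimal sketch. It assumes that each of the 10 streams contributes `seq_len` tokens, that every stream-to-stream block is causal (lower triangular), and that blocks reading from a conditioning stream also expose a few future steps. The function name, the `cond_streams` indices, and the `lookahead` width are hypothetical choices for illustration, not the layout used in the paper.

```python
import numpy as np

def block_lookahead_mask(num_streams=10, seq_len=4, lookahead=1, cond_streams=(0, 1)):
    """Build a square boolean mask of side num_streams * seq_len (True = may attend).

    Every (query stream, key stream) block is lower triangular (causal).
    Blocks whose key stream is a conditioning stream (hypothetical indices
    in `cond_streams`, e.g. music or leader tokens) also expose `lookahead`
    future steps.
    """
    t = np.arange(seq_len)
    causal = t[None, :] <= t[:, None]              # key step <= query step
    ahead = t[None, :] <= t[:, None] + lookahead   # key step <= query step + lookahead
    size = num_streams * seq_len
    mask = np.zeros((size, size), dtype=bool)
    for q in range(num_streams):
        for k in range(num_streams):
            block = ahead if k in cond_streams else causal
            mask[q * seq_len:(q + 1) * seq_len, k * seq_len:(k + 1) * seq_len] = block
    return mask

mask = block_lookahead_mask()
print(mask.shape)  # (40, 40)
```

With `lookahead=0` the result reduces to a plain 10x10 grid of lower-triangular blocks, which makes the effect of the look-ahead term easy to inspect.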

Figure 1: Example of Duolando's results. The female avatar (red arrow) is driven by the proposed method to accompany a real human's (white) dancing.

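Experimental Results

Research Questions

RQ1: Can a GPT-based follower generate stable, beat-aligned motion conditioned on the lead dancer and the music?
RQ2: Does off-policy RL improve generalization to unseen music and leader motions compared with supervised learning alone?
RQ3: How does explicitly modeling relative translation and interaction coordination affect the follower's dynamics and contact with the leader?
RQ4: How does look-ahead conditioning affect coordination and motion fluency?

Key Results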

Method	FID_k (↓)	FID_g (↓)	Div_k (↑)	Div_g (↑)	FID_cd (↓)	Div_cd (↑)	CF (%)	BED (↑)	BAS (↑)
Ground Truth	6.56	6.37	11.31	7.61	3.41	12.35	74.25	0.5308	0.1839
S Bailando (Siyao et al., 2022)	78.52	36.19	11.15	7.92	6643.31	52.50*	7.13	0.1831	0.1930
S EDGE (Tseng et al., 2023)	69.14	44.58	8.62	6.35	5894.45	60.62*	6.82	0.1822	0.1875
S Duolando w/o RL tr IC	12.53	24.17	10.51	9.42	4803.20	42.72*	7.04	0.1826	0.1852
D Duolando w/o RL tr	62.29	27.95	13.16	8.53	7970.19	54.53*	7.76	0.2194	0.2002
D Duolando w/o RL	106.72	34.10	13.88	7.03	21.68	9.33	57.43	0.2795	0.2193
D Duolando	25.30	33.52	10.92	7.97	9.97	14.02	52.36	0.2858	0.2046
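
For readers unfamiliar with the BAS column, the sketch below shows one common way a beat-alignment score is computed. It assumes the definition popularized by earlier dance-generation work: for each music beat, take the distance to the nearest motion beat and average a Gaussian of that distance. Whether this review's BAS uses exactly this form, and the value `sigma=3.0`, are assumptions; the function name is hypothetical.

```python
import numpy as np

def beat_align_score(music_beats, motion_beats, sigma=3.0):
    """Mean Gaussian closeness of each music beat to its nearest motion beat.

    Beats are frame indices; `sigma` (assumed value) controls the tolerance.
    """
    music_beats = np.asarray(music_beats, dtype=float)
    motion_beats = np.asarray(motion_beats, dtype=float)
    if music_beats.size == 0 or motion_beats.size == 0:
        return 0.0
    # Distance from every music beat to its nearest motion beat.
    nearest = np.abs(music_beats[:, None] - motion_beats[None, :]).min(axis=1)
    return float(np.mean(np.exp(-nearest ** 2 / (2 * sigma ** 2))))

print(beat_align_score([10, 20, 30], [10, 20, 30]))  # 1.0 for perfectly aligned beats
```

Under this definition a score near 1 would mean motion beats coincide with music beats, and the score decays as the nearest motion beat drifts away.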

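With RL and interaction coordination, the full Duolando improves interaction and rhythmic alignment over the solo baselines: it has the lowest FID_cd (9.97) and the highest BED (0.2858) among the generated results, although the variant without RL scores higher on CF (57.43% vs. 52.36%) and BAS (0.2193 vs. 0.2046).
The DD100 dataset provides duet motion-capture data across diverse genres (10 genres, about 1.95 hours, i.e. roughly 117 minutes) for training and benchmarking.
Quantitatively, the Duolando variants in the D rows outperform the solo methods on the interaction and rhythm metrics (Beat Echo Degree, BED, and BAS), and remain competitive on motion quality (FID and diversity) over kinetic and geometric features.
Ablations degrade performance: without relative translation (tr), FID_cd rises from 21.68 to 7970.19 and CF falls from 57.43% to 7.76%; without RL, FID_k rises from 25.30 to 106.72 and FID_cd from 9.97 to 21.68. The look-ahead and interaction-coordination components contribute to higher-quality, synchronized follower motion.
RL fine-tuning driven by explicit rewards helps mitigate skating artifacts under out-of-distribution conditions.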
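
To make the reward-driven fine-tuning more concrete, here is a minimal sketch of one possible reading: a step-wise skating reward is turned into discounted returns, and sigma(Q(s,a)) is pulled toward those returns with a binary cross-entropy loss. The thresholds, the discount factor, the rescaling of returns to [0, 1], and all function names are assumptions for illustration; the paper's actual reward definition and update rule may differ.

```python
import numpy as np

def skating_rewards(foot_height, foot_speed, height_thresh=0.05, speed_thresh=0.1):
    """Step-wise reward: -1 when a grounded foot slides, +1 otherwise (assumed thresholds)."""
    sliding = (foot_height < height_thresh) & (foot_speed > speed_thresh)
    return np.where(sliding, -1.0, 1.0)

def discounted_returns(rewards, gamma=0.9):
    """Return-to-go G_t = r_t + gamma * G_{t+1}, computed backwards in time."""
    returns = np.zeros(len(rewards), dtype=float)
    running = 0.0
    for t in range(len(rewards) - 1, -1, -1):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def q_alignment_loss(q_values, returns, gamma=0.9):
    """Binary cross-entropy between sigma(Q(s, a)) and returns rescaled to [0, 1]."""
    # With rewards in [-1, 1], |G_t| <= 1 / (1 - gamma); map that range onto [0, 1].
    bound = 1.0 / (1.0 - gamma)
    target = (returns / bound + 1.0) / 2.0
    p = np.clip(sigmoid(np.asarray(q_values, dtype=float)), 1e-7, 1 - 1e-7)
    return float(-np.mean(target * np.log(p) + (1 - target) * np.log(1 - p)))

rewards = skating_rewards(np.array([0.0, 0.0, 0.2]), np.array([0.5, 0.0, 0.5]))
print(rewards)  # [-1.  1.  1.]
```

Because the trajectories and rewards are plain arrays, the loss can be evaluated on samples that were not drawn from the current policy, which is the sense in which such a scheme is off-policy.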

Figure 2: Samples of DD100 dataset. The leader and the follower are colored in green and red, respectively. DD100 contains 10 dance genres, featuring a diverse range of poses and interactions, with intricate hand gestures.

This review was generated by AI and reviewed by a human editor.