QUICK REVIEW

[论文解读] Duolando: Follower GPT with Off-Policy Reinforcement Learning for Dance Accompaniment

Siyao Li, Tianpei Gu|arXiv (Cornell University)|Mar 27, 2024

Reinforcement Learning in Robotics被引用 7

一句话总结

Duolando 引入了一个基于 GPT 的跟随者模型用于双人舞伴奏，使用 VQ-VAE 令牌化和离策略 RL 微调来生成以音乐和领舞动作为条件的同步跟随者动作。它还提供 DD100 双人舞 mocap 数据集和一个新的交互基准。

ABSTRACT

We introduce a novel task within the field of 3D dance generation, termed dance accompaniment, which necessitates the generation of responsive movements from a dance partner, the "follower", synchronized with the lead dancer's movements and the underlying musical rhythm. Unlike existing solo or group dance generation tasks, a duet dance scenario entails a heightened degree of interaction between the two participants, requiring delicate coordination in both pose and position. To support this task, we first build a large-scale and diverse duet interactive dance dataset, DD100, by recording about 117 minutes of professional dancers' performances. To address the challenges inherent in this task, we propose a GPT-based model, Duolando, which autoregressively predicts the subsequent tokenized motion conditioned on the coordinated information of the music, the leader's and the follower's movements. To further enhance the GPT's capabilities of generating stable results on unseen conditions (music and leader motions), we devise an off-policy reinforcement learning strategy that allows the model to explore viable trajectories from out-of-distribution samplings, guided by human-defined rewards. Based on the collected dataset and proposed method, we establish a benchmark with several carefully designed metrics.

研究动机与目标

引入一个新任务：舞蹈伴奏，生成与领舞者和音乐同步的跟随动作。
创建一个大规模的双人舞 mocap 数据集（DD100）用于训练和评估。
开发一个基于 GPT 的跟随者模型（Duolando），考虑领舞者动作、音乐和跟随者历史。
应用离策略强化学习以提升对分布外音乐和领舞模式的鲁棒性。
建立一个具有评估指标的基准，用于评估动作质量、互动和节拍对齐。

提出的方法

将四个运动 VQ-VAEs（上身、下身、左手、右手）以及一个相对平移 VQ-VAE 用于量化运动和相对平移为离散令牌。
训练一个交互协同的 GPT，使其自回归地在音乐、领舞者令牌和前序跟随者令牌的条件下预测跟随者运动令牌，并采用展望机制（LAT）进行未来条件预测。
引入看前注意力机制，采用一个 10-by-10 的分块下三角注意力掩码，将 10 种输入模态（音乐、leader z、follower z、tr）融合。
引入离策略强化学习，通过对数函数映射到期望未来回报（σ(Q(s,a))）使令牌概率与学习的 Q 式价值对齐。
定义逐步奖励以减少滑步伪影，包括一个基于速度的下身解码分支以计算同步误差并指导 RL 奖励。

Figure 1: Example of Duolando ’s results. The female avatar (red arrow) is driven by the proposed method to accompany real human’s (white) dancing.

实验结果

研究问题

RQ1GPT 基于的跟随者是否能够在给定领舞者和音乐的条件下生成稳定且 Beat 对齐的动作？
RQ2与仅监督学习相比，离策略 RL 是否能提高对未见音乐和领舞动作的泛化能力？
RQ3显式建模相对平移和互动协同如何影响跟随者的动态和与领舞者的接触？
RQ4看前条件对同步和动作流畅性的影响如何？

主要发现

Method	FID k (↓)	FID g (↓)	Div k (↑)	Div g (↑)	FID cd (↓)	Div cd (↑)	CF(%)	BED(↑)	BAS(↑)
Ground Truth	6.56	6.37	11.31	7.61	3.41	12.35	74.25	0.5308	0.1839
S Bailando (Siyao et al., 2022)	78.52	36.19	11.15	7.92	6643.31	52.50*	7.13	0.1831	0.1930
S EDGE (Tseng et al., 2023)	69.14	44.58	8.62	6.35	5894.45	60.62*	6.82	0.1822	0.1875
S Duolando w/o RL tr IC	12.53	24.17	10.51	9.42	4803.20	42.72*	7.04	0.1826	0.1852
D Duolando w/o RL tr	62.29	27.95	13.16	8.53	7970.19	54.53*	7.76	0.2194	0.2002
D Duolando w/o RL	106.72	34.10	13.88	7.03	21.68	9.33	57.43	0.2795	0.2193
D Duolando	25.30	33.52	10.92	7.97	9.97	14.02	52.36	0.2858	0.2046

在交互协同和强化学习的加持下，Duolando 相较于单独基线和消融组在互动性和节拍对齐方面表现更好。
DD100 数据集提供了多风格的多样化双人 mocap 数据（10 种风格，约 1.95 小时），用于训练和评估。
定量指标显示 Duolando 的变体在互动性和 Rhythm 指标（Beat Echo Degree 和 BAS）上优于单人方法，在运动质量（FID 和多样性）上对运动学与图形特征也具有竞争力。
消融结果表明去除相对平移或 RL 都会降低性能，而看前条件和互动协同组件有助于产生更高质量、同步的跟随者动作。
基于显式奖励的 RL 微调有助于在分布外条件下缓解滑步伪影。

Figure 2: Samples of DD100 dataset. The leader and the follower are colored in green and red , respectively. DD100 contains 10 dance genres, featuring a diverse range of poses and interactions, with intricate hand gestures.

更好的研究，从现在开始

从论文设计到论文写作，大幅缩短您的研究时间。

无需绑定信用卡

本解读由 AI 生成，并经人工编辑审核。