QUICK REVIEW

[Paper Review] Dance Revolution: Long-Term Dance Generation with Music via Curriculum Learning

Ruozi Huang, Hu Huang | arXiv (Cornell University) | Jun 11, 2020

Human Pose and Action Recognition | References: 62 | Cited by: 52

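One-Sentence Summary

This paper proposes a seq2seq architecture for long-term music-conditioned dance generation and introduces curriculum learning to reduce autoregressive error accumulation, outperforming existing methods.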

ABSTRACT

Dancing to music has been one of humans' innate abilities since ancient times. In machine learning research, however, synthesizing dance movements from music is a challenging problem. Recently, researchers have synthesized human motion sequences with autoregressive models such as recurrent neural networks (RNNs). Such an approach often generates only short sequences because prediction errors that are fed back into the network accumulate. This problem becomes even more severe in long motion sequence generation. Besides, the consistency between dance and music in terms of style, rhythm and beat has yet to be taken into account during modeling. In this paper, we formalize music-conditioned dance generation as a sequence-to-sequence learning problem and devise a novel seq2seq architecture to efficiently process long sequences of music features and capture the fine-grained correspondence between music and dance. Furthermore, we propose a novel curriculum learning strategy to alleviate the error accumulation of autoregressive models in long motion sequence generation; it gently shifts the training process from a fully guided teacher-forcing scheme that uses the previous ground-truth movements toward a less guided autoregressive scheme that mostly uses the generated movements instead. Extensive experiments show that our approach significantly outperforms existing state-of-the-art methods on automatic metrics and human evaluation. We also provide a demo video demonstrating the performance of our proposed approach at https://www.youtube.com/watch?v=lmE20MEheZ8.

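Motivation and Objectives

Clarify the challenges of generating long dance sequences from music.
Develop a seq2seq model that can process long sequences of music features.
Address error accumulation in autoregressive dance generation.
Introduce curriculum learning to shift training from teacher forcing toward autoregressive generation.
Demonstrate better performance than state-of-the-art methods.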

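Proposed Method

Formalize music-conditioned dance generation as sequence-to-sequence learning.
Propose a novel seq2seq architecture that efficiently processes long music feature sequences and captures fine-grained music-dance correspondence.
Introduce a curriculum learning strategy that gradually shifts training from teacher forcing with ground-truth movements to autoregressive generation with the model's own generated movements.
By changing how much the decoder relies on its own outputs as training progresses, alleviate error accumulation in long motion sequence generation.
Evaluate with automatic metrics and human evaluation, confirming superior performance.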
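The curriculum idea above can be illustrated with a minimal sketch. It is not the paper's implementation: this summary does not give the exact schedule, so the segment-alternation scheme (a few ground-truth-fed steps followed by a few self-fed steps), the linear growth of the self-fed segments, and the names generated_segment_length, feed_mask and decode_with_curriculum with their default parameters are all hypothetical. Frames are toy scalars, and step_fn stands in for one decoder step.

```python
from typing import Callable, List


def generated_segment_length(epoch: int, step: int = 2, every: int = 5, cap: int = 30) -> int:
    """Hypothetical schedule for the length of each self-fed segment.

    Epochs 0..every-1 give 0 (pure teacher forcing); afterwards the length
    grows by `step` every `every` epochs and is capped at `cap`.
    """
    return min(cap, step * (epoch // every))


def feed_mask(num_steps: int, gt_len: int, gen_len: int) -> List[bool]:
    """Per-step choice of decoder input: True = own previous output, False = ground truth.

    Steps cycle through `gt_len` ground-truth-fed steps followed by
    `gen_len` self-fed steps.
    """
    if gen_len == 0:
        return [False] * num_steps
    cycle = gt_len + gen_len
    return [(t % cycle) >= gt_len for t in range(num_steps)]


def decode_with_curriculum(
    ground_truth: List[float],
    step_fn: Callable[[float], float],
    mask: List[bool],
) -> List[float]:
    """Toy autoregressive loop over scalar 'frames'.

    step_fn predicts frame i+1 from the frame fed in at step i; mask[i]
    decides whether the next input is that prediction or ground_truth[i+1]
    (the last mask entry therefore has no effect).
    """
    if len(mask) != len(ground_truth) - 1:
        raise ValueError("mask needs one entry per predicted frame")
    outputs: List[float] = []
    prev = ground_truth[0]  # the seed frame is always ground truth
    for i in range(len(ground_truth) - 1):
        pred = step_fn(prev)
        outputs.append(pred)
        prev = pred if mask[i] else ground_truth[i + 1]
    return outputs


if __name__ == "__main__":
    truth = [float(t) for t in range(9)]

    def biased_step(x: float) -> float:
        # Toy "model": the true step is +1.0, so every prediction is biased by +0.5.
        return x + 1.5

    for epoch in (0, 5, 10, 1000):
        gen_len = generated_segment_length(epoch)
        mask = feed_mask(len(truth) - 1, gt_len=2, gen_len=gen_len)
        print(epoch, gen_len, decode_with_curriculum(truth, biased_step, mask))
```

With the ground truth [0, 1, 2, 3, 4] and a step function that always adds 1.5, pure teacher forcing leaves every prediction off by 0.5, whereas fully self-fed decoding drifts by 0.5, 1.0, 1.5 and 2.0. Lengthening the self-fed segments gradually exposes the model to its own predictions before it has to generate long sequences unaided. A per-step random mix (scheduled sampling) would be another plausible way to realize the same teacher-forcing-to-autoregressive shift.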

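Experimental Results

Research Questions

RQ1: How can long music feature sequences be processed efficiently to generate the corresponding dance sequences?
RQ2: How can the alignment between music and dance be captured at a fine-grained level?
RQ3: Compared with fully guided training, does curriculum learning alleviate autoregressive error accumulation in long motion generation?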

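Key Findings

The proposed method outperforms existing state-of-the-art methods on both automatic metrics and human evaluation.
The curriculum learning strategy alleviates error accumulation in long-term motion generation.
The architecture effectively models long-term music-dance correspondence and style consistency.
Experiments support the method on both objective metrics and subjective evaluation.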


This summary was generated by AI and reviewed by human editors.