QUICK REVIEW

[Paper Review] Lakh Pianoroll Dataset

Hao‐Wen Dong, Wen-Yi Hsiao|arXiv (Cornell University)|Sep 19, 2017

Music Technology and Sound Studies|Cited by 151

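ONE-SENTENCE SUMMARY

MuseGAN is a multi-track sequential GAN for symbolic music generation that proposes three generative models (jamming, composer, and hybrid), models temporal structure, supports track-conditional generation, defines dedicated evaluation metrics, and introduces the Lakh Pianoroll Dataset (LPD), which is converted from the Lakh MIDI Dataset.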

ABSTRACT

Generating music has a few notable differences from generating images and videos. First, music is an art of time, necessitating a temporal model. Second, music is usually composed of multiple instruments/tracks with their own temporal dynamics, but collectively they unfold over time interdependently. Lastly, musical notes are often grouped into chords, arpeggios or melodies in polyphonic music, and thereby introducing a chronological ordering of notes is not naturally suitable. In this paper, we propose three models for symbolic multi-track music generation under the framework of generative adversarial networks (GANs). The three models, which differ in the underlying assumptions and accordingly the network architectures, are referred to as the jamming model, the composer model and the hybrid model. We trained the proposed models on a dataset of over one hundred thousand bars of rock music and applied them to generate piano-rolls of five tracks: bass, drums, guitar, piano and strings. A few intra-track and inter-track objective metrics are also proposed to evaluate the generative results, in addition to a subjective user study. We show that our models can generate coherent music of four bars right from scratch (i.e. without human inputs). We also extend our models to human-AI cooperative music generation: given a specific track composed by human, we can generate four additional tracks to accompany it. All code, the dataset and the rendered audio samples are available at https://salu133445.github.io/musegan/ .

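MOTIVATION AND OBJECTIVES

Generate multi-track, polyphonic symbolic music that is coherent both over time and across tracks.
Generate five tracks (bass, drums, guitar, piano, strings) in a piano-roll representation, avoiding the simplifying assumption of single-track or monophonic music.
Develop three GAN-based generative models that capture jamming, composing, and hybrid inter-track dynamics.
Extend the framework to track-conditional generation to enable human-AI cooperative music creation.
Provide a new preprocessed dataset (LPD) and propose intra-track and inter-track objective metrics for evaluation.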

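PROPOSED METHOD

Represent multi-track music as bar-based piano-roll tensors so that CNNs can be applied.
Adopt the Wasserstein GAN with gradient penalty (WGAN-GP) as the core generative model.
Define three generation schemes: jamming (an independent private generator per track), composer (a single shared generator), and hybrid (private generators that also receive a shared inter-track input).
Introduce temporal structure in two ways: in generation from scratch, a temporal generator feeds inputs to the bar generator; in track-conditional generation, an encoder maps the given track into the latent space.
Assemble bar-by-bar multi-track piano-rolls by combining inter-track and intra-track random vectors with the bar generator of each track.
Extend to track-conditional generation by using an encoder to extract inter-track features from the user-provided track.
Preprocess the data by mapping Lakh MIDI files to five-track piano-rolls (bass, drums, guitar, piano, strings) and segmenting them into four-bar phrases for training.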
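
The three schemes differ mainly in which random vectors reach which generator. The following is a minimal Python sketch of that routing only, not the authors' implementation: the names `sample_z` and `generator_inputs` and the latent size `LATENT_DIM = 4` are hypothetical, and the generators themselves (CNNs in the paper) are left out.

```python
import random

TRACKS = ["bass", "drums", "guitar", "piano", "strings"]
LATENT_DIM = 4  # illustrative size only, not taken from the paper


def sample_z(rng, dim=LATENT_DIM):
    """Draw one Gaussian random vector as a plain list of floats."""
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]


def generator_inputs(scheme, rng):
    """Return {generator name: latent input} for one bar (hypothetical helper)."""
    if scheme == "jamming":
        # One private generator per track, each with its own random vector.
        return {track: sample_z(rng) for track in TRACKS}
    if scheme == "composer":
        # A single shared generator produces all tracks from one vector.
        return {"shared": sample_z(rng)}
    if scheme == "hybrid":
        # Private generators; each sees the shared inter-track vector
        # concatenated with its own intra-track vector.
        z_inter = sample_z(rng)
        return {track: z_inter + sample_z(rng) for track in TRACKS}
    raise ValueError(f"unknown scheme: {scheme!r}")


rng = random.Random(0)
for scheme in ("jamming", "composer", "hybrid"):
    inputs = generator_inputs(scheme, rng)
    print(scheme, {name: len(z) for name, z in inputs.items()})
```

In the hybrid case every track's input begins with the same inter-track vector, which is what allows separate generators to coordinate with one another.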

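EXPERIMENTAL RESULTS

Research Questions

RQ1: How can GAN-based models generate coherent multi-track, polyphonic symbolic music without reducing the problem to simpler sub-problems?
RQ2: Which architectures are effective for modeling inter-track coordination and intra-track dynamics in multi-track music generation?
RQ3: Can temporal structure be learned well enough to produce musically meaningful phrases (such as four-bar segments) in the generated music?
RQ4: Can track-conditional generation enable useful human-AI cooperative composition or accompaniment?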

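Key Findings

The three GAN-based generative models (jamming, composer, hybrid) produce different inter-track harmonic relationships; composer and hybrid tend to yield stronger cross-track coherence.
The jamming model tends to perform better on intra-track metrics (less fragmented notes), whereas the composer and hybrid models better capture inter-track harmony (lower tonal distance, TD).
Temporal modeling enables bar-by-bar generation that forms longer phrases, and track-conditional generation supports accompanying a human-composed track.
The objective metrics (EB, UPC, QN, DP, TD) give insight into data sparsity, pitch usage, note quality, drum patterns, and inter-track harmony, and guide model evaluation.
A dataset (LPD) of 50,266 four-bar phrases with five tracks is provided; it is derived from the Lakh MIDI Dataset by filtering for Rock, 4/4 time, and phrase structure.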
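
Three of the intra-track metrics can be sketched on a toy piano-roll. The definitions assumed here are: EB is the ratio of empty bars, UPC is the number of distinct pitch classes used in a bar (0 to 12), and QN is the ratio of notes lasting at least three time steps. These follow the MuseGAN paper as best understood and are not spelled out in this review. The sparse bar layout (a list of time steps, each a set of active MIDI pitches) and all function names are illustrative choices.

```python
def is_empty(bar):
    """A bar is empty when no pitch is active at any time step."""
    return all(len(step) == 0 for step in bar)


def empty_bar_ratio(bars):
    """EB (assumed definition): fraction of bars containing no notes."""
    return sum(is_empty(bar) for bar in bars) / len(bars)


def used_pitch_classes(bar):
    """UPC (assumed definition): distinct pitch classes (pitch mod 12) in a bar."""
    return len({pitch % 12 for step in bar for pitch in step})


def note_lengths(bar):
    """Lengths, in time steps, of maximal runs of each active pitch.

    A binary piano-roll cannot tell a held note from a repeated one,
    so every unbroken run of a pitch is counted as a single note.
    """
    lengths = []
    running = {}  # pitch -> length of the run currently in progress
    for step in bar:
        for pitch in list(running):
            if pitch not in step:
                lengths.append(running.pop(pitch))
        for pitch in step:
            running[pitch] = running.get(pitch, 0) + 1
    lengths.extend(running.values())  # runs still sounding at the bar's end
    return lengths


def qualified_note_ratio(bars, min_steps=3):
    """QN (assumed definition): fraction of notes lasting >= min_steps steps."""
    lengths = [n for bar in bars for n in note_lengths(bar)]
    if not lengths:
        return 0.0
    return sum(n >= min_steps for n in lengths) / len(lengths)


# Toy data: 8 time steps per bar (real piano-rolls use a finer grid).
bar_a = [{60}, {60}, {60}, {60}, set(), {64}, set(), set()]  # long C, short E
bar_b = [set() for _ in range(8)]                            # silent bar
bars = [bar_a, bar_b]

print(empty_bar_ratio(bars))       # 0.5
print(used_pitch_classes(bar_a))   # 2
print(qualified_note_ratio(bars))  # 0.5
```

Notes that cross a bar boundary are split by this per-bar sketch; a fuller version would measure note lengths over the whole phrase before computing QN.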


This review was generated by AI and checked by a human editor.