QUICK REVIEW

[Paper Review] Lakh Pianoroll Dataset

Hao‐Wen Dong, Wen-Yi Hsiao|arXiv (Cornell University)|Sep 19, 2017

Music Technology and Sound Studies | Cited by 151

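One-Sentence Summary

MuseGAN proposes a multi-track sequential GAN for symbolic music generation. It introduces three generative models (jamming, composer, and hybrid), a way of handling temporal structure, and the Lakh Pianoroll Dataset (LPD), which is derived from the Lakh MIDI Dataset. It also provides track-conditional generation and dedicated evaluation metrics.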

ABSTRACT

Generating music has a few notable differences from generating images and videos. First, music is an art of time, necessitating a temporal model. Second, music is usually composed of multiple instruments/tracks with their own temporal dynamics, but collectively they unfold over time interdependently. Lastly, musical notes are often grouped into chords, arpeggios or melodies in polyphonic music, and thereby introducing a chronological ordering of notes is not naturally suitable. In this paper, we propose three models for symbolic multi-track music generation under the framework of generative adversarial networks (GANs). The three models, which differ in the underlying assumptions and accordingly the network architectures, are referred to as the jamming model, the composer model and the hybrid model. We trained the proposed models on a dataset of over one hundred thousand bars of rock music and applied them to generate piano-rolls of five tracks: bass, drums, guitar, piano and strings. A few intra-track and inter-track objective metrics are also proposed to evaluate the generative results, in addition to a subjective user study. We show that our models can generate coherent music of four bars right from scratch (i.e. without human inputs). We also extend our models to human-AI cooperative music generation: given a specific track composed by human, we can generate four additional tracks to accompany it. All code, the dataset and the rendered audio samples are available at https://salu133445.github.io/musegan/ .

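Motivation and Objectives

Motivate the generation of multi-track, polyphonic symbolic music that is coherent both over time and across tracks.
Avoid simplifying assumptions such as single-track or monophonic music, and use a piano-roll representation to generate five tracks: bass, drums, guitar, piano, and strings.
Develop three GAN-based generative models that capture jam-session-like inter-track coordination as well as intra-track dynamics.
Extend the framework to track-conditional generation for human-AI cooperative music creation.
Provide a new preprocessed dataset (LPD) and propose intra-track and inter-track objective metrics for evaluation.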

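Proposed Method

Represent multi-track music as bar-level piano-roll tensors so that CNNs can be applied.
Adopt the Wasserstein GAN with gradient penalty (WGAN-GP) as the core generative model.
Define three generation schemes: jamming (independent private generators), composer (a single shared generator), and hybrid (private generators combined with a shared inter-track input).
Incorporate temporal structure in two ways: generation from scratch, in which a temporal generator feeds the bar generator, or track-conditional generation, in which an encoder maps a given track into the latent space.
Assemble MuseGAN by combining inter-track and intra-track random vectors with a bar generator for each track, producing multi-track piano-rolls bar by bar.
Extend to track-conditional generation, using an encoder to extract inter-track features from a user-provided track.
Preprocess the data by mapping Lakh MIDI files to five-track piano-rolls (bass, drums, guitar, piano, strings) and segmenting them into four-bar phrases for training.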
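
The difference between the three generation schemes is mostly a matter of which random vectors feed which generators. The following is a minimal NumPy sketch of that wiring only, not the authors' implementation. The tensor sizes (4 bars, 96 time steps, 84 pitches), the latent size `Z_DIM`, and the `toy_generator` stand-in (a fixed random projection in place of a trained CNN) are all assumptions made for illustration. The temporal generator is omitted, so per-bar latents are simply sampled independently here.

```python
import numpy as np

TRACKS = ["bass", "drums", "guitar", "piano", "strings"]
N_BARS, N_STEPS, N_PITCHES = 4, 96, 84  # assumed sizes; not stated in this review
Z_DIM = 16                              # hypothetical latent size


def toy_generator(z, out_shape, seed):
    """Stand-in for a trained CNN generator: fixed random projection + threshold."""
    w = np.random.default_rng(seed).standard_normal((z.size, int(np.prod(out_shape))))
    logits = z @ w
    # Keep only the strongest activations so the piano-roll is sparse.
    return (logits > 1.5 * logits.std()).reshape(out_shape)


def generate_bar(scheme, rng):
    """Return one multi-track bar of shape (N_STEPS, N_PITCHES, len(TRACKS))."""
    n = len(TRACKS)
    if scheme == "composer":
        # One shared latent vector, one generator emitting all tracks at once.
        z = rng.standard_normal(Z_DIM)
        return toy_generator(z, (N_STEPS, N_PITCHES, n), seed=0)
    if scheme == "jamming":
        # One private latent vector and one private generator per track.
        zs = [rng.standard_normal(Z_DIM) for _ in range(n)]
    elif scheme == "hybrid":
        # Private generators, each fed the shared inter-track vector
        # concatenated with its own intra-track vector.
        z_inter = rng.standard_normal(Z_DIM)
        zs = [np.concatenate([z_inter, rng.standard_normal(Z_DIM)]) for _ in range(n)]
    else:
        raise ValueError(f"unknown scheme: {scheme}")
    bars = [toy_generator(z, (N_STEPS, N_PITCHES), seed=i + 1) for i, z in enumerate(zs)]
    return np.stack(bars, axis=-1)


def generate_phrase(scheme, seed=0):
    """Stack N_BARS bars into a (bars, steps, pitches, tracks) boolean tensor."""
    rng = np.random.default_rng(seed)
    return np.stack([generate_bar(scheme, rng) for _ in range(N_BARS)], axis=0)


phrase = generate_phrase("hybrid")
print(phrase.shape, phrase.dtype)  # (4, 96, 84, 5) bool
```

In this sketch only the hybrid scheme gives each track both a shared and a private source of randomness, which is the property the review attributes to it. A real model would replace `toy_generator` with trained networks and add discriminators.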

Figure 1: Hierarchical structure of a music piece.

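Experimental Results

Research Questions

RQ1: Can GAN-based models generate coherent multi-track, polyphonic symbolic music without simplifying the problem?
RQ2: Which architectures are effective for modeling inter-track coordination and intra-track dynamics in multi-track music generation?
RQ3: Can temporal structure be learned well enough to produce musically meaningful phrases (such as four-bar segments) in the generated music?
RQ4: Does track-conditional generation enable useful human-AI cooperative music creation and accompaniment?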

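Key Findings

All three GAN-based generative models (jamming, composer, hybrid) produce harmonic relations between tracks to varying degrees; the composer and hybrid models in particular often show strong cross-track coherence.
The jamming model tends to do better on intra-track metrics (less fragmented notes), whereas the composer and hybrid models capture inter-track harmony better (lower tonal distance, TD).
Temporal modeling lets bar-by-bar generation form longer phrases, and track-conditional generation can accompany a human-composed track.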
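The objective metrics (EB, UPC, QN, DP, TD) give insight into the ratio of empty bars, pitch class usage, note quality, drum patterns, and inter-track harmony, and they guide model evaluation.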
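A dataset (LPD) of 50,266 four-bar phrases is provided. Each phrase contains five tracks and is derived from the Lakh MIDI Dataset, filtered by genre (rock), 4/4 time signature, and phrase structure.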
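
Two of the intra-track metrics are simple enough to sketch directly on a binary piano-roll. The example below is a hedged illustration, not the paper's evaluation code. It assumes EB is the fraction of bars with no notes and UPC is the number of distinct pitch classes used per bar. The `lowest_pitch` offset and the choice to average UPC over all bars (including empty ones) are assumptions of this sketch.

```python
import numpy as np


def empty_bar_ratio(track):
    """EB: fraction of bars with no active note.

    track: boolean array of shape (bars, steps, pitches).
    """
    return float(np.mean(~track.any(axis=(1, 2))))


def used_pitch_classes(track, lowest_pitch=24):
    """UPC: mean number of distinct pitch classes (0 to 12) used per bar.

    lowest_pitch is the assumed MIDI number of pitch index 0 (hypothetical).
    """
    pitch_class = (lowest_pitch + np.arange(track.shape[2])) % 12
    active = track.any(axis=1)  # (bars, pitches): pitch sounded anywhere in the bar
    counts = [np.unique(pitch_class[row]).size for row in active]
    return float(np.mean(counts))


# Toy track: 4 bars, 96 time steps, 84 pitches.
track = np.zeros((4, 96, 84), dtype=bool)
for p in (0, 4, 7):          # bar 0: a C-E-G triad
    track[0, :24, p] = True
track[1, :48, 12] = True     # bar 1: a single C one octave higher
# bars 2 and 3 stay empty

print(empty_bar_ratio(track))     # 0.5
print(used_pitch_classes(track))  # 1.0
```

Here two of the four bars are empty, so EB is 0.5, and the per-bar pitch class counts are 3, 1, 0, and 0, giving a mean UPC of 1.0.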

This review was generated by AI and checked by a human editor.