QUICK REVIEW

[Paper Review] A Comprehensive Survey on Deep Music Generation: Multi-level Representations, Algorithms, Evaluations, and Future Directions

Shulei Ji, Jing Luo|arXiv (Cornell University)|Nov 13, 2020

Music and Audio Processing | References: 327 | Citations: 79

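One-Sentence Summary

A survey that organizes deep music generation into three levels (score, performance, and audio) and details the representations, datasets, evaluation methods, and future directions at each level.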

ABSTRACT

The utilization of deep learning techniques in generating various contents (such as image, text, etc.) has become a trend. Especially music, the topic of this paper, has attracted widespread attention of countless researchers. The whole process of producing music can be divided into three stages, corresponding to the three levels of music generation: score generation produces scores, performance generation adds performance characteristics to the scores, and audio generation converts scores with performance characteristics into audio by assigning timbre or generates music in audio format directly. Previous surveys have explored the network models employed in the field of automatic music generation. However, the development history, the model evolution, as well as the pros and cons of same music generation task have not been clearly illustrated. This paper attempts to provide an overview of various composition tasks under different music generation levels, covering most of the currently popular music generation tasks using deep learning. In addition, we summarize the datasets suitable for diverse tasks, discuss the music representations, the evaluation methods as well as the challenges under different levels, and finally point out several future directions.

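Motivation and Objectives

Classify music generation tasks by representation level (score, performance, audio) so that readers can run targeted literature searches.
Summarize the music representations, datasets, and evaluation methods used in deep learning approaches.
Analyze the strengths and limitations of deep learning models across different music generation tasks.
Highlight the open challenges in deep music generation and propose future research directions.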

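Proposed Approach

Review and classify existing work by music generation level and task.
Compare symbolic (MIDI-like) and audio representations, including REMI and tuple-based encodings.
Survey the deep learning architectures applied to score, performance, and audio generation (RNN, LSTM, VAE, GAN, Transformer).
Discuss datasets, evaluation methods (objective and subjective), and the potential for cross-modal generation and fusion.
Summarize historical approaches and dataset availability to situate modern deep learning methods.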
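To make the "objective evaluation" item above concrete, here is a minimal sketch of one statistic that is often used for symbolic music: the pitch-class histogram. This summary does not say which metrics the survey emphasizes, so the choice of metric and the function names (`pitch_class_histogram`, `histogram_entropy`, `histogram_overlap`) are assumptions made for illustration only.

```python
import math
from typing import List, Sequence


def pitch_class_histogram(pitches: Sequence[int]) -> List[float]:
    """Return the normalized 12-bin histogram of MIDI pitches taken modulo 12."""
    counts = [0] * 12
    for pitch in pitches:
        counts[pitch % 12] += 1
    total = sum(counts)
    if total == 0:
        return [0.0] * 12
    return [count / total for count in counts]


def histogram_entropy(hist: Sequence[float]) -> float:
    """Shannon entropy in bits; lower values mean fewer pitch classes dominate."""
    return -sum(h * math.log2(h) for h in hist if h > 0)


def histogram_overlap(a: Sequence[float], b: Sequence[float]) -> float:
    """Overlap in [0, 1]: the sum of bin-wise minima of two normalized histograms."""
    return sum(min(x, y) for x, y in zip(a, b))


if __name__ == "__main__":
    # Hypothetical data: MIDI pitch numbers from a generated and a reference piece.
    generated = [60, 64, 67, 72, 60, 62]
    reference = [60, 62, 64, 65, 67, 69, 71, 72]
    gen_hist = pitch_class_histogram(generated)
    ref_hist = pitch_class_histogram(reference)
    print("entropy (generated):", histogram_entropy(gen_hist))
    print("overlap with reference:", histogram_overlap(gen_hist, ref_hist))
```

Statistics like these compare distributions only; they say nothing about perceived quality, which is why objective measures are typically paired with subjective listening tests.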

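Results

Research Questions

RQ1: How are music generation tasks organized across score, performance, and audio representations in deep learning research?
RQ2: Which representations, datasets, and evaluation methods dominate at each generation level?
RQ3: What are the main strengths and limitations of current models (RNN, VAE, GAN, Transformer) for different generation tasks?
RQ4: Which future prospects and challenges matter most for advancing deep music generation?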

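Key Findings

Deep learning architectures have become the mainstream approach to music generation, spanning score, performance, and audio tasks.
REMI and other representation schemes improve rhythm modeling over conventional MIDI-like representations.
Transformers and hierarchical VAEs have been influential in capturing long-term structure and in cross-modal generation.
WaveNet and GAN-based approaches have substantially advanced audio synthesis and singing voice synthesis.
The scarcity of datasets with aligned scores and performances is a bottleneck for expressive performance modeling.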
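The finding about REMI and rhythm can be illustrated with a small sketch that encodes the same notes two ways. It is a simplified illustration and not the encoding of any particular paper or library: the REMI-like version omits the tempo and chord tokens of the original REMI scheme, and the token names, the 4/4 meter, and the 16th-note grid are assumptions.

```python
from typing import List, Tuple

# A note is (onset, duration, pitch, velocity); times are counted in 16th-note steps.
Note = Tuple[int, int, int, int]

STEPS_PER_BAR = 16  # assumption: 4/4 time on a 16th-note grid


def midi_like_tokens(notes: List[Note]) -> List[str]:
    """Encode notes as VELOCITY / NOTE_ON / NOTE_OFF events separated by TIME_SHIFT."""
    events = []  # (time, order, tokens); order 0 puts note-offs before note-ons
    for onset, dur, pitch, vel in notes:
        events.append((onset, 1, [f"VELOCITY_{vel}", f"NOTE_ON_{pitch}"]))
        events.append((onset + dur, 0, [f"NOTE_OFF_{pitch}"]))
    events.sort()
    tokens, now = [], 0
    for time, _, toks in events:
        if time > now:
            # Time is implicit: a model must sum the shifts to locate the beat.
            tokens.append(f"TIME_SHIFT_{time - now}")
            now = time
        tokens.extend(toks)
    return tokens


def remi_like_tokens(notes: List[Note]) -> List[str]:
    """Encode notes with explicit BAR and POSITION tokens (simplified REMI-style)."""
    tokens, current_bar = [], -1
    for onset, dur, pitch, vel in sorted(notes):
        bar, pos = divmod(onset, STEPS_PER_BAR)
        while current_bar < bar:
            tokens.append("BAR")  # the metrical grid is stated explicitly
            current_bar += 1
        tokens += [f"POSITION_{pos}", f"VELOCITY_{vel}",
                   f"PITCH_{pitch}", f"DURATION_{dur}"]
    return tokens


if __name__ == "__main__":
    example = [(0, 4, 60, 80), (4, 4, 64, 80), (16, 8, 67, 90)]  # hypothetical notes
    print(midi_like_tokens(example))
    print(remi_like_tokens(example))
```

In the MIDI-like stream, the position of a note within the bar has to be inferred by accumulating time shifts, whereas the REMI-like stream names the bar boundary and beat position directly, which is the intuition behind the rhythm-modeling claim.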

This review was generated by AI and checked by a human editor.