QUICK REVIEW

[Paper Review] A Comprehensive Survey on Deep Music Generation: Multi-level Representations, Algorithms, Evaluations, and Future Directions

Shulei Ji, Jing Luo | arXiv (Cornell University) | November 13, 2020

Music and Audio Processing | References: 327 | Citations: 79

One-line summary

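A survey that organizes deep music generation into three levels (score, performance, and audio) and covers representations, datasets, evaluation methods, and future directions in detail.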

ABSTRACT

The utilization of deep learning techniques in generating various contents (such as image, text, etc.) has become a trend. Especially music, the topic of this paper, has attracted widespread attention of countless researchers. The whole process of producing music can be divided into three stages, corresponding to the three levels of music generation: score generation produces scores, performance generation adds performance characteristics to the scores, and audio generation converts scores with performance characteristics into audio by assigning timbre or generates music in audio format directly. Previous surveys have explored the network models employed in the field of automatic music generation. However, the development history, the model evolution, as well as the pros and cons of same music generation task have not been clearly illustrated. This paper attempts to provide an overview of various composition tasks under different music generation levels, covering most of the currently popular music generation tasks using deep learning. In addition, we summarize the datasets suitable for diverse tasks, discuss the music representations, the evaluation methods as well as the challenges under different levels, and finally point out several future directions.

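Research motivation and goals

Classify music generation tasks by representation level (score, performance, audio) so that literature searches can be targeted.
Summarize the music representations, datasets, and evaluation methods used in deep learning approaches.
Analyze the strengths and limitations of deep learning models across different music generation tasks.
Highlight the open challenges in deep music generation and propose future research directions.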

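Proposed method

Reviews and classifies existing work by music generation level and task.
Compares symbolic (MIDI-like) and audio representations, including REMI and tuple-based encodings.
Surveys the deep learning architectures applied to score, performance, and audio generation (RNNs, LSTMs, VAEs, GANs, Transformers).
Discusses datasets, evaluation methods (objective and subjective), and the potential for cross-modal generation and fusion.
Summarizes historical approaches and dataset availability to put modern deep learning methods in context.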

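Experimental results

Research questions

RQ1: How are music generation tasks in deep learning research organized across score, performance, and audio representations?
RQ2: Which representations, datasets, and evaluation methods dominate at each generation level?
RQ3: What are the main strengths and limitations of current models (RNNs, VAEs, GANs, Transformers) across different generation tasks?
RQ4: Which future directions and challenges matter most for the progress of deep music generation?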

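Key findings

Deep learning architectures have become mainstream in music generation and are used across score, performance, and audio tasks.
REMI and other representation schemes improve rhythm modeling over traditional MIDI-like representations.
Transformers and hierarchical VAEs have been influential for capturing long-term structure and for cross-modal generation.
WaveNet and GAN-based approaches have driven substantial progress in audio synthesis and singing voice synthesis.
The scarcity of aligned score-performance datasets, which fine-grained performance modeling requires, is a bottleneck.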
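
To make the representation finding above more concrete, here is a minimal sketch that contrasts a MIDI-like event encoding with a REMI-style encoding of the same three notes. It is an illustration under stated assumptions, not the tokenizer of any surveyed system: the `Note` tuple, the function names, the exact token spellings, the 16th-note grid, and the fixed 4/4 meter are all hypothetical choices made for this example, and the REMI-style version omits the tempo and chord tokens that the full scheme includes.

```python
from typing import List, NamedTuple


class Note(NamedTuple):
    start: int     # onset, in 16th-note steps from the start of the piece
    duration: int  # length, in 16th-note steps
    pitch: int     # MIDI pitch number
    velocity: int  # MIDI velocity


STEPS_PER_BAR = 16  # assumption: 4/4 time on a 16th-note grid


def to_midi_like(notes: List[Note]) -> List[str]:
    """MIDI-like encoding: note-on/note-off events separated by relative time shifts."""
    events = []
    for n in notes:
        # Priority 0 sorts note-offs before note-ons that share a timestamp.
        events.append((n.start + n.duration, 0, n.pitch, [f"Note_Off_{n.pitch}"]))
        # Simplification: velocity is re-emitted for every note, even if unchanged.
        events.append((n.start, 1, n.pitch,
                       [f"Set_Velocity_{n.velocity}", f"Note_On_{n.pitch}"]))
    events.sort(key=lambda e: e[:3])

    tokens: List[str] = []
    now = 0
    for time, _, _, event_tokens in events:
        if time > now:
            tokens.append(f"Time_Shift_{time - now}")
            now = time
        tokens.extend(event_tokens)
    return tokens


def to_remi_like(notes: List[Note]) -> List[str]:
    """REMI-style encoding: explicit bar and position tokens, plus note durations."""
    tokens: List[str] = []
    current_bar = -1
    for n in sorted(notes, key=lambda n: (n.start, n.pitch)):
        bar, pos = divmod(n.start, STEPS_PER_BAR)
        while current_bar < bar:  # also emits a token for each empty bar
            tokens.append("Bar")
            current_bar += 1
        tokens += [
            f"Position_{pos + 1}/{STEPS_PER_BAR}",
            f"Note_Velocity_{n.velocity}",
            f"Note_On_{n.pitch}",
            f"Note_Duration_{n.duration}",
        ]
    return tokens


if __name__ == "__main__":
    melody = [Note(0, 4, 60, 80), Note(4, 4, 64, 80), Note(16, 8, 67, 90)]
    print(to_midi_like(melody))
    print(to_remi_like(melody))
```

In the MIDI-like sequence, the position of a note within its bar is only implicit: a model has to accumulate every preceding `Time_Shift` to recover it. In the REMI-style sequence, the `Bar` and `Position` tokens state the metrical grid directly, which is one plausible reading of why such encodings are reported to help rhythm modeling.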


This review was generated by AI and reviewed by a human editor.