QUICK REVIEW

[Paper Review] A Tutorial on Deep Learning for Music Information Retrieval

Keunwoo Choi, György Fazekas | arXiv (Cornell University) | Sep 13, 2017
Music and Audio Processing | 104 references | 73 citations
TL;DR

This tutorial surveys how deep learning is applied to Music Information Retrieval (MIR), outlines core neural network modules, data representations, and guidelines for applying DNNs to MIR tasks, and discusses challenges and advanced topics for new research.

ABSTRACT

Following their success in Computer Vision and other areas, deep learning techniques have recently become widely adopted in Music Information Retrieval (MIR) research. However, the majority of works aim to adopt and assess methods that have been shown to be effective in other domains, while there is still a great need for more original research focusing on music primarily and utilising musical knowledge and insight. The goal of this paper is to boost the interest of beginners by providing a comprehensive tutorial and reducing the barriers to entry into deep learning for MIR. We lay out the basic principles and review prominent works in this hard-to-navigate field. We then outline the network structures that have been successful in MIR problems and facilitate the selection of building blocks for the problems at hand. Finally, guidelines for new tasks and some advanced topics in deep learning are discussed to stimulate new research in this fascinating field.

Motivation & Objective

  • Introduce deep learning concepts in the MIR context and highlight why these methods are suitable for music tasks.
  • Review MIR problems and their attributes to help practitioners select appropriate deep learning approaches.
  • Describe core neural network modules (dense, convolutional, recurrent) and how they map to MIR tasks.
  • Discuss audio data representations and how to choose representations for MIR problems.
  • Provide guidelines and considerations for designing models and tackling advanced topics in MIR with deep learning.

Proposed method

  • Explain deep learning fundamentals and training considerations (loss functions, backpropagation, optimization, activation functions).
  • Survey how dense, convolutional, and recurrent layers are used in MIR and how pooling and kernel design impact performance.
  • Discuss data representations (STFT, mel-spectrogram, CQT, chromagram) and their suitability for different MIR tasks.
  • Correlate MIR problem types with network architectures and time-scale considerations (short vs long decision scales).
  • Outline practical strategies for data augmentation, transfer learning, and using random weights as feature extractors when data are limited.
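The difference between the representations listed above largely comes down to how each one spaces its frequency axis. The following is a minimal sketch of that idea, not code from the paper: the function names are hypothetical, the mel conversion uses the common HTK-style formula (one of several variants in use), and the CQT spacing assumes equal-tempered bins.

```python
import math


def stft_bin_frequencies(sample_rate, n_fft):
    """Centre frequencies (Hz) of the non-negative STFT bins: linearly spaced."""
    return [k * sample_rate / n_fft for k in range(n_fft // 2 + 1)]


def hz_to_mel(hz):
    """HTK-style Hz-to-mel conversion (an assumption; other variants exist)."""
    return 2595.0 * math.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_band_frequencies(n_mels, f_min, f_max):
    """Centre frequencies (Hz) of n_mels bands, equally spaced on the mel scale."""
    lo, hi = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (hi - lo) / (n_mels + 1)
    return [mel_to_hz(lo + step * (i + 1)) for i in range(n_mels)]


def cqt_bin_frequencies(n_bins, f_min, bins_per_octave=12):
    """Centre frequencies (Hz) of CQT bins: geometrically spaced, so each
    octave holds the same number of bins."""
    return [f_min * 2.0 ** (k / bins_per_octave) for k in range(n_bins)]


if __name__ == "__main__":
    # Illustrative settings only; they are not taken from the paper.
    print(stft_bin_frequencies(22050, 512)[:4])
    print(mel_band_frequencies(8, 0.0, 11025.0))
    print(cqt_bin_frequencies(13, 220.0))
```

With 12 bins per octave, every CQT bin sits on a semitone, which is one reason CQT and chromagrams tend to be considered for pitch- and harmony-related tasks, while the mel axis compresses high frequencies in a perceptually motivated way.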

Experimental results

Research questions

  • RQ1: What are the key MIR tasks that benefit from deep learning, and how do problem characteristics influence model choice?
  • RQ2: How do different audio representations and network architectures (dense, conv, recurrent) affect MIR performance?
  • RQ3: What training and data-optimisation strategies are effective for MIR with limited data?
  • RQ4: How can deep learning guidelines be applied to new MIR tasks to stimulate further research?

Key findings

  • Deep learning is becoming essential in MIR: the number of MIR papers using it is growing rapidly, and many of the methods are carried over from other domains.
  • Convolutional neural networks effectively learn hierarchical, music-relevant features from time-frequency representations like mel-spectrograms and CQT.
  • Dense layers were foundational in early MIR work but are now often integrated with convnets or recurrent layers for improved performance.
  • Recurrent layers (e.g., LSTM/GRU) model temporal dependencies crucial for sequence-like MIR tasks.
  • Data representations and architectural choices should align with task characteristics such as whether the task is time-varying (short scale) or time-invariant (long scale).
  • Techniques such as data augmentation, transfer learning, and using randomly initialised networks as feature extractors can help when data are scarce.
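The time-scale distinction in the findings above can be illustrated with a small sketch. It assumes a hypothetical model that already outputs one score per audio frame; the function names and the example scores are made up for illustration and do not come from the paper. A time-varying task keeps one decision per frame, while a time-invariant task pools the frame scores into a single clip-level value.

```python
def frame_level_decisions(frame_scores, threshold=0.5):
    """Time-varying (short decision scale): one binary decision per frame."""
    return [score >= threshold for score in frame_scores]


def clip_level_score(frame_scores, mode="mean"):
    """Time-invariant (long decision scale): pool all frames into one score."""
    if not frame_scores:
        raise ValueError("frame_scores must not be empty")
    if mode == "mean":
        return sum(frame_scores) / len(frame_scores)
    if mode == "max":
        return max(frame_scores)
    raise ValueError(f"unknown pooling mode: {mode}")


if __name__ == "__main__":
    # Hypothetical per-frame outputs of some model for one audio clip.
    scores = [0.1, 0.2, 0.9, 0.8, 0.3]
    print(frame_level_decisions(scores))     # one decision per frame
    print(clip_level_score(scores, "mean"))  # single clip-level score
    print(clip_level_score(scores, "max"))
```

Mean pooling suits labels that describe the whole clip, whereas max pooling suits labels that may hold for only part of it. In practice the pooling is often built into the network itself rather than applied afterwards.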

This review was created by AI and reviewed by human editors.