QUICK REVIEW

[Paper Review] DeepSRGM -- Sequence Classification and Ranking in Indian Classical Music with Deep Learning

Sathwik Tejaswi Madhusudhan, Girish Chowdhary|arXiv (Cornell University)|Feb 15, 2024

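Music and Audio Processing | Citations: 9

One-Line Summary

DeepSRGM treats Raga recognition as sequence classification using an LSTM with attention, introduces sequence ranking for Raga-based content retrieval, and achieves state-of-the-art results on the Comp Music Carnatic Dataset (CMD).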

ABSTRACT

A vital aspect of Indian Classical Music (ICM) is Raga, which serves as a melodic framework for compositions and improvisations alike. Raga Recognition is an important music information retrieval task in ICM as it can aid numerous downstream applications ranging from music recommendations to organizing huge music collections. In this work, we propose a deep learning based approach to Raga recognition. Our approach employs efficient preprocessing and learns temporal sequences in music data using Long Short-Term Memory based Recurrent Neural Networks (LSTM-RNN). We train and test the network on smaller sequences sampled from the original audio while the final inference is performed on the audio as a whole. Our method achieves an accuracy of 88.1% and 97% during inference on the Comp Music Carnatic dataset and its 10 Raga subset respectively, making it the state-of-the-art for the Raga recognition task. Our approach also enables sequence ranking, which aids us in retrieving melodic patterns from a given music database that are closely related to the presented query sequence.

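Motivation and Goals

Tackles automatic Raga recognition in Indian Classical Music (ICM) to help organize large music collections and generate recommendations from them.
Reformulates Raga recognition as a sequence classification problem using an LSTM-RNN with attention.
Introduces sequence ranking, which retrieves sequences closely related to a query sequence and thereby enables content-based retrieval.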

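Proposed Method

Preprocesses the audio with vocal source separation followed by pitch tracking.
Normalizes the scale by centering pitch on the tonic, expressed in cents.
Trains an LSTM-RNN with 768 hidden units and 128-dimensional note embeddings, followed by an attention layer and fully connected layers.
Trains with categorical cross-entropy loss using the Adam optimizer with distributed asynchronous SGD.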
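The tonic normalization step can be sketched as follows: a pitch f (in Hz) is expressed relative to the tonic as 1200 * log2(f / tonic) cents, so recordings performed at different tonics share one scale. This is a minimal sketch, assuming a pitch track in Hz where 0 marks unvoiced frames, and assuming pitches are rounded to 100-cent (semitone) bins; the names `hz_to_cents`, `quantize_pitch_track`, and `step_cents` are hypothetical, and the paper's actual quantization may differ.

```python
import math


def hz_to_cents(freq_hz, tonic_hz):
    """Pitch relative to the tonic in cents: 1200 * log2(f / tonic)."""
    return 1200.0 * math.log2(freq_hz / tonic_hz)


def quantize_pitch_track(pitch_hz, tonic_hz, step_cents=100.0):
    """Map a pitch track (Hz) to integer note tokens centered on the tonic.

    Assumptions (not from the paper): frames with pitch <= 0 are treated as
    unvoiced and dropped, and each voiced frame is rounded to the nearest
    `step_cents` bin.
    """
    tokens = []
    for f in pitch_hz:
        if f <= 0:  # many pitch trackers emit 0 for unvoiced frames
            continue
        cents = hz_to_cents(f, tonic_hz)
        tokens.append(round(cents / step_cents))
    return tokens
```

With a 220 Hz tonic, the track `[220.0, 0.0, 440.0, 246.94, 110.0]` becomes `[0, 12, 2, -12]`. Tokens below the tonic come out negative, so an embedding lookup (such as the 128-dimensional note embeddings) would need them shifted to non-negative indices first.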

Figure 1: Preprocessing steps and model architecture for SRGM1 (see Section 3).
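Attention pooling over the LSTM outputs, followed by a fully connected softmax layer, can be sketched in plain Python. This assumes one simple form of attention: a learned scoring vector `w` is dotted with each hidden state and the scores are softmaxed over time. The paper's exact attention formulation is not given in this review, the toy dimensions stand in for the 768 hidden units, and all function names are hypothetical.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(hidden_states, w):
    """Weight each time step's hidden state and sum them into one context vector.

    hidden_states: list of T vectors (the LSTM outputs), each of length D.
    w: scoring vector of length D (assumed dot-product attention).
    Returns (context vector of length D, attention weights of length T).
    """
    scores = [sum(h_i * w_i for h_i, w_i in zip(h, w)) for h in hidden_states]
    alphas = softmax(scores)  # attention weights over time, summing to 1
    dim = len(hidden_states[0])
    context = [sum(a * h[d] for a, h in zip(alphas, hidden_states)) for d in range(dim)]
    return context, alphas


def classify(context, weights, biases):
    """Fully connected layer (one weight row per Raga) followed by softmax."""
    logits = [sum(c * wi for c, wi in zip(context, row)) + b
              for row, b in zip(weights, biases)]
    return softmax(logits)
```

For example, with hidden states `[[1.0, 0.0], [0.0, 1.0]]` and `w = [1.0, 1.0]`, both time steps score equally, each receives weight 0.5, and the context vector is their average `[0.5, 0.5]`.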

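Experimental Results

Research Questions

RQ1: Can Raga recognition be modeled effectively as a sequence classification problem using an LSTM with attention over pitch-quantized sequences?
RQ2: Can a model fine-tuned with triplet loss deliver reliable sequence ranking for Raga-based retrieval?
RQ3: How do subsequence length and sampling affect recognition and ranking performance on CMD?
RQ4: What performance do SRGM1 and its ensemble achieve on CMD-10 and CMD-40 compared with the state of the art?
RQ5: Does the model generalize to content-based retrieval within a large ICM dataset?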

Key Findings

Method	CMD-10 accuracy (10 Ragas)	CMD-40 accuracy (40 Ragas)
SRGM1	95.6%	84.6%
SRGM1 Ensemble	97.1%	88.1%
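The review does not say how the SRGM1 Ensemble combines its members. One common choice, assumed here purely for illustration, is to average the class-probability vectors of several independently trained models and take the most probable Raga; `ensemble_predict` is a hypothetical helper.

```python
def ensemble_predict(prob_lists):
    """Average per-model class probabilities; return (best class index, averaged probs).

    prob_lists: one probability vector per model, all of the same length.
    Assumption: simple unweighted averaging of softmax outputs.
    """
    n_models = len(prob_lists)
    n_classes = len(prob_lists[0])
    avg = [sum(p[c] for p in prob_lists) / n_models for c in range(n_classes)]
    best = max(range(n_classes), key=avg.__getitem__)
    return best, avg
```

Averaging lets a confident member outweigh an uncertain one: `[[0.6, 0.4], [0.2, 0.8]]` averages to roughly `[0.4, 0.6]`, so class 1 wins even though the first model preferred class 0.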

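SRGM1 achieves 95.6% accuracy on CMD-10 and 84.6% on CMD-40.
The SRGM1 Ensemble improves this to 97.1% on CMD-10 and 88.1% on CMD-40.
SRGM2 (sequence ranking) achieves a top-30 precision of 81.83% and a top-10 precision of 81.68%.
The model outperforms earlier TDMS-, VSM-, and PCD-based methods on CMD and its CMD-10 subset.
Longer subsequences (e.g., 6000 steps) converge faster and more stably than shorter ones.
Training on randomly sampled subsequences with an attention-based LSTM improves Raga recognition performance.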
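The train/inference split described in the abstract (train on shorter subsequences, infer on the whole recording) can be sketched as below. Random contiguous windows are drawn for training; at inference the recording is cut into non-overlapping windows and the per-window predictions are combined by majority vote. The voting rule, the non-overlapping windows, and all function names are assumptions for illustration, not the paper's confirmed procedure.

```python
import random
from collections import Counter


def sample_subsequence(tokens, length, rng):
    """Draw one random contiguous window of `length` tokens (training-time sampling)."""
    if len(tokens) < length:
        raise ValueError("sequence shorter than requested subsequence length")
    start = rng.randrange(len(tokens) - length + 1)
    return tokens[start:start + length]


def split_windows(tokens, length):
    """Non-overlapping windows over the recording; a shorter trailing remainder is dropped."""
    return [tokens[i:i + length] for i in range(0, len(tokens) - length + 1, length)]


def predict_recording(tokens, length, classify_window):
    """Label a whole recording by majority vote over per-window predictions.

    classify_window: any callable mapping one window to a Raga label
    (a stand-in for the trained model).
    """
    votes = Counter(classify_window(w) for w in split_windows(tokens, length))
    return votes.most_common(1)[0][0]
```

In a setup like the one the findings describe, `length` would be on the order of 6000 steps; passing a seeded `random.Random` keeps the training samples reproducible.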

Figure 2: Schematic diagram of the sequence ranking algorithm. P, Q, and R are copies of the same model and therefore share the same architecture.
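Figure 2's three identical copies P, Q, and R match the usual triplet-network setup, and RQ2 mentions fine-tuning with triplet loss. The sketch below assumes Euclidean distance between embeddings and a margin of 1.0. The loss pushes an anchor closer to a same-Raga sequence than to a different-Raga one, and retrieval ranks database sequences by their distance to the query. Precision at k is taken here as the fraction of the top k results that share the query's Raga label, which is an assumption about how the reported top-10/top-30 precision is defined. All names are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Triplet hinge loss: max(0, d(anchor, positive) - d(anchor, negative) + margin)."""
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


def rank_database(query_emb, database):
    """Return database keys sorted by increasing distance to the query embedding.

    database: dict mapping key -> (embedding, raga_label).
    """
    return sorted(database, key=lambda key: euclidean(query_emb, database[key][0]))


def precision_at_k(ranked_keys, database, query_label, k):
    """Fraction of the top-k retrieved items whose Raga label matches the query's."""
    top = ranked_keys[:k]
    return sum(1 for key in top if database[key][1] == query_label) / k
```

When the negative is already farther than the positive by more than the margin, the loss is zero and that triplet contributes no gradient, so training focuses on triplets the model still confuses.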

This review was generated by AI and checked by a human editor.