QUICK REVIEW

[Paper Review] DeepSRGM -- Sequence Classification and Ranking in Indian Classical Music with Deep Learning

Sathwik Tejaswi Madhusudhan, Girish Chowdhary | arXiv (Cornell University) | 2024-02-15

Music and Audio Processing | Citations: 9

One-Line Summary

DeepSRGM treats raga recognition as sequence classification with an attention-equipped LSTM and introduces sequence ranking for raga-based content retrieval, achieving state-of-the-art performance on the CompMusic Carnatic Dataset.

ABSTRACT

A vital aspect of Indian Classical Music (ICM) is Raga, which serves as a melodic framework for compositions and improvisations alike. Raga Recognition is an important music information retrieval task in ICM as it can aid numerous downstream applications ranging from music recommendations to organizing huge music collections. In this work, we propose a deep learning based approach to Raga recognition. Our approach employs efficient preprocessing and learns temporal sequences in music data using Long Short-Term Memory based Recurrent Neural Networks (LSTM-RNN). We train and test the network on smaller sequences sampled from the original audio while the final inference is performed on the audio as a whole. Our method achieves an accuracy of 88.1% and 97% during inference on the CompMusic Carnatic dataset and its 10 Raga subset respectively, making it the state-of-the-art for the Raga recognition task. Our approach also enables sequence ranking which aids us in retrieving melodic patterns from a given music database that are closely related to the presented query sequence.
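The abstract's "train on short subsequences, infer on the whole recording" scheme can be illustrated with a minimal sketch. It assumes a recording is already a list of quantized pitch tokens, that a hypothetical `classify` function returns per-raga probabilities for one window, and that the whole-recording label is the argmax of the averaged window probabilities. The non-overlapping split and the averaging rule are assumptions for illustration; this review does not state the paper's exact aggregation.

```python
import random
from typing import Callable, List, Sequence


def sample_subsequences(tokens: Sequence[int], length: int, count: int,
                        rng: random.Random) -> List[List[int]]:
    """Training-time sampling: draw `count` random contiguous windows of `length` tokens."""
    if len(tokens) < length:
        raise ValueError("recording is shorter than the subsequence length")
    starts = [rng.randrange(len(tokens) - length + 1) for _ in range(count)]
    return [list(tokens[s:s + length]) for s in starts]


def split_into_windows(tokens: Sequence[int], length: int) -> List[List[int]]:
    """Inference-time split: non-overlapping windows; a short trailing remainder is dropped."""
    return [list(tokens[i:i + length])
            for i in range(0, len(tokens) - length + 1, length)]


def predict_recording(tokens: Sequence[int], length: int,
                      classify: Callable[[List[int]], List[float]]) -> int:
    """Whole-recording prediction: average per-window probabilities, return the top class."""
    windows = split_into_windows(tokens, length)
    if not windows:
        raise ValueError("recording is shorter than the subsequence length")
    probs = [classify(w) for w in windows]
    n_classes = len(probs[0])
    mean = [sum(p[c] for p in probs) / len(probs) for c in range(n_classes)]
    return max(range(n_classes), key=mean.__getitem__)


def toy_classify(window: List[int]) -> List[float]:
    """Hypothetical stand-in for the trained LSTM: P(raga 1) = share of token 7."""
    share = window.count(7) / len(window)
    return [1.0 - share, share]


if __name__ == "__main__":
    recording = [7, 3, 7, 7, 2, 7, 7, 1, 7, 7, 4, 7]
    print(sample_subsequences(recording, length=4, count=2, rng=random.Random(0)))
    print(predict_recording(recording, length=4, classify=toy_classify))
```

Averaging probabilities lets confident windows outweigh ambiguous ones; a majority vote over per-window argmaxes would be an equally plausible alternative.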

Motivation and Goals

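Addresses automatic raga recognition in Indian Classical Music (ICM) to help organize large music collections and support recommendation.
Reframes raga recognition as a sequence classification problem solved with an LSTM-RNN with attention.
Introduces sequence ranking, which retrieves sequences similar to a query sequence, to enable content-based retrieval.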
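Sequence ranking as described above amounts to ordering database items by how close they are to the query. The sketch below assumes each sequence has already been mapped by the network to a fixed-length embedding vector and uses Euclidean distance as the similarity measure; both the embedding source and the distance choice are assumptions, not details confirmed by this review. The clip names and vectors are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length embedding vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def rank_database(query: Sequence[float],
                  database: Dict[str, Sequence[float]],
                  top_k: int) -> List[Tuple[str, float]]:
    """Return the top_k database entries closest to the query, nearest first."""
    scored = [(name, euclidean(query, emb)) for name, emb in database.items()]
    scored.sort(key=lambda item: item[1])  # smaller distance = more similar
    return scored[:top_k]


if __name__ == "__main__":
    # Hypothetical 2-D embeddings; real ones would come from the trained network.
    db = {"clip_a": [0.0, 1.0], "clip_b": [1.0, 0.0], "clip_c": [0.1, 0.9]}
    for name, dist in rank_database([0.0, 1.0], db, top_k=2):
        print(f"{name}\t{dist:.3f}")
```

A linear scan is enough to show the idea; a large collection would normally use an approximate nearest-neighbour index instead.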

Proposed Method

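Preprocess the audio with vocal source separation followed by pitch tracking.
Normalize tonality: express the pitch track in cents and center it on the tonic.
Build an LSTM-RNN with 768 hidden units and 128-dimensional note embeddings, followed by an attention layer and dense layers.
Train with categorical cross-entropy loss and the Adam optimizer, using distributed asynchronous SGD.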
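The tonic normalization step can be sketched with the standard cents formula, c = 1200 · log2(f / f_tonic), followed by rounding to a pitch bin to obtain discrete note tokens for the embedding layer. This is a minimal sketch, assuming the tonic frequency is already known (tonic estimation is not shown), a bin width of 100 cents (one semitone), and that unvoiced frames (0 Hz from the pitch tracker) are dropped; the bin width and the handling of unvoiced frames are assumptions, not the paper's confirmed settings.

```python
import math
from typing import List, Optional, Sequence


def hz_to_cents(freq_hz: float, tonic_hz: float) -> Optional[float]:
    """Pitch in cents relative to the tonic; None for unvoiced (<= 0 Hz) frames."""
    if tonic_hz <= 0:
        raise ValueError("tonic frequency must be positive")
    if freq_hz <= 0:
        return None
    return 1200.0 * math.log2(freq_hz / tonic_hz)


def quantize_pitch_track(track_hz: Sequence[float], tonic_hz: float,
                         step_cents: float = 100.0) -> List[int]:
    """Tonic-normalize a pitch track and round each voiced frame to a pitch bin."""
    tokens = []
    for f in track_hz:
        cents = hz_to_cents(f, tonic_hz)
        if cents is None:
            continue  # assumption: unvoiced frames are dropped rather than tokenized
        tokens.append(round(cents / step_cents))
    return tokens


if __name__ == "__main__":
    # Hypothetical pitch track (Hz) with a tonic of 220 Hz; 0.0 marks an unvoiced frame.
    print(quantize_pitch_track([220.0, 0.0, 440.0, 246.94, 110.0], tonic_hz=220.0))
```

Centering on the tonic makes the same raga phrase produce the same tokens regardless of the singer's chosen pitch, which is what lets one model serve performers in different keys.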

Figure 1: Preprocessing steps and model architecture for SRGM1 (see Section 3)
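The attention layer between the LSTM and the dense layers collapses the per-timestep hidden states into a single vector. The sketch below uses simple dot-product attention with one context vector `w` (hypothetical; it would be learned during training): scores e_t = w · h_t, weights α = softmax(e), output Σ_t α_t h_t. This is an assumed formulation for illustration only; the review does not specify SRGM1's exact attention mechanism.

```python
import math
from typing import List, Sequence, Tuple


def softmax(xs: Sequence[float]) -> List[float]:
    """Numerically stable softmax (subtract the max before exponentiating)."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(hidden: Sequence[Sequence[float]],
                   w: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Weight each LSTM state h_t by softmax(w . h_t) and sum them into one vector."""
    scores = [sum(wi * hi for wi, hi in zip(w, h)) for h in hidden]
    alpha = softmax(scores)
    dim = len(hidden[0])
    pooled = [sum(a * h[d] for a, h in zip(alpha, hidden)) for d in range(dim)]
    return pooled, alpha


if __name__ == "__main__":
    # Hypothetical 2-D hidden states for a 3-step sequence and a toy context vector.
    states = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
    pooled, alpha = attention_pool(states, w=[2.0, 0.0])
    print([round(a, 3) for a in alpha], [round(p, 3) for p in pooled])
```

The pooled vector would then feed the dense layers and a softmax over ragas; the weights α also indicate which time steps the model relied on.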

Experimental Results

Research Questions

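RQ1: Can raga recognition be modeled effectively as sequence classification on pitch-quantized sequences using an LSTM with attention?
RQ2: Does a model fine-tuned with triplet loss enable reliable sequence ranking for raga-based retrieval?
RQ3: How do subsequence length and sampling affect recognition and ranking performance on CMD?
RQ4: How do SRGM1 and its ensemble compare with the state of the art on CMD-10 and CMD-40?
RQ5: Does the model generalize to content-based retrieval within a large ICM dataset?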

Key Results

Method	CMD-10 accuracy (10 ragas)	CMD-40 accuracy (40 ragas)
SRGM1	95.6%	84.6%
SRGM1 Ensemble	97.1%	88.1%
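The "SRGM1 Ensemble" row combines several SRGM1 models, but this review does not say how their outputs are merged. One common option is soft voting: average each model's class-probability vector and take the argmax. The sketch below shows soft voting next to plain majority voting; both the rule and the probability values are assumptions for illustration.

```python
from collections import Counter
from typing import Sequence


def ensemble_predict(model_probs: Sequence[Sequence[float]]) -> int:
    """Soft voting: average each model's class probabilities and take the argmax."""
    if not model_probs:
        raise ValueError("need at least one model's output")
    n_classes = len(model_probs[0])
    mean = [sum(p[c] for p in model_probs) / len(model_probs) for c in range(n_classes)]
    return max(range(n_classes), key=mean.__getitem__)


def majority_vote(model_probs: Sequence[Sequence[float]]) -> int:
    """Hard voting: each model votes for its own argmax; the most common vote wins."""
    votes = [max(range(len(p)), key=p.__getitem__) for p in model_probs]
    return Counter(votes).most_common(1)[0][0]


if __name__ == "__main__":
    # Hypothetical outputs of three models over three ragas.
    outputs = [[0.5, 0.4, 0.1], [0.2, 0.7, 0.1], [0.5, 0.4, 0.1]]
    print(ensemble_predict(outputs), majority_vote(outputs))
```

With these outputs soft voting picks raga 1 (mean probability 0.5 versus 0.4) while majority voting picks raga 0 (two of three votes), so the choice of combination rule can change the ensemble's answer.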

SRGM1 achieves 95.6% accuracy on CMD-10 and 84.6% on CMD-40.
The SRGM1 ensemble raises this to 97.1% on CMD-10 and 88.1% on CMD-40.
SRGM2 (sequence ranking) reaches a top-30 precision of 81.83% and a top-10 precision of 81.68%.
The model outperforms earlier TDMS-, VSM-, and PCD-based methods on CMD and its CMD-10 subset.
Longer subsequences (e.g., 6000 steps) converge faster and give more stable results.
Training an attention-based LSTM on randomly sampled subsequences improves raga recognition performance.
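The top-k precision figures reported for SRGM2 can be read as: of the k sequences retrieved for a query, what fraction belong to the query's raga. This sketch assumes that relevance means "same raga label" and that scores are averaged over queries; the paper's exact evaluation protocol is not given in this review, and the raga lists below are hypothetical.

```python
from typing import Sequence


def precision_at_k(query_raga: str, ranked_ragas: Sequence[str], k: int) -> float:
    """Fraction of the top-k retrieved items whose raga matches the query's raga."""
    if k <= 0:
        raise ValueError("k must be positive")
    top = ranked_ragas[:k]
    return sum(1 for r in top if r == query_raga) / k


def mean_precision_at_k(queries: Sequence[str],
                        rankings: Sequence[Sequence[str]], k: int) -> float:
    """Average precision@k over several queries (one ranked list per query)."""
    if len(queries) != len(rankings) or not queries:
        raise ValueError("need one non-empty ranking per query")
    scores = [precision_at_k(q, r, k) for q, r in zip(queries, rankings)]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Hypothetical retrieval results: raga labels of the ranked items for two queries.
    print(precision_at_k("Todi", ["Todi", "Kalyani", "Todi", "Todi"], k=4))
    print(mean_precision_at_k(["Todi", "Kalyani"],
                              [["Todi", "Todi"], ["Todi", "Kalyani"]], k=2))
```

Dividing by k (rather than by the number of items actually returned) penalizes short result lists, which is the usual convention for this metric.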

Figure 2: Schematic diagram for the sequence ranking algorithm. P, Q and R are the copies of the same model and hence have the same architecture.
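Three weight-sharing copies of one model, as in Figure 2, are the usual setup for triplet-loss training, which the research questions mention for the ranking model. Below is the standard triplet margin loss, max(0, d(a, p) − d(a, n) + margin), in a minimal sketch. Mapping P, Q and R to anchor, positive and negative, the Euclidean distance, and the margin of 1.0 are all assumptions; the review does not give these details.

```python
import math
from typing import Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length embedding vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(anchor: Sequence[float], positive: Sequence[float],
                 negative: Sequence[float], margin: float = 1.0) -> float:
    """Standard triplet margin loss: max(0, d(a, p) - d(a, n) + margin).

    The three embeddings are assumed to come from weight-sharing copies of one
    network, so pulling same-raga pairs together and pushing other ragas apart
    shapes a single embedding space that can later be used for ranking.
    """
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


if __name__ == "__main__":
    # Hypothetical 2-D embeddings: a well-separated triplet gives zero loss.
    print(triplet_loss([0.0, 0.0], [0.0, 1.0], [3.0, 4.0]))
    # Swapping positive and negative produces a large loss.
    print(triplet_loss([0.0, 0.0], [3.0, 4.0], [0.0, 1.0]))
```

Once the loss is zero for a triplet, that triplet contributes no gradient, which is why triplet training usually mines "hard" negatives that still violate the margin.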

This review was generated by AI and reviewed by a human editor.