QUICK REVIEW

[Paper Review] Quran-MD: A Fine-Grained Multilingual Multimodal Dataset of the Quran

Muhammad Umar Salman, Mohammad Areeb Qazi|arXiv (Cornell University)|2026. 01. 25.

Speech Recognition and Synthesis | Citations: 0

One-Line Summary

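Quran-MD is a unified multimodal Quran dataset that links verse- and word-level Arabic text, English translation, and transliteration with aligned audio from 30 reciters, enabling NLP, ASR, and Tajweed research as well as cross-modal analysis.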

ABSTRACT

We present Quran MD, a comprehensive multimodal dataset of the Quran that integrates textual, linguistic, and audio dimensions at the verse and word levels. For each verse (ayah), the dataset provides its original Arabic text, English translation, and phonetic transliteration. To capture the rich oral tradition of Quranic recitation, we include verse-level audio from 32 distinct reciters, reflecting diverse recitation styles and dialectical nuances. At the word level, each token is paired with its corresponding Arabic script, English translation, transliteration, and an aligned audio recording, allowing fine-grained analysis of pronunciation, phonology, and semantic context. This dataset supports various applications, including natural language processing, speech recognition, text-to-speech synthesis, linguistic analysis, and digital Islamic studies. Bridging text and audio modalities across multiple reciters, this dataset provides a unique resource to advance computational approaches to Quranic recitation and study. Beyond enabling tasks such as ASR, tajweed detection, and Quranic TTS, it lays the foundation for multimodal embeddings, semantic retrieval, style transfer, and personalized tutoring systems that can support both research and community applications. The dataset is available at https://huggingface.co/datasets/Buraaq/quran-audio-text-dataset

Research Motivation and Objectives

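Link the Quran's text, transliteration, and audio modalities across multiple reciters.
Provide verse- and word-level alignment of Arabic text, translation, transliteration, and audio.
Enable research in NLP, ASR, TTS, and digital Islamic studies through a standardized multimodal resource.
Support Tajweed study and the linguistic and phonetic analysis of Quranic recitation.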

Proposed Method

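Harmonize data from three public sources into a hierarchical JSON template.
Link verse-level audio from 30 reciters to the corresponding verse text and translation.
Attach word-level text, translation, transliteration, and aligned audio to each token.
Verify consistency so that every word and verse has corresponding audio.
Organize the data so that verse- and word-level information can be accessed seamlessly.
Release the dataset on Hugging Face for standardized use.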
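
The harmonization and consistency-check steps above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the field names (`arabic`, `translation`, `transliteration`, `audio`, `words`), the reciter IDs, the file paths, and the helper functions `build_verse` and `find_missing_audio` are all hypothetical, and the actual schema published on Hugging Face may differ. One word is deliberately left without audio so the check has something to report.

```python
import json

# Hypothetical reciter IDs; the real dataset uses its own naming.
RECITERS = ["reciter_a", "reciter_b"]

# Three toy "sources", each keyed by (surah, ayah). Layout is an assumption.
text_source = {
    (112, 1): {
        "arabic": "قل هو الله أحد",
        "translation": "Say, He is Allah, the One",
        "transliteration": "Qul huwa Allahu ahad",
    }
}
verse_audio_source = {
    (112, 1): {
        "reciter_a": "audio/reciter_a/112_001.mp3",
        "reciter_b": "audio/reciter_b/112_001.mp3",
    }
}
word_source = {
    (112, 1): [
        {"arabic": "قل", "translation": "Say", "transliteration": "Qul",
         "audio": "audio/words/112_001_001.mp3"},
        {"arabic": "هو", "translation": "He", "transliteration": "huwa",
         "audio": "audio/words/112_001_002.mp3"},
        {"arabic": "الله", "translation": "Allah", "transliteration": "Allahu",
         "audio": "audio/words/112_001_003.mp3"},
        # Deliberately missing audio, to exercise the consistency check.
        {"arabic": "أحد", "translation": "the One", "transliteration": "ahad",
         "audio": None},
    ]
}


def build_verse(surah, ayah, text_src, audio_src, word_src):
    """Merge the three per-source entries into one hierarchical verse record."""
    key = (surah, ayah)
    return {
        "surah": surah,
        "ayah": ayah,
        **text_src[key],                         # verse-level text fields
        "audio": dict(audio_src.get(key, {})),   # reciter ID -> audio path
        "words": [dict(w) for w in word_src.get(key, [])],
    }


def find_missing_audio(verse, reciters):
    """Describe every verse- or word-level entry that lacks an audio reference."""
    ref = f"{verse['surah']}:{verse['ayah']}"
    problems = []
    for reciter in reciters:
        if not verse["audio"].get(reciter):
            problems.append(f"{ref} has no verse audio for {reciter}")
    for position, word in enumerate(verse["words"], start=1):
        if not word.get("audio"):
            problems.append(f"{ref} word {position} has no audio")
    return problems


verse = build_verse(112, 1, text_source, verse_audio_source, word_source)
problems = find_missing_audio(verse, RECITERS)

print(json.dumps(verse, ensure_ascii=False, indent=2))
print(problems)
```

Nesting the word entries inside the verse record keeps both granularities reachable from a single lookup, which is the kind of access the method description aims for.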


Experimental Results

Research Questions

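RQ1: How can verse- and word-level Quran data be aligned across multiple modalities and multiple reciters?
RQ2: What potential benefits does multi-reciter, multimodal Quran data offer for NLP, ASR, and Tajweed tasks?
RQ3: How can this dataset support the development of multimodal embeddings, retrieval, and tutoring tools for Quranic study?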

Key Results

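The dataset contains 114 surahs, 6,236 ayahs, and approximately 77.8k words.
Verse-level audio covers 30 reciters and totals approximately 665 hours; word-level audio totals approximately 22 hours.
Modalities comprise Arabic, English, and transliterated text, with verse- and word-level audio aligned to the tokens.
The data structure supports verse- and word-level analysis, with cross-modal alignment for downstream tasks.
The resource enables ASR, Tajweed detection, Quranic TTS, style transfer, and multimodal semantic retrieval.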
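
As a rough sanity check on the figures above, the reported totals can be turned into average clip lengths. The sketch below uses only the approximate numbers stated in this review and rests on two assumptions that the review does not confirm: that every one of the 30 reciters covers all 6,236 ayahs, and that there is one recording per word. The results are back-of-envelope estimates, not statistics reported by the authors.

```python
# Approximate figures as stated in this review.
NUM_AYAHS = 6_236
NUM_WORDS = 77_800          # "~77.8k words"
NUM_RECITERS = 30
VERSE_AUDIO_HOURS = 665     # approximate total verse-level audio
WORD_AUDIO_HOURS = 22       # approximate total word-level audio


def mean_clip_seconds(total_hours, clip_count):
    """Average clip length in seconds, given total hours and number of clips."""
    return total_hours * 3600 / clip_count


# Assumption: every reciter has a recording of every ayah.
verse_clips = NUM_AYAHS * NUM_RECITERS
avg_verse_clip_s = mean_clip_seconds(VERSE_AUDIO_HOURS, verse_clips)

# Assumption: exactly one recording per word.
avg_word_clip_s = mean_clip_seconds(WORD_AUDIO_HOURS, NUM_WORDS)

words_per_ayah = NUM_WORDS / NUM_AYAHS

print(f"verse-level clips:    {verse_clips}")
print(f"avg verse clip (s):   {avg_verse_clip_s:.1f}")
print(f"avg word clip (s):    {avg_word_clip_s:.2f}")
print(f"avg words per ayah:   {words_per_ayah:.1f}")
```

Under these assumptions a verse clip averages on the order of thirteen seconds and a word clip about one second, which is consistent with isolated-word recordings being much shorter than recited verses.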

Figure 1: Example of format of Surah 112 (Al-Ikhlas) in the Dataset.

This review was generated by AI and reviewed by a human editor.