QUICK REVIEW

[Paper Review] Where Are We At with Automatic Speech Recognition for the Bambara Language?

Seydou Diallo, Yacouba Diarra | arXiv (Cornell University) | Feb 10, 2026
Speech Recognition and Synthesis · Cited by 0
One-Sentence Summary

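This paper presents the first standardized ASR benchmark for Bambara and evaluates 37 models under controlled, laboratory-like conditions. Even the top systems fall short of production standards, with a WER of about 47% and a CER of about 13%, which underscores the data and architecture gaps facing low-resource languages.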

ABSTRACT

This paper introduces the first standardized benchmark for evaluating Automatic Speech Recognition (ASR) in the Bambara language, utilizing one hour of professionally recorded Malian constitutional text. Designed as a controlled reference set under near-optimal acoustic and linguistic conditions, the benchmark was used to evaluate 37 models, ranging from Bambara-trained systems to large-scale commercial models. Our findings reveal that current ASR performance remains significantly below deployment standards in a narrow formal domain; the top-performing system in terms of Word Error Rate (WER) achieved 46.76% and the best Character Error Rate (CER) of 13.00% was set by another model, while several prominent multilingual models exceeded 100% WER. These results suggest that multilingual pre-training and model scaling alone are insufficient for underrepresented languages. Furthermore, because this dataset represents a best-case scenario of the most simplified and formal form of spoken Bambara, these figures are yet to be tested against practical, real-world settings. We provide the benchmark and an accompanying public leaderboard to facilitate transparent evaluation and future research in Bambara speech technology.

Motivation and Objectives

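  • Provide a standardized benchmark and leaderboard for Bambara ASR to enable transparent evaluation.
  • Quantify how current ASR models perform on Bambara under controlled acoustic conditions.
  • Analyze the factors that drive performance and identify directions for improving ASR in low-resource languages.
  • Highlight the implications for data collection, model architecture, and evaluation practice in Bambara.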

Proposed Method

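  • Assemble a 1-hour corpus of Bambara legal text (the Malian constitution), professionally recorded by a single male speaker under near-optimal acoustic conditions.
  • Manually segment the audio and align it with the transcriptions, yielding 492 speech segments in the benchmark after quality assurance (QA).
  • Evaluate 37 publicly available ASR models on the benchmark: Bambara-specific models, multilingual models that support Bambara, and large commercial models.
  • Compute WER and CER, and derive a Combined score = 0.5 × WER + 0.5 × CER; publish a publicly accessible leaderboard on which the weights can be adjusted.
  • Provide a qualitative error analysis and a sensitivity check on the metric weights.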
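The scoring step above can be sketched in Python. This is a minimal sketch under stated assumptions, not the benchmark's own scoring code. It pools edits over all segments instead of averaging per segment. It also tokenizes words on whitespace, counts spaces as characters for CER, and applies no text normalization. The function names `edit_distance`, `corpus_error_rate`, and `combined_score` are hypothetical.

```python
from typing import Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance: fewest substitutions, deletions, and insertions."""
    prev = list(range(len(hyp) + 1))  # cost of building hyp[:j] from an empty ref
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)  # deleting the first i ref units
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion: ref unit missing in hyp
                curr[j - 1] + 1,         # insertion: extra unit in hyp
                prev[j - 1] + (r != h),  # substitution (cost 0 on a match)
            )
        prev = curr
    return prev[-1]


def corpus_error_rate(refs: Sequence[str], hyps: Sequence[str], unit: str) -> float:
    """Total edits / total reference units, pooled over all segments.

    Assumption: unit="word" gives WER on whitespace tokens; unit="char"
    gives CER over every character, including spaces.
    """
    if unit not in ("word", "char"):
        raise ValueError("unit must be 'word' or 'char'")
    if len(refs) != len(hyps):
        raise ValueError("refs and hyps must have the same number of segments")
    errors = total = 0
    for ref, hyp in zip(refs, hyps):
        if unit == "word":
            r, h = ref.split(), hyp.split()
        else:
            r, h = list(ref), list(hyp)
        errors += edit_distance(r, h)
        total += len(r)
    if total == 0:
        raise ValueError("reference transcripts are empty")
    return errors / total


def combined_score(wer: float, cer: float, w_wer: float = 0.5) -> float:
    """w_wer * WER + (1 - w_wer) * CER; the default is 0.5*WER + 0.5*CER."""
    return w_wer * wer + (1.0 - w_wer) * cer


refs = ["a b c d"]  # placeholder tokens, not benchmark data
hyps = ["a x c d"]
wer = corpus_error_rate(refs, hyps, "word")  # 1 substitution / 4 words
cer = corpus_error_rate(refs, hyps, "char")  # 1 substitution / 7 characters
print(f"WER={wer:.2%} CER={cer:.2%} Combined={combined_score(wer, cer):.2%}")
```

Pooling edits across the 492 segments gives long segments more weight than averaging per-segment rates would. This summary does not say which convention the leaderboard uses.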
Figure 1: Combined performance of models on the Bambara benchmark (lower is better).
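Figure 1 ranks models by the Combined score, and the method includes a sensitivity check on the metric weights. The sketch below shows one way such a check could work: re-rank the models while sweeping the WER weight and look for rank flips. The three models and their (WER, CER) pairs are hypothetical numbers made up for illustration. They are not leaderboard results. The helper names `rank_by_combined` and `sensitivity` are also hypothetical.

```python
# Hypothetical (WER, CER) pairs for illustration only; these are NOT
# figures from the benchmark or its leaderboard.
SCORES = {
    "model_a": (0.48, 0.14),
    "model_b": (0.52, 0.11),
    "model_c": (0.60, 0.20),
}


def rank_by_combined(scores, w_wer=0.5):
    """Model names sorted by w_wer * WER + (1 - w_wer) * CER, best first."""
    def combined(name):
        wer, cer = scores[name]
        return w_wer * wer + (1.0 - w_wer) * cer

    return sorted(scores, key=combined)


def sensitivity(scores, weights):
    """Map each WER weight to the resulting ranking so flips are easy to spot."""
    return {w: rank_by_combined(scores, w) for w in weights}


for w, order in sensitivity(SCORES, [0.0, 0.25, 0.5, 0.75, 1.0]).items():
    print(f"w_wer={w:.2f}: {order}")
```

With these made-up numbers, model_b leads when CER dominates (w_wer of 0.25 or less). model_a leads from w_wer = 0.5 upward, and the crossover is at w_wer = 3/7. A real check would use the leaderboard's actual scores.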

Experimental Results

Research Questions

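  • RQ1: How do current ASR models perform on a formal, controlled Bambara benchmark?
  • RQ2: Do large multilingual models transfer well to Bambara, or do language-specific models perform better?
  • RQ3: Under near-ideal conditions, how close are Bambara ASR systems to production-ready quality?
  • RQ4: What are the main error patterns and morphological challenges in Bambara ASR?
  • RQ5: What are the implications for data collection and model design in underrepresented African languages?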

Key Findings

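  • The best model reached a WER of 47.50%, a CER of 13.56%, and a Combined score of 29.73%. According to the abstract, the single lowest WER (46.76%) and the single lowest CER (13.00%) came from two different models.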
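  • Most multilingual and off-the-shelf models perform poorly. For example, OpenAI Whisper variants exceed 100% WER, which can happen because inserted words count as errors while the denominator is only the number of reference words.
  • Bambara-focused models (such as the Djelia and RobotsMali variants) clearly outperform both their base versions and many multilingual models.
  • Across models, CER is generally lower than WER. This suggests that capturing the sounds of Bambara is easier than getting word segmentation right, given its morphology.
  • The benchmark uses near-optimal acoustic conditions and a formal domain (the Malian constitution). Real-world performance is therefore expected to be worse because of noise, dialectal variation, and code-switching.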
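Two of the findings above follow directly from how the metrics are defined. The sketch below uses made-up placeholder tokens, not real Bambara text. It shows that WER can exceed 100% when a model inserts extra words. It also shows that a single missing word boundary costs far more under WER than under CER. Whitespace tokenization and the helper names `levenshtein` and `error_rate` are assumptions for illustration.

```python
def levenshtein(ref, hyp):
    """Minimum number of substitutions, deletions, and insertions."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (r != h)))
        prev = curr
    return prev[-1]


def error_rate(ref_units, hyp_units):
    """Edits divided by the number of REFERENCE units (words or characters)."""
    return levenshtein(ref_units, hyp_units) / len(ref_units)


# 1) WER can exceed 100%: insertions are counted, but the denominator is the
#    reference length only.
ref, hyp = "a b", "x y z w"  # hypothetical tokens
wer_1 = error_rate(ref.split(), hyp.split())  # 2 subs + 2 ins over 2 words

# 2) One missing word boundary: two word errors (1 sub + 1 del) but only
#    one character error (the deleted space).
ref, hyp = "abc def ghi", "abcdef ghi"  # hypothetical tokens
wer_2 = error_rate(ref.split(), hyp.split())
cer_2 = error_rate(list(ref), list(hyp))

print(f"WER example 1: {wer_1:.0%}")  # WER example 1: 200%
print(f"WER example 2: {wer_2:.1%}, CER example 2: {cer_2:.1%}")  # 66.7%, 9.1%
```

In an agglutinative-looking orthography, such boundary mismatches alone can keep WER high even when most characters are right. This is consistent with the CER-below-WER pattern reported above.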


This summary was generated by AI and reviewed by human editors.