QUICK REVIEW

[Paper Digest] Where Are We At with Automatic Speech Recognition for the Bambara Language?

Seydou Diallo, Yacouba Diarra|arXiv (Cornell University)|Feb 10, 2026

Speech Recognition and Synthesis | Cited by 0

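One-Sentence Summary

This paper presents the first standardized benchmark for Bambara ASR and evaluates 37 models under laboratory conditions. Even the top systems fall short of production standards, with a WER of about 47% and a CER of about 13%, which highlights the gaps in data and architecture for low-resource languages.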

ABSTRACT

This paper introduces the first standardized benchmark for evaluating Automatic Speech Recognition (ASR) in the Bambara language, utilizing one hour of professionally recorded Malian constitutional text. Designed as a controlled reference set under near-optimal acoustic and linguistic conditions, the benchmark was used to evaluate 37 models, ranging from Bambara-trained systems to large-scale commercial models. Our findings reveal that current ASR performance remains significantly below deployment standards in a narrow formal domain; the top-performing system in terms of Word Error Rate (WER) achieved 46.76% and the best Character Error Rate (CER) of 13.00% was set by another model, while several prominent multilingual models exceeded 100% WER. These results suggest that multilingual pre-training and model scaling alone are insufficient for underrepresented languages. Furthermore, because this dataset represents a best-case scenario of the most simplified and formal form of spoken Bambara, these figures are yet to be tested against practical, real-world settings. We provide the benchmark and an accompanying public leaderboard to facilitate transparent evaluation and future research in Bambara speech technology.

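Motivation and Objectives

Provide a standardized benchmark and leaderboard for Bambara ASR to enable transparent evaluation.
Quantify how current Bambara ASR performs across a range of models under controlled acoustic conditions.
Analyze the factors that affect performance and identify directions for improving ASR in low-resource languages.
Highlight the implications for data collection, model architecture, and Bambara evaluation practice.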

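Proposed Method

Assemble a one-hour corpus of Bambara legal text, professionally recorded by a single male speaker under near-optimal acoustic conditions.
Manually segment the audio and align it with the transcripts, yielding 492 speech segments in the benchmark after QA.
Evaluate 37 publicly available ASR models on the benchmark (monolingual models, multilingual models that support Bambara, and large commercial models).
Compute WER and CER, and derive a Combined score as 0.5*WER + 0.5*CER; provide a publicly accessible leaderboard with adjustable weights.
Provide a qualitative error analysis and a sensitivity check on the metric weights.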
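
The Combined score and the weight sensitivity check described above can be sketched as follows. This is a minimal illustration, not the leaderboard's actual code: the model names and the WER/CER values are hypothetical toy numbers, and the `w_wer` parameter is an assumed way of exposing the adjustable weight.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Result:
    model: str  # hypothetical model name, not a system from the paper
    wer: float  # word error rate, in percent
    cer: float  # character error rate, in percent


def combined(wer: float, cer: float, w_wer: float = 0.5) -> float:
    """Weighted mean of WER and CER; w_wer=0.5 gives 0.5*WER + 0.5*CER."""
    return w_wer * wer + (1.0 - w_wer) * cer


def rank(results: list[Result], w_wer: float = 0.5) -> list[Result]:
    """Sort results by Combined score, lowest (best) first."""
    return sorted(results, key=lambda r: combined(r.wer, r.cer, w_wer))


# Toy numbers chosen only to show that the ranking can depend on the weight.
results = [
    Result("model_a", wer=50.0, cer=12.0),
    Result("model_b", wer=46.0, cer=18.0),
    Result("model_c", wer=120.0, cer=60.0),
]

# Sensitivity check: re-rank under several weightings.
for w in (0.25, 0.5, 0.75):
    print(w, [r.model for r in rank(results, w)])
```

With these toy values, `model_a` ranks first at equal weights, while `model_b` overtakes it once WER is weighted at 0.75, which is the kind of ordering change a sensitivity check is meant to surface.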

Figure 1: Models combined performance on Bambara Benchmark. Lower is better.

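Experimental Results

Research Questions

RQ1: How do the various Bambara ASR models currently perform on a formal, controlled Bambara benchmark?
RQ2: Do large multilingual models transfer well to Bambara, or do language-specific models perform better?
RQ3: Under near-ideal conditions, how close are Bambara ASR systems to production readiness?
RQ4: What are the main error patterns and morphological challenges that affect Bambara ASR?
RQ5: What are the implications for data collection and model design in underrepresented African languages?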

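Key Findings

The best model reaches a WER of 47.50%, a CER of 13.56%, and a Combined score of 29.73%; the abstract reports the best per-metric figures as 46.76% WER and 13.00% CER, achieved by different models.
Most multilingual or off-the-shelf models perform poorly (for example, OpenAI Whisper variants exceed 100% WER).
Bambara-focused monolingual models (such as the Djelia and RobotsMali variants) substantially outperform their base versions and many multilingual models.
CER is generally lower than WER across models, suggesting that capturing the phonology is easier than segmenting words accurately given Bambara morphology.
The benchmark covers near-optimal acoustic conditions and a formal domain (the Malian constitution), so real-world performance is expected to be worse because of noise, dialects, and code-switching.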
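
Two of the findings above, WER exceeding 100% and CER sitting well below WER, follow from how the metrics are defined: both divide an edit distance by the length of the reference, so inserted words can push WER past 100%, and a single wrong word boundary costs whole words in WER but only one character in CER. The sketch below illustrates this with placeholder strings rather than real Bambara text. The normalization choices (whitespace tokenization, spaces counted as characters) are assumptions, since the digest does not state the ones used in the paper.

```python
def edit_distance(ref, hyp) -> int:
    """Levenshtein distance between two sequences (substitute, insert, delete)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def wer(ref: str, hyp: str) -> float:
    """Word error rate in percent, using whitespace tokenization (an assumption)."""
    ref_words = ref.split()
    return 100.0 * edit_distance(ref_words, hyp.split()) / len(ref_words)


def cer(ref: str, hyp: str) -> float:
    """Character error rate in percent, counting spaces as characters (an assumption)."""
    return 100.0 * edit_distance(ref, hyp) / len(ref)


reference = "abc def ghi"        # placeholder tokens, not Bambara
merged_words = "abcdef ghi"      # one missing word boundary
hallucinated = "q r s t u"       # longer output with no correct words

print(round(wer(reference, merged_words), 2), round(cer(reference, merged_words), 2))
print(round(wer(reference, hallucinated), 2))
```

In this toy case the merged-word hypothesis differs from the reference by a single missing space, yet it counts as two word errors out of three, while the five-word hypothesis against a three-word reference yields a WER above 100%.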

This digest was generated by AI and reviewed by human editors.