QUICK REVIEW

[Paper Review] Tuberculosis Screening from Cough Audio: Baseline Models, Clinical Variables, and Uncertainty Quantification

George P. Kafentzis, Efstratios Selisios|arXiv (Cornell University)|Jan 12, 2026
Respiratory and Cough-Related Research|Cited by 0
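One-Sentence Summary

This paper establishes a standardized, reproducible baseline for tuberculosis detection from cough audio (optionally combined with clinical data), built on a cougher-independent nested cross-validation framework, with conformal prediction used to quantify uncertainty.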

ABSTRACT

In this paper, we propose a standardized framework for automatic tuberculosis (TB) detection from cough audio and routinely collected clinical data using machine learning. While TB screening from audio has attracted growing interest, progress is difficult to measure because existing studies vary substantially in datasets, cohort definitions, feature representations, model families, validation protocols, and reported metrics. Consequently, reported gains are often not directly comparable, and it remains unclear whether improvements stem from modeling advances or from differences in data and evaluation. We address this gap by establishing a strong, well-documented baseline for TB prediction using cough recordings and accompanying clinical metadata from a recently compiled dataset from several countries. Our pipeline is reproducible end-to-end, covering feature extraction, multimodal fusion, cougher-independent evaluation, and uncertainty quantification, and it reports a consistent suite of clinically relevant metrics to enable fair comparison. We further quantify performance for cough audio-only and fused (audio + clinical metadata) models, and release the full experimental protocol to facilitate benchmarking. This baseline is intended to serve as a common reference point and to reduce methodological variance that currently holds back progress in the field.

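Motivation and Objectives

  • Establish a standardized, reproducible pipeline for tuberculosis prediction from cough audio and clinical metadata.
  • Enforce cougher-independent evaluation so that benchmarking is fair and generalization is assessed properly.
  • Quantify predictive uncertainty with conformal prediction, alongside standard performance metrics.
  • Provide a baseline framework and experimental protocol that reduce methodological variance in research on audio-based TB screening.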

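Proposed Method

  • Extract a set of handcrafted features (MFCCs, Chroma, and simple spectral features) and summarize them over time with statistics.
  • Fuse audio features with clinical metadata when it is available, and evaluate Logistic Regression and CatBoost models.
  • Use a cougher-independent nested cross-validation strategy (10 outer folds, 5 inner folds) to prevent data leakage across coughers.
  • Apply conformal prediction with a calibration set to produce uncertainty-aware predictions and prediction sets.
  • Calibrate scores with isotonic regression and determine the operating threshold (e.g., Youden) on a held-out calibration subset.
  • Report ROC-AUC, PR-AUC, Sensitivity, Specificity, UAR, PPV, and NPV for both the audio-only and the fused (audio + clinical metadata) models.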
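The cougher-independent nested cross-validation step can be illustrated with a minimal, standard-library sketch. It is not the authors' implementation: the function names `group_folds` and `nested_group_cv`, the toy cougher IDs, and the simple round-robin fold assignment (with no label stratification) are all assumptions made for illustration. The only property it is meant to show is that every cough from a given person lands on one side of each split.

```python
import random


def group_folds(groups, n_folds, seed=0):
    """Split sample indices so that each group (cougher ID) is in exactly one test fold.

    Hypothetical helper: groups are shuffled and assigned round-robin,
    without stratifying by TB label.
    """
    unique = sorted(set(groups))
    random.Random(seed).shuffle(unique)
    fold_of = {g: i % n_folds for i, g in enumerate(unique)}
    splits = []
    for k in range(n_folds):
        test = [i for i, g in enumerate(groups) if fold_of[g] == k]
        train = [i for i, g in enumerate(groups) if fold_of[g] != k]
        splits.append((train, test))
    return splits


def nested_group_cv(groups, n_outer=10, n_inner=5, seed=0):
    """Yield (outer_train, outer_test, inner_splits) with dataset-level indices.

    Inner splits are built only from coughers in the outer training set,
    so the outer test coughers never influence model selection.
    """
    for k, (outer_train, outer_test) in enumerate(group_folds(groups, n_outer, seed)):
        train_groups = [groups[i] for i in outer_train]
        inner = [
            ([outer_train[i] for i in tr], [outer_train[i] for i in va])
            for tr, va in group_folds(train_groups, n_inner, seed + k + 1)
        ]
        yield outer_train, outer_test, inner


# Toy data (assumed): 30 coughers with 3 coughs each.
groups = [f"cougher_{i // 3}" for i in range(90)]
for outer_train, outer_test, inner in nested_group_cv(groups):
    train_ids = {groups[i] for i in outer_train}
    test_ids = {groups[i] for i in outer_test}
    assert not train_ids & test_ids  # no cougher appears on both sides
```

In practice a library splitter that also balances TB labels across folds would likely be preferable; the sketch leaves that out to keep the grouping logic visible.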
Figure 1: Cougher-disjoint nested CV pipeline for model selection, calibration, and conformal prediction based uncertainty quantification.

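Experimental Results

Research Questions

  • RQ1: Can a standardized cough-audio feature pipeline (with or without clinical data) accurately predict TB status on a large, multi-country cough dataset?
  • RQ2: Does enforcing cougher-independent evaluation yield more reliable estimates of generalization than standard splits in TB cough screening?
  • RQ3: How does adding clinical metadata to cough audio affect TB prediction performance?
  • RQ4: Can conformal prediction provide meaningful uncertainty quantification and abstention signals for TB screening decisions?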
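To make the abstention idea in RQ4 concrete, the following is a minimal sketch of split conformal prediction for a binary TB classifier. It is an illustration under stated assumptions, not the paper's procedure: the nonconformity score (one minus the probability assigned to the true class), the marginal (not class-conditional) quantile, the miscoverage level `alpha`, and the names `conformal_threshold`, `prediction_set`, and `decide` are all hypothetical choices. The toy calibration values are invented.

```python
import math


def conformal_threshold(cal_probs, cal_labels, alpha=0.1):
    """Compute the split-conformal score threshold from a calibration set.

    cal_probs: predicted P(TB+) per calibration cough; cal_labels: 0 or 1.
    Assumed nonconformity score: 1 - probability of the true class.
    """
    scores = sorted((1.0 - p) if y == 1 else p for p, y in zip(cal_probs, cal_labels))
    n = len(scores)
    rank = math.ceil((n + 1) * (1 - alpha))  # finite-sample corrected rank
    if rank > n:
        return 1.0  # too few calibration points: every label is kept
    return scores[rank - 1]


def prediction_set(p_pos, qhat):
    """Return every label whose nonconformity score is within the threshold."""
    labels = set()
    if 1.0 - p_pos <= qhat:
        labels.add(1)
    if p_pos <= qhat:
        labels.add(0)
    return labels


def decide(p_pos, qhat):
    """Map a prediction set to a screening decision; ambiguous or empty sets abstain."""
    labels = prediction_set(p_pos, qhat)
    if labels == {1}:
        return "TB+"
    if labels == {0}:
        return "TB-"
    return "abstain"


# Toy calibration data (assumed values).
cal_probs = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2, 0.1, 0.6, 0.35]
cal_labels = [1, 1, 1, 1, 0, 0, 0, 0, 0]
qhat = conformal_threshold(cal_probs, cal_labels, alpha=0.2)
for p in (0.95, 0.5, 0.05):
    print(p, decide(p, qhat))
```

A prediction set containing both labels signals that the model cannot separate the classes at the requested coverage level, which is one way a borderline case could be flagged for follow-up testing.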

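Key Findings

  • A standardized pipeline for training two common models was established on the CODA TB subset (9,772 cough samples from 1,105 individuals).
  • A cougher-independent nested cross-validation strategy was implemented to prevent information leakage and ensure fair evaluation.
  • TB prediction performance was evaluated for audio-only features and for audio + clinical features.
  • Conformal prediction was used to quantify predictive uncertainty, enabling confidence-based decision outputs and potential abstention on borderline cases.
  • The method includes a calibration step and a threshold selection procedure to support clinically meaningful operating points.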
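The threshold selection and metric reporting can be sketched as follows. This is a simplified illustration, not the authors' code: it assumes the scores have already been calibrated (the isotonic regression step is omitted), that both classes are present, and that a score at or above the threshold counts as TB+. The function names and the toy scores are hypothetical. Youden's J is sensitivity + specificity - 1, and UAR is the mean of sensitivity and specificity.

```python
def confusion(y_true, scores, thr):
    """Count TP, FP, TN, FN when scores >= thr are predicted TB+."""
    tp = sum(1 for y, s in zip(y_true, scores) if y == 1 and s >= thr)
    fn = sum(1 for y, s in zip(y_true, scores) if y == 1 and s < thr)
    fp = sum(1 for y, s in zip(y_true, scores) if y == 0 and s >= thr)
    tn = sum(1 for y, s in zip(y_true, scores) if y == 0 and s < thr)
    return tp, fp, tn, fn


def youden_threshold(y_true, scores):
    """Pick the candidate threshold maximizing J = sensitivity + specificity - 1."""
    best_thr, best_j = None, float("-inf")
    for thr in sorted(set(scores)):
        tp, fp, tn, fn = confusion(y_true, scores, thr)
        j = tp / (tp + fn) + tn / (tn + fp) - 1.0
        if j > best_j:  # ties keep the lowest threshold
            best_thr, best_j = thr, j
    return best_thr


def clinical_metrics(y_true, scores, thr):
    """Threshold-dependent metrics at a fixed operating point."""
    tp, fp, tn, fn = confusion(y_true, scores, thr)
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    return {
        "sensitivity": sens,
        "specificity": spec,
        "uar": (sens + spec) / 2.0,
        "ppv": tp / (tp + fp) if tp + fp else float("nan"),
        "npv": tn / (tn + fn) if tn + fn else float("nan"),
    }


# Toy calibration subset (assumed values): the threshold is chosen here only.
cal_y = [1, 1, 1, 0, 0, 0, 1, 0]
cal_scores = [0.9, 0.8, 0.6, 0.4, 0.3, 0.2, 0.55, 0.7]
thr = youden_threshold(cal_y, cal_scores)

# Toy held-out test fold (assumed values): metrics are reported at the frozen threshold.
test_y = [1, 0, 1, 0, 1, 0]
test_scores = [0.7, 0.6, 0.5, 0.1, 0.9, 0.3]
print(thr, clinical_metrics(test_y, test_scores, thr))
```

Choosing the threshold on a calibration subset and freezing it before scoring the test fold keeps the reported sensitivity and specificity from being tuned on the data they are measured on.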
Figure 2: MFCC and Chroma features for two cough waveforms, TB+ and TB-.

This review was generated by AI and reviewed by human editors.