QUICK REVIEW

[Paper Review] Tuberculosis Screening from Cough Audio: Baseline Models, Clinical Variables, and Uncertainty Quantification

George P. Kafentzis, Efstratios Selisios | arXiv (Cornell University) | 2026-01-12

Respiratory and Cough-Related Research | Citations: 0

One-Line Summary

This paper establishes a standardized, reproducible baseline for tuberculosis (TB) detection from cough audio, optionally combined with clinical data, using a cougher-disjoint nested cross-validation framework and conformal prediction.

ABSTRACT

In this paper, we propose a standardized framework for automatic tuberculosis (TB) detection from cough audio and routinely collected clinical data using machine learning. While TB screening from audio has attracted growing interest, progress is difficult to measure because existing studies vary substantially in datasets, cohort definitions, feature representations, model families, validation protocols, and reported metrics. Consequently, reported gains are often not directly comparable, and it remains unclear whether improvements stem from modeling advances or from differences in data and evaluation. We address this gap by establishing a strong, well-documented baseline for TB prediction using cough recordings and accompanying clinical metadata from a recently compiled dataset from several countries. Our pipeline is reproducible end-to-end, covering feature extraction, multimodal fusion, cougher-independent evaluation, and uncertainty quantification, and it reports a consistent suite of clinically relevant metrics to enable fair comparison. We further quantify performance for cough audio-only and fused (audio + clinical metadata) models, and release the full experimental protocol to facilitate benchmarking. This baseline is intended to serve as a common reference point and to reduce methodological variance that currently holds back progress in the field.

Motivation and Objectives

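Build a standardized, reproducible pipeline for TB prediction from cough audio and clinical metadata.
Enforce cougher-independent evaluation to enable fair benchmarking and a realistic assessment of generalization.
Quantify predictive uncertainty with conformal prediction alongside standard performance metrics.
Provide a baseline framework and experimental protocol that reduce methodological variability in TB audio-screening research.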

Proposed Method

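Summarize a set of hand-crafted features (MFCCs, Chroma, and simple spectral features) over time with statistical functionals.
Where clinical metadata is available, fuse it with the acoustic features, and evaluate logistic regression and CatBoost models.
Adopt a cougher-disjoint nested cross-validation strategy (10 outer folds, 5 inner folds) to prevent data leakage and ensure fair evaluation.
Apply conformal prediction with a calibration set to produce uncertainty-aware predictions and prediction sets.
Calibrate scores with isotonic regression and select the operating threshold (e.g., by Youden's J statistic) on a held-out calibration portion.
Report ROC-AUC, PR-AUC, sensitivity, specificity, UAR, PPV, and NPV for both audio-only and fused models.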
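
The review does not say how acoustic and clinical features are fused. A common choice, assumed here, is early (feature-level) fusion: concatenate each recording's audio functionals with standardized clinical variables and append a missing-metadata flag, so recordings without clinical data can still be scored. The helper names and the zero-fill-plus-indicator scheme are hypothetical, not taken from the paper. Scaling statistics are fit on the training fold only, so nothing leaks across the cougher-disjoint split.

```python
import numpy as np


def fit_clinical_scaler(clinical, available):
    """Hypothetical helper: mean/std of clinical columns over rows that have metadata.
    Call this on the TRAINING fold only."""
    rows = np.asarray(clinical, dtype=float)[np.asarray(available, dtype=bool)]
    return rows.mean(axis=0), rows.std(axis=0) + 1e-8  # epsilon avoids division by zero


def fuse_features(audio, clinical, available, mu, sd):
    """Hypothetical early fusion: [audio functionals | z-scored clinical | missing flag].

    Rows without clinical metadata get zeros in the clinical block (the training
    mean after z-scoring) and a flag value of 1.
    """
    audio = np.asarray(audio, dtype=float)
    clinical = np.asarray(clinical, dtype=float)
    available = np.asarray(available, dtype=bool)
    clin_z = np.where(available[:, None], (clinical - mu) / sd, 0.0)
    missing_flag = (~available).astype(float)[:, None]
    return np.hstack([audio, clin_z, missing_flag])
```

The fused matrix can then go to either model family named in the review. For the audio-only baseline, the audio block is used on its own.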
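
The reported metrics follow from the confusion matrix at a chosen threshold, plus two ranking summaries. The sketch below is a plain NumPy version for binary labels (1 = TB+). It computes ROC-AUC through the Mann-Whitney statistic and approximates PR-AUC by average precision, which is one common estimator and not necessarily the one the authors used. UAR (unweighted average recall) is the mean of sensitivity and specificity. The function name is hypothetical.

```python
import numpy as np


def screening_metrics(y_true, y_score, threshold):
    """Hypothetical helper: threshold-based and ranking metrics for TB screening."""
    y_true = np.asarray(y_true, dtype=int)
    y_score = np.asarray(y_score, dtype=float)
    y_pred = (y_score >= threshold).astype(int)

    tp = int(np.sum((y_pred == 1) & (y_true == 1)))
    tn = int(np.sum((y_pred == 0) & (y_true == 0)))
    fp = int(np.sum((y_pred == 1) & (y_true == 0)))
    fn = int(np.sum((y_pred == 0) & (y_true == 1)))

    def ratio(a, b):
        return a / b if b else float("nan")  # undefined when the denominator is 0

    sens = ratio(tp, tp + fn)
    spec = ratio(tn, tn + fp)

    # ROC-AUC = P(score of a random TB+ > score of a random TB-), ties count 1/2
    pos, neg = y_score[y_true == 1], y_score[y_true == 0]
    diff = pos[:, None] - neg[None, :]
    auc = (np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / diff.size

    # Average precision: mean of the precision values at the rank of each positive
    order = np.argsort(-y_score, kind="mergesort")
    hits = y_true[order]
    precision_at_k = np.cumsum(hits) / np.arange(1, len(hits) + 1)
    ap = float(np.sum(precision_at_k * hits) / hits.sum())

    return {
        "roc_auc": float(auc), "pr_auc_ap": ap,
        "sensitivity": sens, "specificity": spec, "uar": (sens + spec) / 2,
        "ppv": ratio(tp, tp + fp), "npv": ratio(tn, tn + fn),
    }
```

ROC-AUC and average precision are threshold-free. The other five values depend on the operating threshold, which is why the threshold should be chosen on calibration data rather than on the test fold.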

Figure 1: Cougher-disjoint nested CV pipeline for model selection, calibration, and conformal prediction based uncertainty quantification.
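
In Figure 1's pipeline, the outer loop holds out coughers and the inner loop selects models using only the remaining coughers. The sketch below generates such splits with 10 outer and 5 inner folds, as stated in the review. Each cougher, rather than each recording, is assigned to a fold, so no person contributes recordings to both sides of any split. The round-robin assignment after a random shuffle, the seed handling, and the function names are assumptions. They balance the number of coughers per fold, not the number of recordings or the TB prevalence.

```python
import numpy as np


def group_kfold(groups, n_splits, seed=0):
    """Yield (train_idx, test_idx) so that each cougher ID falls in exactly one test fold."""
    groups = np.asarray(groups)
    coughers = np.random.default_rng(seed).permutation(np.unique(groups))
    if len(coughers) < n_splits:
        raise ValueError("need at least as many coughers as folds")
    fold_of = {g: i % n_splits for i, g in enumerate(coughers)}  # round-robin by cougher
    fold = np.array([fold_of[g] for g in groups])
    for k in range(n_splits):
        yield np.flatnonzero(fold != k), np.flatnonzero(fold == k)


def nested_cougher_cv(groups, n_outer=10, n_inner=5, seed=0):
    """Yield (outer_train, outer_test, inner_splits).

    Inner splits are built only from outer-training coughers and are returned as
    indices into the full dataset, so model selection never sees outer-test coughers.
    """
    groups = np.asarray(groups)
    for outer_train, outer_test in group_kfold(groups, n_outer, seed):
        inner = [
            (outer_train[tr], outer_train[va])
            for tr, va in group_kfold(groups[outer_train], n_inner, seed + 1)
        ]
        yield outer_train, outer_test, inner
```

scikit-learn's GroupKFold and StratifiedGroupKFold, with the cougher ID passed as groups, give the same cougher-disjointness guarantee. They also try to balance sample counts or class ratios across folds. A calibration portion for isotonic regression and conformal prediction would presumably be carved out of the outer-training coughers in the same group-aware way.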

Experimental Results

Research Questions

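RQ1: Can a standardized cough-audio feature pipeline, with or without clinical data, accurately predict TB status on a large, multi-country cough dataset?
RQ2: Does enforcing cougher-disjoint evaluation improve the generalization of TB cough screening compared with standard splits?
RQ3: How does adding clinical metadata affect TB prediction performance from cough audio?
RQ4: Can conformal prediction provide meaningful uncertainty quantification and rejection signals for TB screening decisions?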

Key Findings

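A standardized pipeline was established for training two standard model families (logistic regression and CatBoost) on the solicited-cough CODA TB subset (9,772 cough samples from 1,105 participants).
A cougher-disjoint nested cross-validation strategy was implemented to prevent information leakage and ensure fair evaluation.
TB prediction performance was examined separately for acoustic features alone and for fused acoustic + clinical features.
Conformal prediction was used to quantify predictive uncertainty, enabling confidence-based decision outputs and potential rejection of borderline cases.
A calibration step and a threshold-selection procedure support clinically meaningful operating points.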
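
The calibration and threshold-selection steps can be sketched as follows. First, fit a monotone score-to-probability map with isotonic regression on the held-out calibration portion. Then pick the threshold that maximizes Youden's J = sensitivity + specificity - 1 on that same portion. The pool-adjacent-violators (PAV) implementation and the step-function prediction rule are assumptions made for illustration. scikit-learn's IsotonicRegression is the usual library route and interpolates linearly between knots instead. The function names are hypothetical.

```python
import numpy as np


def fit_isotonic(scores, labels):
    """Pool-adjacent-violators fit of a non-decreasing map score -> P(TB+)."""
    xs, inv = np.unique(np.asarray(scores, dtype=float), return_inverse=True)
    sum_y = np.bincount(inv, weights=np.asarray(labels, dtype=float))  # tied scores pooled
    count = np.bincount(inv).astype(float)
    blocks = []  # each block: [sum of labels, count, left-edge score]
    for s, c, x in zip(sum_y, count, xs):
        blocks.append([s, c, x])
        # merge backwards while adjacent block means violate monotonicity
        while len(blocks) > 1 and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]:
            s_last, c_last, _ = blocks.pop()
            blocks[-1][0] += s_last
            blocks[-1][1] += c_last
    knots = np.array([b[2] for b in blocks])
    values = np.array([b[0] / b[1] for b in blocks])
    return knots, values


def apply_isotonic(knots, values, scores):
    """Step-function prediction: value of the last block starting at or below each score."""
    idx = np.searchsorted(knots, np.asarray(scores, dtype=float), side="right") - 1
    return values[np.clip(idx, 0, len(values) - 1)]


def youden_threshold(probs, labels):
    """Return the threshold maximizing J = sensitivity + specificity - 1, and J itself."""
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels, dtype=int)
    best_t, best_j = 0.5, -np.inf
    for t in np.unique(probs):
        pred = probs >= t
        j = pred[labels == 1].mean() + (~pred[labels == 0]).mean() - 1.0
        if j > best_j:
            best_t, best_j = float(t), float(j)
    return best_t, best_j
```

Because the isotonic map is monotone, it leaves the ranking of recordings, and therefore ROC-AUC, essentially unchanged. Its role is to make the scores interpretable as probabilities before a clinical operating point is chosen.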
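
The conformal step can be illustrated with split conformal classification for the binary TB+/TB- case. This is a standard variant, and the review does not confirm that it is exactly the one used. A nonconformity score (here, 1 minus the probability assigned to the true class) is computed on calibration recordings. Its ceil((n+1)(1-alpha))-th smallest value becomes the cut-off qhat, and a label enters a test prediction set when its own score is at most qhat. Under exchangeability this gives marginal coverage of at least 1 - alpha. Sets other than a single label are mapped to a hypothetical "refer" decision, which is one way to realize the rejection of borderline cases mentioned above. The function names and the decision rule are assumptions.

```python
import math

import numpy as np


def conformal_qhat(cal_p_pos, cal_labels, alpha):
    """Split-conformal cut-off from calibration data (binary: 1 = TB+)."""
    p = np.asarray(cal_p_pos, dtype=float)
    y = np.asarray(cal_labels, dtype=int)
    scores = np.where(y == 1, 1.0 - p, p)  # 1 - probability assigned to the true class
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return math.inf  # too few calibration points: every set is {TB-, TB+}
    return float(np.sort(scores)[k - 1])


def prediction_set(p_pos, qhat):
    """Labels whose nonconformity score does not exceed qhat."""
    candidate_scores = {0: p_pos, 1: 1.0 - p_pos}
    return {c for c, s in candidate_scores.items() if s <= qhat}


def decide(pred_set):
    """Hypothetical triage rule: commit only when the set is a single label."""
    if pred_set == {1}:
        return "TB+"
    if pred_set == {0}:
        return "TB-"
    return "refer"  # {TB-, TB+} (ambiguous) or empty (atypical): defer to confirmatory testing
```

Smaller alpha raises qhat and produces more two-label sets, so more recordings are referred. This is the knob that trades automation rate against the coverage guarantee.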

Figure 2: MFCC and Chroma features for two cough waveforms, TB+ and TB-.
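
The feature pipeline described under Proposed Method computes frame-level descriptors and summarizes them over time with statistical functionals. It can be sketched as follows. MFCC and Chroma extraction would normally come from an audio library, so this NumPy-only sketch substitutes two simple spectral descriptors, spectral centroid and 85% roll-off, to show the framing and summarization steps. The frame length, hop size, roll-off fraction, and functional set (mean, standard deviation, minimum, maximum, median) are assumptions, not the authors' configuration.

```python
import numpy as np


def frame_signal(x, frame_len, hop):
    """Split a 1-D signal into overlapping frames, shape (n_frames, frame_len)."""
    x = np.asarray(x, dtype=float)
    if len(x) < frame_len:
        x = np.pad(x, (0, frame_len - len(x)))  # zero-pad very short coughs
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]


def spectral_features(x, sr, frame_len=1024, hop=512):
    """Frame-level spectral centroid and 85% roll-off (Hz), shape (2, n_frames)."""
    frames = frame_signal(x, frame_len, hop) * np.hanning(frame_len)
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2      # (n_frames, n_bins)
    freqs = np.fft.rfftfreq(frame_len, d=1.0 / sr)        # (n_bins,)
    total = power.sum(axis=1) + 1e-12                     # guards silent frames
    centroid = (power * freqs).sum(axis=1) / total
    cumulative = np.cumsum(power, axis=1)
    rolloff = freqs[np.argmax(cumulative >= 0.85 * total[:, None], axis=1)]
    return np.vstack([centroid, rolloff])


def functionals(frame_feats):
    """Summarize each feature track over time into one fixed-length vector."""
    stats = (np.mean, np.std, np.min, np.max, np.median)
    return np.concatenate([f(frame_feats, axis=1) for f in stats])
```

Each recording maps to a fixed-length vector (here 2 descriptors × 5 functionals = 10 values) regardless of its duration. This is what lets coughs of different lengths feed a logistic regression or CatBoost model directly.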


This review was generated by AI and reviewed by a human editor.