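[Paper Summary] Deep Transfer Learning based COVID-19 Detection in Cough, Breath and Speech using Bottleneck Features
This study proposes a deep transfer learning approach for detecting COVID-19 from smartphone recordings of cough, breath and speech. It uses fine-tuned pre-trained models and bottleneck features extracted from them. On cough recordings from the Coswara dataset, a transfer-learning-based ResNet50 model achieved the highest AUC of 0.98. This indicates that vocal audio, and coughs in particular, carries a COVID-19 signature that humans cannot perceive but that machine learning methods can identify in smartphone recordings.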
We present an experimental investigation into the automatic detection of COVID-19 from coughs, breaths and speech. This type of screening is non-contact, does not require specialist medical expertise or laboratory facilities, and can easily be deployed on inexpensive consumer hardware. Smartphone recordings of cough, breath and speech from subjects around the globe are first classified by seven standard machine learning classifiers using leave-$p$-out cross-validation, which provides a promising baseline performance. Then, a diverse dataset of 10.29 hours of cough, sneeze, speech and noise audio recordings is used to pre-train CNN, LSTM and ResNet50 classifiers. These are then fine-tuned to enhance performance further. We also extract bottleneck features from these pre-trained models by removing their final two layers, and use them as input to LR, SVM, MLP and KNN classifiers to detect the COVID-19 signature. The highest AUC of 0.98 was achieved by a transfer-learning-based ResNet50 architecture on coughs from the Coswara dataset. For breaths from the Coswara dataset and speech from the ComParE dataset, the highest AUCs of 0.94 and 0.92, respectively, were achieved by an SVM trained on the bottleneck features. We conclude that among all vocal audio, coughs carry the strongest COVID-19 signature, followed by breath and then speech. Transfer learning improves classifier performance, yielding higher AUC and lower variance across the cross-validation folds. Although these signatures are not perceivable by the human ear, machine-learning-based COVID-19 detection from vocal audio recorded via smartphone is possible.
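Removing the final two layers and reusing the remaining activations as features can be sketched without any deep-learning framework. This minimal sketch assumes the network is an ordered list of layer functions whose last two entries (a dense layer plus softmax) form the classification head. The toy weights, the input vectors (stand-ins for acoustic features) and the helpers `bottleneck` and `knn_predict` are all hypothetical, not the authors' CNN/LSTM/ResNet50 code. KNN stands in for any of the LR/SVM/MLP/KNN back-ends.

```python
import math
from collections import Counter
from typing import Callable, List, Sequence

Vector = List[float]
Layer = Callable[[Vector], Vector]


def dense(weights: Sequence[Sequence[float]], bias: Sequence[float]) -> Layer:
    """Fully connected layer computing W @ x + b (toy, hypothetical weights)."""
    def layer(x: Vector) -> Vector:
        return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]
    return layer


def relu(x: Vector) -> Vector:
    return [max(0.0, v) for v in x]


def softmax(x: Vector) -> Vector:
    m = max(x)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in x]
    total = sum(exps)
    return [e / total for e in exps]


# Toy "pre-trained" network: hidden dense + ReLU, then a 2-class head (dense + softmax).
# The head is the "final two layers" removed to obtain bottleneck features.
network: List[Layer] = [
    dense([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [1.0, 1.0, 1.0]], [0.0] * 4),
    relu,
    dense([[1.0, -1.0, 0.5, 0.0], [-1.0, 1.0, 0.0, 0.5]], [0.0, 0.0]),
    softmax,
]


def forward(net: List[Layer], x: Vector) -> Vector:
    """Full network output: class probabilities from the softmax head."""
    for layer in net:
        x = layer(x)
    return x


def bottleneck(net: List[Layer], x: Vector, drop_last: int = 2) -> Vector:
    """Run x through every layer except the final `drop_last` ones."""
    for layer in net[:-drop_last]:
        x = layer(x)
    return x


def knn_predict(feats: List[Vector], labels: List[int], query: Vector, k: int = 3) -> int:
    """Majority vote among the k training vectors closest (Euclidean) to the query."""
    nearest = sorted((math.dist(f, query), y) for f, y in zip(feats, labels))[:k]
    return Counter(y for _, y in nearest).most_common(1)[0][0]


# Hypothetical per-recording input vectors; label 1 = COVID-positive, 0 = negative.
raw_train = [[2.0, 0.0, 1.0], [3.0, 0.5, 1.0], [2.5, 0.0, 0.5],
             [0.0, 2.0, 0.0], [0.0, 3.0, 0.5], [0.5, 2.5, 0.0]]
train_labels = [1, 1, 1, 0, 0, 0]
train_feats = [bottleneck(network, x) for x in raw_train]

print(bottleneck(network, [1.0, -2.0, 3.0]))  # [1.0, 0.0, 3.0, 2.0]
print(knn_predict(train_feats, train_labels, bottleneck(network, [2.0, 0.2, 1.0])))  # 1
```

The back-end classifier sees the 4-dimensional hidden representation rather than the 2-class softmax output. In a real framework, the same idea corresponds to reading the activations of the layer just before the removed head.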
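Motivation and Objectives
- Develop a non-invasive, low-cost and scalable COVID-19 detection method based on vocal audio recorded with a smartphone.
- Investigate whether coughs, breaths and speech contain physiological COVID-19 signatures that human hearing cannot perceive but that can be detected automatically.
- Improve classification performance through transfer learning: pre-train on diverse audio data, then fine-tune on the target datasets.
- Evaluate how effective bottleneck features from deep neural networks are as inputs to conventional classifiers for COVID-19 detection.
- Compare the diagnostic potential of the different vocal modalities (cough, breath and speech) for identifying COVID-19.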
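Proposed Method
- Pre-train CNN, LSTM and ResNet50 models on a diverse 10.29-hour dataset of cough, sneeze, speech and noise recordings, which serve as the source models for transfer learning.
- Fine-tune the pre-trained models on the target datasets (Coswara for coughs and breaths, ComParE for speech) to adapt them to the COVID-19 detection task.
- Extract bottleneck features that capture high-level representations by removing the final two layers of each pre-trained model.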
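- Establish a baseline with seven standard machine learning classifiers, and use the bottleneck features as input to logistic regression (LR), SVM, MLP and KNN classifiers.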
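- Apply leave-p-out cross-validation to obtain robust performance estimates across folds.
- Use AUC as the primary evaluation metric, and compare its variance across cross-validation folds to assess stability.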
Experimental Results
Research Questions
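- RQ1: Can deep transfer learning improve COVID-19 detection from vocal audio such as coughs, breaths and speech?
- RQ2: Among coughs, breaths and speech, which vocal modality carries the strongest detectable COVID-19 signature?
- RQ3: How do bottleneck features from pre-trained deep neural networks compare with end-to-end training in performance and stability?
- RQ4: Does transfer learning reduce the variance of cross-validation performance across different subject groups?
- RQ5: Can COVID-19 be detected with high accuracy using only smartphone-recorded audio, without medical expertise or laboratory facilities?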
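Key Findings
- A transfer-learning-based ResNet50 model achieved the highest AUC, 0.98, on cough recordings from the Coswara dataset.
- An SVM classifier using bottleneck features from the pre-trained models achieved an AUC of 0.94 on breath recordings from the Coswara dataset.
- An SVM classifier using bottleneck features achieved an AUC of 0.92 on speech recordings from the ComParE dataset.
- Among all vocal modalities, coughs carry the strongest detectable COVID-19 signature, followed by breaths and then speech.
- Compared with the baseline models, transfer learning improved classifier performance, giving higher AUC and lower variance across cross-validation folds.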
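- The study indicates that machine learning can detect COVID-19 from vocal audio with high accuracy, even though the signature is imperceptible to the human ear.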
This summary was generated by AI and reviewed by human editors.