QUICK REVIEW

[Paper Review] Self-Supervised Feature Learning of 1D Convolutional Neural Networks with Contrastive Loss Using In-Ear Microphone Audio for Eating Detection

Vasileios Papapanagiotou, Christos Diou|arXiv (Cornell University)|Aug 2, 2021

Music and Audio Processing | Cited by 2

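One-Sentence Summary

This paper proposes a self-supervised feature learning method, based on 1D convolutional neural networks and a contrastive loss, that detects eating events from in-ear microphone audio. By exploiting unlabeled audio collected from wearable devices and adapting the SimCLR framework from computer vision to the audio domain, the method achieves results similar to supervised training and comparable to state-of-the-art methods, reducing the dependence on costly manual annotation.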

ABSTRACT

The importance of automated and objective monitoring of dietary behavior is becoming increasingly accepted. The advancements in sensor technology along with recent achievements in machine-learning-based signal-processing algorithms have enabled the development of dietary monitoring solutions that yield highly accurate results. A common bottleneck for developing and training machine learning algorithms is obtaining labeled data for training supervised algorithms, and in particular ground truth annotations. Manual ground truth annotation is laborious, cumbersome, can sometimes introduce errors, and is sometimes impossible in free-living data collection. As a result, there is a need to decrease the labeled data required for training. Additionally, unlabeled data, gathered in-the-wild from existing wearables (such as Bluetooth earbuds) can be used to train and fine-tune eating-detection models. In this work, we focus on training a feature extractor for audio signals captured by an in-ear microphone for the task of eating detection in a self-supervised way. We base our approach on the SimCLR method for image classification, proposed by Chen et al. from the domain of computer vision. Results are promising as our self-supervised method achieves similar results to supervised training alternatives, and its overall effectiveness is comparable to current state-of-the-art methods. Code is available at https://github.com/mug-auth/ssl-chewing.

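Research Motivation and Objectives

Reduce the dependence of eating-detection models on costly, error-prone manual annotation.
Use unlabeled audio from in-ear wearables (such as Bluetooth earbuds) for pre-training.
Adapt the self-supervised contrastive learning method SimCLR from computer vision to audio signals for dietary monitoring.
Develop a robust feature extractor for eating detection using only in-ear microphone audio.
Evaluate whether self-supervised training can match or approach the performance of supervised learning for eating detection.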

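Proposed Method

Adapt the SimCLR contrastive learning framework to 1D audio signals captured by an in-ear microphone.
Apply data augmentations such as temporal cropping and noise injection to generate positive views for contrastive learning.
Use a 1D convolutional neural network as the feature encoder to learn discriminative representations from audio.
Apply a contrastive loss that maximizes agreement between different augmented views of the same audio sample while pushing apart views of different samples.
Fine-tune the pre-trained model on a small amount of labeled data for the downstream eating-detection task.
Train the model end to end in a self-supervised manner before transferring it to the classification task.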
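
The augmentation and contrastive-loss steps above can be sketched as follows. This is a minimal illustration in pure Python, not the authors' implementation: the function names (augment, encode, nt_xent), the crop length, the noise level, the temperature of 0.5, and the single-layer toy encoder with random kernels are all assumptions made for the example. The loss is the NT-Xent loss used by SimCLR, computed over pairs of views stored at adjacent indices.

```python
import math
import random


def augment(signal, crop_len, noise_std, rng):
    """Random temporal crop followed by additive Gaussian noise (assumed augmentations)."""
    start = rng.randrange(len(signal) - crop_len + 1)
    return [x + rng.gauss(0.0, noise_std) for x in signal[start:start + crop_len]]


def encode(signal, kernels):
    """Toy stand-in for a 1D CNN: one conv layer, ReLU, global max pooling per kernel."""
    features = []
    for kernel in kernels:
        width = len(kernel)
        responses = [
            max(0.0, sum(w * signal[t + i] for i, w in enumerate(kernel)))
            for t in range(len(signal) - width + 1)
        ]
        features.append(max(responses))
    return features


def cosine(u, v):
    """Cosine similarity; the small constant avoids division by zero."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / (norm + 1e-12)


def nt_xent(z, temperature=0.5):
    """NT-Xent loss; z[2k] and z[2k + 1] are embeddings of two views of sample k."""
    n = len(z)
    total = 0.0
    for i in range(n):
        j = i ^ 1  # index of the other view of the same sample
        logits = [cosine(z[i], z[k]) / temperature for k in range(n) if k != i]
        positive = cosine(z[i], z[j]) / temperature
        total += math.log(sum(math.exp(l) for l in logits)) - positive
    return total / n


# Usage: three synthetic "audio windows", two augmented views each.
rng = random.Random(0)
kernels = [[rng.uniform(-1, 1) for _ in range(5)] for _ in range(4)]
windows = [[math.sin(0.1 * f * t) for t in range(200)] for f in (1, 3, 7)]
views = []
for window in windows:
    views.append(encode(augment(window, 160, 0.01, rng), kernels))
    views.append(encode(augment(window, 160, 0.01, rng), kernels))
loss = nt_xent(views)
print(f"contrastive loss: {loss:.4f}")
```

In a real training loop the kernels would be learned by minimizing this loss with gradient descent; the sketch only evaluates the loss once to show how views, embeddings, and positive pairs fit together.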

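Experimental Results

Research Questions

RQ1: Can self-supervised contrastive learning on in-ear microphone audio achieve performance comparable to supervised learning for eating detection?
RQ2: How well do features learned through self-supervised pre-training transfer to the downstream eating-detection task?
RQ3: To what extent can unlabeled real-world audio reduce the need for manual annotation in dietary monitoring systems?
RQ4: How does the proposed method compare with state-of-the-art eating-detection models?
RQ5: Which data augmentation strategies are most effective for audio-based self-supervised learning in this setting?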

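Key Findings

The self-supervised model performs on par with the supervised training baseline for eating detection, demonstrating the feasibility of self-supervision.
The method reduces the dependence on manually labeled data, addressing a key bottleneck in dietary monitoring.
The transfer performance of the self-supervised feature extractor is competitive with state-of-the-art methods in this domain.
The feature extractor learns discriminative audio representations from in-ear microphone signals without manually annotated labels.
Contrastive learning combined with data augmentation yields robust features that generalize well for eating detection.
The code is publicly available, supporting reproducibility and further research on self-supervised dietary monitoring.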

This review was generated by AI and reviewed by a human editor.