QUICK REVIEW

[Paper Review] Learning from Between-class Examples for Deep Sound Recognition

Yuji Tokozume, Yoshitaka Ushiku|arXiv (Cornell University)|Nov 28, 2017

Music and Audio Processing | References: 14 | Cited by: 153

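ONE-SENTENCE SUMMARY

BC learning mixes two sounds from different classes and trains the model to predict the mixing ratio. It improves accuracy across networks and datasets, and with EnvNet-v2 it surpasses human-level performance on ESC-50.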

ABSTRACT

Deep learning methods have achieved high performance in sound recognition tasks. Deciding how to feed the training data is important for further performance improvement. We propose a novel learning method for deep sound recognition: Between-Class learning (BC learning). Our strategy is to learn a discriminative feature space by recognizing the between-class sounds as between-class sounds. We generate between-class sounds by mixing two sounds belonging to different classes with a random ratio. We then input the mixed sound to the model and train the model to output the mixing ratio. The advantages of BC learning are not limited only to the increase in variation of the training data; BC learning leads to an enlargement of Fisher's criterion in the feature space and a regularization of the positional relationship among the feature distributions of the classes. The experimental results show that BC learning improves the performance on various sound recognition networks, datasets, and data augmentation schemes, in which BC learning proves to be always beneficial. Furthermore, we construct a new deep sound recognition network (EnvNet-v2) and train it with BC learning. As a result, we achieved performance that surpasses the human level.
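
The core recipe described in the abstract can be sketched in a few lines of NumPy: draw two sounds from different classes, mix them with a random ratio r, and use the ratio itself as the target. This is a minimal sketch under stated assumptions, not the authors' code. The helper `make_between_class_example` is hypothetical, the waveforms are assumed to be equal-length 1-D arrays, and the sketch uses plain linear interpolation r·x1 + (1 - r)·x2 rather than the sound-pressure-aware formula described under the proposed method.

```python
import numpy as np


def make_between_class_example(sounds, labels, num_classes, rng):
    """Hypothetical helper: build one between-class training example.

    Simplification: plain linear mix r*x1 + (1 - r)*x2 of two equal-length
    waveforms, with no sound-pressure-level correction.
    Returns (mixed waveform, soft label r*t1 + (1 - r)*t2, r).
    """
    if len(set(labels)) < 2:
        raise ValueError("need sounds from at least two different classes")
    # Keep drawing pairs until the two sounds belong to different classes.
    while True:
        i, j = rng.integers(len(sounds), size=2)
        if labels[i] != labels[j]:
            break
    r = rng.uniform()  # random mixing ratio in [0, 1)
    mixed = r * sounds[i] + (1.0 - r) * sounds[j]
    # Soft label: weight r on the first class, 1 - r on the second.
    target = np.zeros(num_classes)
    target[labels[i]] += r
    target[labels[j]] += 1.0 - r
    return mixed, target, r


# Usage with three synthetic "sounds", one per class.
rng = np.random.default_rng(0)
sounds = [np.sin(np.linspace(0.0, 200.0 * k, 16000)) for k in (1, 2, 3)]
labels = [0, 1, 2]
mixed, target, r = make_between_class_example(sounds, labels, num_classes=3, rng=rng)
print(round(r, 3), target)
```

The target vector sums to 1 and carries exactly the mixing ratio, which is what the model is trained to output.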

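MOTIVATION AND GOALS

Improve how training data is fed to deep sound recognition models.
Introduce Between-Class (BC) learning, which mixes sounds from different classes.
Train the model to predict the mixing ratio, which enlarges Fisher's criterion in the feature space.
Demonstrate BC learning across multiple architectures and datasets.
Show that, with a deeper network, BC learning surpasses human performance on ESC-50.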

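PROPOSED METHOD

Create training examples by mixing two sounds from different classes with a random ratio r.
Mix with a formula that accounts for each sound's sound pressure level, computing the corresponding coefficient p so that the perceived mixing ratio matches r (Eq. 2).
Represent the mixed label as t = r t1 + (1 - r) t2, where t1 and t2 are the one-hot labels of the two sounds, and optimize with a KL-divergence loss.
Train with mini-batch SGD; BC learning may require more training epochs than standard learning.
Visualize the feature space to support the claims that Fisher's criterion is enlarged and that the positional relationship among class distributions is regularized.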
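
The level-aware mixing and the KL-divergence objective listed above can be sketched as follows. Assuming sound pressure levels g1 and g2 in dB, p is chosen so that 10^(g1/20)·p : 10^(g2/20)·(1 - p) = r : (1 - r), which gives p = 1 / (1 + 10^((g1 - g2)/20)·(1 - r)/r); the mix is then (p·x1 + (1 - p)·x2) / sqrt(p² + (1 - p)²). This is our reading of the formula cited as Eq. 2, not the authors' implementation. The names `sound_level_db`, `mix_with_levels`, and `kl_divergence_loss` are hypothetical, and `sound_level_db` uses an unweighted peak frame power with assumed frame and hop sizes, whereas the paper uses an A-weighted level.

```python
import numpy as np


def sound_level_db(x, frame=2048, hop=1024, floor_db=-80.0):
    """Hypothetical helper: peak short-time power of x in dB.

    Assumption: no A-weighting (the paper applies it); frame/hop are illustrative.
    """
    if len(x) < frame:
        x = np.pad(x, (0, frame - len(x)))
    powers = [np.mean(x[s:s + frame] ** 2) for s in range(0, len(x) - frame + 1, hop)]
    return max(10.0 * np.log10(max(powers) + 1e-12), floor_db)


def mix_with_levels(x1, x2, r):
    """Mix x1 and x2 so that their perceived ratio is r : (1 - r)."""
    if not 0.0 < r < 1.0:
        raise ValueError("r must lie strictly between 0 and 1")
    g1, g2 = sound_level_db(x1), sound_level_db(x2)
    # Solve 10**(g1/20) * p : 10**(g2/20) * (1 - p) = r : (1 - r) for p.
    p = 1.0 / (1.0 + 10.0 ** ((g1 - g2) / 20.0) * (1.0 - r) / r)
    # Rescale so that uncorrelated inputs of equal power keep that power.
    mixed = (p * x1 + (1.0 - p) * x2) / np.sqrt(p ** 2 + (1.0 - p) ** 2)
    return mixed, p


def kl_divergence_loss(target, logits):
    """KL(target || softmax(logits)) for one example; target = r*t1 + (1-r)*t2."""
    shifted = logits - np.max(logits)
    log_q = shifted - np.log(np.sum(np.exp(shifted)))  # log-softmax
    mask = target > 0  # classes with zero target weight contribute nothing
    return float(np.sum(target[mask] * (np.log(target[mask]) - log_q[mask])))


ts = np.linspace(0.0, 1.0, 16000)
x1 = 0.1 * np.sin(2 * np.pi * 440 * ts)
x2 = 10.0 * x1  # same waveform, 20 dB louder
mixed, p = mix_with_levels(x1, x2, r=0.5)
print(round(p, 3))  # prints 0.909
```

When both inputs have the same level, p reduces to r. When x2 is 20 dB louder, p shifts toward x1 so that the two sounds contribute equally at r = 0.5.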

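EXPERIMENTAL RESULTS

RESEARCH QUESTIONS

RQ1: Does BC learning improve recognition performance across architectures, datasets, and data augmentation schemes?
RQ2: How should two sounds be mixed (and labels assigned) to maximize the benefit of BC learning?
RQ3: How does BC learning affect Fisher's criterion and the regularization of class relationships in the feature space?
RQ4: Can BC learning surpass human performance on a challenging environmental sound dataset?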

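KEY FINDINGS

BC learning improved every evaluated network (EnvNet, SoundNet5, M18, Logmel-CNN+BN, EnvNet-v2) on ESC-50, ESC-10, and UrbanSound8K.
On ESC-50 with EnvNet-v2, BC learning achieves an 18.2% error rate (vs. 25.6% with standard learning), dropping further to 15.1% with strong data augmentation.
BC learning yields a larger Fisher's criterion and regularizes the class distributions, reducing misclassification of mixed-class sounds.
EnvNet-v2 trained with BC learning surpasses human performance on ESC-50 (18.2% error vs. the 18.7% human error rate reported in earlier work).
Ablations show that the proposed mixing method (Eq. 2 with A-weighting) combined with ratio labels performs best (24.1% error on ESC-50).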
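
The claim that BC learning enlarges Fisher's criterion can be made concrete with a simple two-class ratio computed on feature vectors (for example, penultimate-layer activations): squared distance between the class means divided by the summed within-class variances. This is a minimal sketch of one common scalar form; the helper `fisher_ratio` is hypothetical, and the paper's own definition and measurement (the review notes it is argued largely through feature-space visualization) may differ.

```python
import numpy as np


def fisher_ratio(features_a, features_b):
    """Hypothetical two-class Fisher-style ratio.

    Between-class term: squared distance between the class means.
    Within-class term: sum of per-dimension variances (trace of each covariance).
    """
    mu_a = features_a.mean(axis=0)
    mu_b = features_b.mean(axis=0)
    between = np.sum((mu_a - mu_b) ** 2)
    within = features_a.var(axis=0).sum() + features_b.var(axis=0).sum()
    return float(between / within)


# Usage: pushing class b's features further away raises the ratio,
# because the within-class variances stay the same.
rng = np.random.default_rng(0)
a = rng.normal(0.0, 1.0, size=(200, 8))
b = rng.normal(1.0, 1.0, size=(200, 8))
print(fisher_ratio(a, b), fisher_ratio(a, b + 2.0))  # the second value is larger
```

A larger ratio means the classes are farther apart relative to their spread, which is the sense in which a more discriminative feature space is described above.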

This review was generated by AI and checked by human editors.