QUICK REVIEW

[Paper Review] QTI Submission to DCASE 2021: Residual Normalization for Device-Imbalanced Acoustic Scene Classification with Efficient Design

Byeonggeun Kim, Seung-Han Yang|arXiv (Cornell University)|Jun 28, 2022

Music and Audio Processing | Cited by 27

One-sentence summary

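This paper proposes an efficient acoustic scene classification (ASC) system that combines Residual Normalization, the BC-ResNet-Mod architecture, spectrogram-to-spectrogram device translation, and model compression through pruning, quantization, and knowledge distillation, achieving strong cross-device generalization at a low parameter count.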

ABSTRACT

This technical report describes the details of our TASK1A submission of the DCASE2021 challenge. The goal of the task is to design an audio scene classification system for device-imbalanced datasets under the constraints of model complexity. This report introduces four methods to achieve the goal. First, we propose Residual Normalization, a novel feature normalization method that uses instance normalization with a shortcut path to discard unnecessary device-specific information without losing useful information for classification. Second, we design an efficient architecture, BC-ResNet-Mod, a modified version of the baseline architecture with a limited receptive field. Third, we exploit spectrogram-to-spectrogram translation from one to multiple devices to augment training data. Finally, we utilize three model compression schemes: pruning, quantization, and knowledge distillation to reduce model complexity. The proposed system achieves an average test accuracy of 76.3% in TAU Urban Acoustic Scenes 2020 Mobile, development dataset with 315k parameters, and average test accuracy of 75.3% after compression to 61.0KB of non-zero parameters. We extend this work to [1].

Motivation and objectives

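Address device imbalance in multi-device data and the low model-complexity constraint in ASC.
Develop an efficient CNN architecture with a limited receptive field suited to acoustic scene classification.
Introduce Residual Normalization to improve generalization across devices while preserving discriminative information.
Bridge the domain gap and strengthen training through spectrogram-to-spectrogram device translation.
Compress the model with pruning, quantization, and knowledge distillation without a significant loss in performance.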

Proposed methods

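Propose BC-ResNet-Mod, a modified broadcasting residual network with a limited receptive field that controls temporal resolution through max-pooling.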
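Introduce Residual Normalization (ResNorm), a frequency-wise instance normalization with a residual shortcut path, which discards unnecessary device-specific information without losing information useful for classification.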
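Develop a device translator based on a U-Net with sub-spectral normalization that translates spectrograms between devices for data augmentation.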
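Apply three compression techniques to reduce model size while maintaining accuracy: one-shot magnitude pruning, quantization-aware training (QAT), and knowledge distillation from a teacher network.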
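
The ResNorm idea above can be illustrated with a minimal sketch in plain Python. It assumes a single-channel spectrogram stored as one row per frequency bin, and it assumes the combination takes the form lam * x + FreqIN(x); the shortcut weight `lam`, its value 0.1, and the function names are illustrative choices, not details confirmed by this review.

```python
import math


def freq_instance_norm(spec, eps=1e-5):
    """Normalize each frequency bin of one clip over its time frames.

    `spec` is a list of rows, one per frequency bin; each row holds the
    values of that bin across time (single channel, single clip).
    """
    normed = []
    for row in spec:
        mean = sum(row) / len(row)
        var = sum((v - mean) ** 2 for v in row) / len(row)
        scale = 1.0 / math.sqrt(var + eps)
        normed.append([(v - mean) * scale for v in row])
    return normed


def res_norm(spec, lam=0.1, eps=1e-5):
    """Shortcut path plus frequency-wise instance normalization.

    Assumed form: lam * x + FreqIN(x). `lam` is a hypothetical name and
    value for the shortcut weight.
    """
    normed = freq_instance_norm(spec, eps)
    return [
        [lam * x + n for x, n in zip(row, nrow)]
        for row, nrow in zip(spec, normed)
    ]


# Toy clip: 2 frequency bins x 4 time frames.
clip = [[1.0, 2.0, 3.0, 4.0],
        [10.0, 10.5, 11.0, 13.0]]

# Hypothetical "other device": a per-bin gain and offset on the same clip.
gains, offsets = [2.0, 0.5], [3.0, -1.0]
other = [[g * v + o for v in row] for row, g, o in zip(clip, gains, offsets)]

# FreqIN alone maps both recordings to (almost) the same values, while
# ResNorm keeps a lam-scaled copy of the raw input alongside them.
print(freq_instance_norm(clip)[0])
print(freq_instance_norm(other)[0])
print(res_norm(clip)[0])
```

In this toy setting the per-bin gain and offset stand in for a device's frequency response: normalization alone removes them entirely, and the shortcut term reintroduces a small, controllable share of the original statistics.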

Experimental results

Research questions

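RQ1: How does ResNorm improve generalization to unseen devices on device-imbalanced ASC datasets?
RQ2: How do a limited receptive field and max-pooling affect the accuracy and efficiency of BC-ResNet variants on ASC?
RQ3: Can device translation and data augmentation reduce the domain gap between devices?
RQ4: How do pruning, quantization, and knowledge distillation affect the performance and compression of low-parameter ASC models?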

Key findings

Method	#Param	A	B	C	S1	S2	S3	S4	S5	S6	Overall	Std. Dev
BC-ResNet-Mod-1	8.1k	73.1	61.2	65.3	58.2	57.3	66.2	51.5	51.5	46.3	58.9	0.8
BC-ResNet-Mod-1 + Global FreqNorm	8.1k	73.9	60.9	65.5	60.2	57.9	67.9	50.2	54.3	49.4	60.0	0.9
BC-ResNet-Mod-1 + FreqIN	8.1k	69.9	63.5	60.0	65.3	66.7	67.6	65.9	64.9	62.0	65.1	0.6
BC-ResNet-Mod-1 + Pre-ResNorm	8.1k	75.1	68.9	67.0	66.0	63.9	69.3	63.4	66.9	63.6	67.1	0.8
BC-ResNet-Mod-1 + ResNorm	8.1k	76.4	65.1	68.3	66.0	62.2	69.7	63.0	63.0	58.3	65.8	0.7
CP-ResNet, c=64	899k	77.0	69.3	69.6	70.3	68.2	70.9	62.7	63.9	58.1	67.8	-
BC-ResNet-8, num SSN group=4	317k	77.9	70.4	72.4	69.5	68.3	69.8	66.3	64.1	58.6	68.6	0.4
BC-ResNet-Mod-8	315k	80.7	72.8	74.4	71.4	68.7	71.0	62.2	65.3	59.4	69.5	0.3
BC-ResNet-Mod-8 + Pre-ResNorm	315k	80.8	73.7	73.0	74.0	72.9	77.8	73.3	72.1	71.0	74.3	0.3
BC-ResNet-Mod-8 + ResNorm	315k	81.3	74.4	74.2	75.6	73.1	78.6	73.0	74.0	72.7	75.2	0.4
BC-ResNet-Mod-8 + ResNorm, Device Translator	315k	80.5	74.4	73.9	76.0	73.2	78.5	74.1	74.1	73.6	75.4	0.3
BC-ResNet-Mod-8 + ResNorm, 300epoch, KD	315k	82.6	75.6	74.7	77.0	74.2	78.7	75.1	74.8	73.4	76.3	0.8
+ model compress	-	82.0	73.8	74.3	76.2	73.2	78.8	73.8	72.8	73.3	75.3	0.8
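
The per-device columns of the table can be condensed into a seen/unseen comparison. The sketch below copies two rows from the table and assumes that devices S4 to S6 are the ones absent from the training data; that grouping, and the names `UNSEEN` and `summarize`, are assumptions of this example rather than statements taken from the table itself.

```python
from statistics import mean

DEVICES = ["A", "B", "C", "S1", "S2", "S3", "S4", "S5", "S6"]
UNSEEN = {"S4", "S5", "S6"}  # assumption: devices not present in training

# Per-device accuracies (%) copied from the table above.
ROWS = {
    "BC-ResNet-Mod-8": [80.7, 72.8, 74.4, 71.4, 68.7, 71.0, 62.2, 65.3, 59.4],
    "BC-ResNet-Mod-8 + ResNorm": [81.3, 74.4, 74.2, 75.6, 73.1, 78.6, 73.0, 74.0, 72.7],
}


def summarize(accs):
    """Unweighted means over all, seen, and unseen devices, plus their gap."""
    seen = [a for d, a in zip(DEVICES, accs) if d not in UNSEEN]
    unseen = [a for d, a in zip(DEVICES, accs) if d in UNSEEN]
    return {
        "overall": mean(accs),
        "seen": mean(seen),
        "unseen": mean(unseen),
        "gap": mean(seen) - mean(unseen),
    }


for name, accs in ROWS.items():
    stats = summarize(accs)
    print(name, {k: round(v, 2) for k, v in stats.items()})
```

Under this grouping, the unweighted device mean reproduces the Overall column for both rows, and the seen-to-unseen gap shrinks from roughly 10.9 points without ResNorm to roughly 3.0 points with it.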

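BC-ResNet-Mod-8 with ResNorm reaches 75.2% average test accuracy on the TAU 2020 Mobile development dataset with about one-third of the parameters of the CP-ResNet baseline (315k vs. 899k).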
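With BC-ResNet-Mod-1, ResNorm outperforms Global FreqNorm on every device (65.8% vs. 60.0% overall). Compared with FreqIN, it recovers accuracy on devices A to C (for example, 76.4% vs. 69.9% on A) and gives a higher overall accuracy (65.8% vs. 65.1%), although FreqIN remains ahead on S4 to S6.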
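Using spectrogram-to-spectrogram device translation to augment the training data narrows the performance gap across devices and slightly improves overall accuracy (75.4% vs. 75.2%).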
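With knowledge distillation from a teacher network and 300 training epochs, the 315k-parameter BC-ResNet-Mod-8 + ResNorm model reaches 76.3% average accuracy on the development dataset; compression then applies an 89% pruning rate and 8-bit convolution weights.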
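On the development set, the proposed BC-ResNet-Mod-8 with ResNorm achieves markedly higher overall accuracy (75.2%) than the CP-ResNet baseline (67.8%) and the BC-ResNet-8 variant (68.6%).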
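The final compressed model (KD + pruning + quantization) reaches 75.3% overall accuracy on the TAU 2020 Mobile development dataset, with a listed size of 121.9 KB; the abstract reports 61.0KB of non-zero parameters.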
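
The pruning and quantization steps mentioned in these findings can be sketched on a toy weight list. This is a minimal illustration only: it assumes symmetric per-tensor 8-bit quantization, counts 1 KB as 1024 bytes, and ignores any overhead for storing the positions of non-zero weights. The function names and the toy weights are hypothetical, and the sketch does not reproduce quantization-aware training or the submission's actual size accounting.

```python
def magnitude_prune(weights, sparsity):
    """One-shot magnitude pruning: zero the `sparsity` fraction of weights
    with the smallest absolute value."""
    n_prune = round(len(weights) * sparsity)
    # Indices ordered by |w|; the first n_prune entries are removed.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = list(weights)
    for i in order[:n_prune]:
        pruned[i] = 0.0
    return pruned


def quantize_int8(weights):
    """Symmetric per-tensor 8-bit quantization (an assumed scheme)."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Map integer codes back to approximate float weights."""
    return [v * scale for v in q]


def nonzero_size_kb(q, bits=8):
    """Storage for non-zero values only, ignoring sparse-index overhead."""
    return sum(1 for v in q if v != 0) * bits / 8 / 1024


# Deterministic toy weights in [-1, 1]; not taken from any real model.
weights = [((i * 37) % 101 - 50) / 50.0 for i in range(100)]

pruned = magnitude_prune(weights, sparsity=0.89)  # 89% pruning rate
q, scale = quantize_int8(pruned)
restored = dequantize(q, scale)

print("non-zero weights:", sum(1 for w in pruned if w != 0.0))
print("size of non-zero values (KB):", nonzero_size_kb(q))
```

Pruning first and quantizing second mirrors the order in which the findings list the techniques; in practice the two interact, which is one reason quantization-aware training is used instead of the post-hoc rounding shown here.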


This review was generated by AI and reviewed by human editors.