QUICK REVIEW

[Paper Review] Fast Threshold Optimization For Multi-Label Audio Tagging Using Surrogate Gradient Learning

Thomas Pellegrini, Timothée Masquelier|arXiv (Cornell University)|Mar 1, 2021

Music and Audio Processing | References: 11 | Cited by: 1

One-Sentence Summary

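This paper proposes SGL-Thresh, a fast and scalable method for optimizing decision thresholds in multi-label audio tagging that uses surrogate gradient learning to maximize the F1 score. By approximating the gradient of the non-differentiable thresholding operation with the gradient of a sigmoid, the method reaches 54.9% micro-F1 on the AudioSet evaluation set, compared with 50.7% for the default threshold.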

ABSTRACT

Multi-label audio tagging consists of assigning sets of tags to audio recordings. At inference time, thresholds are applied to the confidence scores output by a probabilistic classifier to decide which classes are detected as active. In this work, we assume a trained classifier is available and seek to automatically optimize the decision thresholds according to a performance metric of interest, in our case the F-measure (micro-F1). We propose a new method, called SGL-Thresh for Surrogate Gradient Learning of Thresholds, that makes use of gradient descent. Since F1 is not differentiable, we propose to approximate the thresholding operation gradients with the gradients of a sigmoid function. We report experiments on three datasets, using state-of-the-art pre-trained deep neural networks. In all cases, SGL-Thresh outperformed three other approaches: a default threshold value (defThresh), a heuristic search algorithm and a method estimating F1 gradients numerically. It reached 54.9% F1 on AudioSet eval, compared to 50.7% with defThresh. SGL-Thresh is very fast and scalable to a large number of tags. To facilitate reproducibility, data and source code in PyTorch are available online: https://github.com/topel/SGL-Thresh
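The surrogate-gradient idea in the abstract can be written out as follows. This is a minimal formalization under stated assumptions: the symbols (score s_{n,c} for clip n and tag c, binary label y_{n,c}, per-tag threshold θ_c, Heaviside step H, logistic sigmoid σ, slope α) and the choice of loss 1 − micro-F1 are introduced here for illustration and may not match the paper's exact notation or loss. The forward pass keeps the hard decision; only its derivative, a Dirac delta that is zero almost everywhere, is swapped for the derivative of a scaled sigmoid.

```latex
\hat{y}_{n,c} = H(s_{n,c} - \theta_c), \qquad
F_1 = \frac{2\,\mathrm{TP}}{2\,\mathrm{TP} + \mathrm{FP} + \mathrm{FN}}
    = \frac{2\sum_{m,k} \hat{y}_{m,k}\, y_{m,k}}{\sum_{m,k} \hat{y}_{m,k} + \sum_{m,k} y_{m,k}}

\frac{\partial \hat{y}_{n,c}}{\partial \theta_c} = -\delta(s_{n,c} - \theta_c)
\quad\text{is replaced by}\quad
-\alpha\,\sigma(z_{n,c})\bigl(1 - \sigma(z_{n,c})\bigr), \qquad z_{n,c} = \alpha\,(s_{n,c} - \theta_c)

\mathcal{L} = 1 - F_1, \qquad
\frac{\partial \mathcal{L}}{\partial \theta_c} \approx
\sum_{n} \frac{2\,y_{n,c} - F_1}{\sum_{m,k} \hat{y}_{m,k} + \sum_{m,k} y_{m,k}}\;
\alpha\,\sigma(z_{n,c})\bigl(1 - \sigma(z_{n,c})\bigr)
```

Because F_1 ≤ 1, a positive clip (y = 1) contributes a positive term, so a gradient-descent step lowers θ_c, while a negative clip contributes −F_1 ≤ 0 and raises it; clips whose scores lie near the threshold dominate, since σ(z)(1 − σ(z)) peaks at z = 0.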

Motivation and Objectives

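Automatically optimize the decision thresholds used in multi-label audio tagging to improve the F1 score.
Address the fact that the F1 metric is not differentiable, which prevents direct gradient-based threshold optimization.
Develop a scalable and efficient method suited to large-scale multi-label audio tagging.
Outperform existing threshold optimization methods, including heuristic search and numerical gradient estimation.
Enable reproducibility by releasing PyTorch source code and data.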
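Why non-differentiability is a practical obstacle can be seen with a small sketch. The code below assumes micro-F1 computed from hard decisions with a strict `s > threshold` rule, and uses a generic central finite difference as an illustration of numerical gradient estimation; it is not necessarily the numerical baseline evaluated in the paper. The toy scores and the names `micro_f1` and `numerical_grad` are hypothetical.

```python
from typing import List, Sequence


def micro_f1(scores: Sequence[Sequence[float]],
             labels: Sequence[Sequence[int]],
             thresholds: Sequence[float]) -> float:
    """Micro-F1 of hard decisions: 2*TP / (#predicted + #positive)."""
    tp = n_pred = n_pos = 0
    for row_s, row_y in zip(scores, labels):
        for s, y, th in zip(row_s, row_y, thresholds):
            y_hat = 1 if s > th else 0  # hard thresholding (strict > is an assumption)
            tp += y_hat * y
            n_pred += y_hat
            n_pos += y
    denom = n_pred + n_pos
    return 2.0 * tp / denom if denom else 0.0


def numerical_grad(scores, labels, thresholds, eps: float = 1e-3) -> List[float]:
    """Central finite-difference estimate of dF1/dtheta_c, perturbing one tag at a time."""
    grads = []
    for c in range(len(thresholds)):
        up, down = list(thresholds), list(thresholds)
        up[c] += eps
        down[c] -= eps
        diff = micro_f1(scores, labels, up) - micro_f1(scores, labels, down)
        grads.append(diff / (2 * eps))
    return grads


# Hypothetical toy scores for 4 clips x 2 tags (not data from the paper).
scores = [[0.9, 0.35], [0.8, 0.05], [0.2, 0.40], [0.1, 0.10]]
labels = [[1, 1], [1, 0], [0, 1], [0, 0]]
print(numerical_grad(scores, labels, [0.5, 0.5]))  # [0.0, 0.0]
print(numerical_grad(scores, labels, [0.5, 0.5], eps=0.2))
```

With a small step, no score falls between the perturbed thresholds, so F1 does not move and the estimated gradient is exactly zero. With a much larger step the estimate for the second tag becomes nonzero, but its value then depends on the step size rather than on a true local slope.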

Proposed Method

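Uses a differentiable sigmoid approximation to estimate the gradients of the non-differentiable thresholding operation.
Optimizes the thresholds by gradient descent, backpropagating through the sigmoid-approximated thresholding function.
Treats the per-class decision thresholds as learnable parameters of an otherwise differentiable inference pipeline.
Uses an F1-based loss whose gradients are computed through the sigmoid approximation of the thresholding step.
Works on top of pre-trained deep neural networks: the trained classifier is kept as is and only the decision thresholds are optimized.
Scales efficiently to a large number of tags because the optimization is gradient-based.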
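The steps above can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' implementation (their PyTorch code is in the repository linked in the abstract). Assumptions: loss = 1 − micro-F1, strict `s > threshold` decisions in the forward pass, the derivative of a sigmoid with slope `alpha` substituted for the step function's derivative in the backward pass, plain gradient descent, and thresholds clipped to [0, 1]. The names `counts`, `micro_f1`, `sgl_thresh` and all hyperparameter values are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def counts(scores, labels, thresholds) -> Tuple[int, int, int]:
    """Return (TP, #predicted positives, #true positives) for hard decisions."""
    tp = n_pred = n_pos = 0
    for row_s, row_y in zip(scores, labels):
        for s, y, th in zip(row_s, row_y, thresholds):
            y_hat = 1 if s > th else 0  # forward pass keeps the hard step
            tp += y_hat * y
            n_pred += y_hat
            n_pos += y
    return tp, n_pred, n_pos


def micro_f1(scores, labels, thresholds) -> float:
    tp, n_pred, n_pos = counts(scores, labels, thresholds)
    denom = n_pred + n_pos  # equals 2*TP + FP + FN
    return 2.0 * tp / denom if denom else 0.0


def sgl_thresh(scores: Sequence[Sequence[float]], labels: Sequence[Sequence[int]],
               alpha: float = 10.0, lr: float = 0.1, steps: int = 200,
               init: float = 0.5) -> List[float]:
    """Gradient descent on per-tag thresholds with a sigmoid surrogate gradient."""
    n_tags = len(scores[0])
    thresholds = [init] * n_tags
    for _ in range(steps):
        tp, n_pred, n_pos = counts(scores, labels, thresholds)
        denom = n_pred + n_pos
        if denom == 0:
            break  # F1 undefined: no predictions and no positives
        f1 = 2.0 * tp / denom
        grads = [0.0] * n_tags
        for row_s, row_y in zip(scores, labels):
            for c in range(n_tags):
                sg = sigmoid(alpha * (row_s[c] - thresholds[c]))
                dyhat_dtheta = -alpha * sg * (1.0 - sg)  # surrogate for -delta(s - theta)
                dloss_dyhat = (f1 - 2 * row_y[c]) / denom  # d(1 - F1)/d y_hat
                grads[c] += dloss_dyhat * dyhat_dtheta
        thresholds = [min(1.0, max(0.0, t - lr * g)) for t, g in zip(thresholds, grads)]
    return thresholds


# Hypothetical toy scores for 4 clips x 2 tags (not data from the paper).
scores = [[0.9, 0.35], [0.8, 0.05], [0.2, 0.40], [0.1, 0.10]]
labels = [[1, 1], [1, 0], [0, 1], [0, 0]]
learned = sgl_thresh(scores, labels)
print(micro_f1(scores, labels, [0.5, 0.5]), micro_f1(scores, labels, learned))
```

On this toy data the second tag's positives score below 0.5, so a single default threshold misses them; the surrogate gradient is nonzero even when no score crosses the threshold, which is what lets the threshold move. Each step touches every score once, so its cost grows linearly with clips × tags; a practical version would vectorize this as tensor operations.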

Experimental Results

Research Questions

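RQ1: Can surrogate gradient learning effectively optimize decision thresholds in multi-label audio tagging, even though F1 is not differentiable?
RQ2: How does SGL-Thresh compare in F1 with a default threshold, heuristic search, and numerical gradient estimation?
RQ3: How scalable and efficient is the method on audio tagging tasks with a large number of tags?
RQ4: Does the sigmoid-based gradient approximation yield consistent, measurable F1 gains over standard thresholding strategies?
RQ5: Can the method be applied effectively to real-world datasets such as AudioSet, using state-of-the-art pre-trained models?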

Key Findings

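SGL-Thresh reached 54.9% micro-F1 on the AudioSet evaluation set, compared with 50.7% for the default threshold (defThresh).
On all three evaluation datasets, it outperformed the default threshold, heuristic search, and numerical gradient estimation.
SGL-Thresh is very fast and scales to a large number of tags, which makes it suitable for large-scale multi-label audio tagging.
The sigmoid-based surrogate gradient makes backpropagation through the thresholding operation possible, working around the non-differentiability of F1.
The gains were consistent across the multi-label audio tagging benchmarks evaluated.
The open-source PyTorch implementation supports reproducibility and can be integrated into existing audio tagging pipelines.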
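To put the AudioSet result in perspective, the gain can be expressed in absolute and relative terms, using only the two figures reported above:

```latex
\Delta_{\text{abs}} = 54.9\% - 50.7\% = 4.2 \ \text{percentage points}, \qquad
\Delta_{\text{rel}} = \frac{4.2}{50.7} \approx 8.3\%
```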


This review was generated by AI and reviewed by human editors.