QUICK REVIEW

[Paper Digest] Monaural Speech Enhancement with Recursive Learning in the Time Domain

Andong Li, Chengshi Zheng|arXiv (Cornell University)|Mar 22, 2020

Speech and Audio Processing|References: 25|Cited by: 1

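One-Sentence Summary

This paper proposes RTNet, a time-domain monaural speech enhancement network that uses recursive learning to improve parameter efficiency and performance. By combining a stage recurrent neural network, a convolutional auto-encoder, and gated linear units, RTNet achieves higher PESQ and STOI scores on the TIMIT corpus than two advanced time-domain baselines.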

ABSTRACT

In this paper, we propose a type of neural network with recursive learning in the time domain called RTNet for monaural speech enhancement, where the proposed network consists of three principal components. The first part is called stage recurrent neural network, which is proposed to effectively aggregate the deep feature dependencies across different stages with a memory mechanism and also remove the interference stage-by-stage. The second part is the convolutional auto-encoder. The third part consists of a series of concatenated gated linear units, which are capable of facilitating the information flow and gradually increasing the receptive fields. Recursive learning is adopted to significantly improve the parameter efficiency and therefore, the number of trainable parameters is effectively reduced without sacrificing its performance. The experiments are conducted on TIMIT corpus. Experimental results demonstrate that the proposed network achieves consistently better performance in both PESQ and STOI scores than two advanced time domain-based baselines in different conditions. The code is provided at https://github.com/Andong-Li-speech/RTNet.

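Motivation and Objectives

Address the high parameter counts and limited feature-dependency modeling of existing time-domain speech enhancement networks.
Improve speech enhancement performance by effectively capturing deep temporal dependencies across the network's stages.
Reduce model complexity through recursive learning without sacrificing performance.
Improve information flow and receptive-field growth in an end-to-end speech enhancement architecture.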

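Proposed Method

RTNet uses a stage recurrent neural network whose memory mechanism aggregates deep feature dependencies across stages and suppresses interference stage by stage.
A convolutional auto-encoder learns a compact representation of the input speech signal in the time domain.
A series of concatenated gated linear units facilitates information flow and gradually enlarges the receptive field.
Recursive learning is applied within the network to reduce the number of trainable parameters while maintaining or improving performance.
The model is trained end to end on the TIMIT corpus with a time-domain loss objective.
The model is designed to process raw waveforms directly, avoiding any frequency-domain transform.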
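The recursive-learning idea described above can be sketched in a few lines of plain Python. This is a toy illustration under stated assumptions, not the authors' implementation: the names `make_stage_params`, `run_stage`, and `recursive_enhance` are hypothetical, the per-sample gated update merely stands in for the stage recurrent network and auto-encoder, and feeding each stage both the noisy input and the previous estimate is an assumption about how the stages are wired. The point it shows is that one parameter set is reused at every stage while a memory state carries information forward.

```python
import math
import random


def make_stage_params(seed=0):
    """Create one small, hypothetical parameter set that every stage reuses."""
    rng = random.Random(seed)
    names = ("w_noisy", "w_prev", "w_mem", "u_noisy", "u_prev", "u_mem")
    return {name: rng.uniform(-0.5, 0.5) for name in names}


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def run_stage(params, noisy, prev_estimate, memory):
    """Apply one stage: update the memory, then emit it as the new estimate.

    Toy stand-in for one pass through the network; not the paper's layers.
    """
    new_memory = []
    for x, y, m in zip(noisy, prev_estimate, memory):
        # Update gate in (0, 1): how much of the old memory to overwrite.
        z = sigmoid(params["u_noisy"] * x + params["u_prev"] * y
                    + params["u_mem"] * m)
        candidate = math.tanh(params["w_noisy"] * x + params["w_prev"] * y
                              + params["w_mem"] * m)
        new_memory.append((1.0 - z) * m + z * candidate)
    return list(new_memory), new_memory


def recursive_enhance(noisy, params, num_stages=3):
    """Run the same stage num_stages times with shared parameters."""
    estimate = list(noisy)            # the first stage starts from the input
    memory = [0.0] * len(noisy)       # memory carried across stages
    for _ in range(num_stages):
        estimate, memory = run_stage(params, noisy, estimate, memory)
    return estimate


params = make_stage_params()
noisy = [0.2, -0.1, 0.4, 0.0, -0.3]
out_3 = recursive_enhance(noisy, params, num_stages=3)
out_5 = recursive_enhance(noisy, params, num_stages=5)
```

Going from three to five stages adds computation but no parameters: `params` still holds the same six values. A conventional design that stacks a separate sub-network per stage would instead grow its parameter count linearly with the number of stages.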

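Experimental Results

Research Questions

RQ1: Can recursive learning in a time-domain neural network reduce the parameter count without degrading speech enhancement performance?
RQ2: How effective is the stage recurrent network at modeling long-range temporal dependencies in monaural speech enhancement?
RQ3: To what extent does combining gated linear units with the auto-encoder improve feature representation and enhancement quality?
RQ4: How does RTNet compare with advanced time-domain baselines on PESQ and STOI across multiple noise conditions?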

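Key Findings

Across multiple noise conditions on the TIMIT corpus, RTNet consistently scores higher on PESQ than two advanced time-domain baselines.
The model also achieves higher STOI scores, indicating improved speech intelligibility alongside the quality gains reflected in PESQ.
Recursive learning significantly reduces the number of trainable parameters while maintaining high performance.
The stage recurrent network effectively captures deep feature dependencies and suppresses interference stage by stage.
Combining gated linear units with the auto-encoder improves information flow and receptive-field growth.
The RTNet code is publicly available on GitHub, which supports reproduction and further research.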
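To make the gating and receptive-field findings above concrete, here is a minimal sketch of a gated linear unit over a 1-D signal, together with the standard receptive-field formula for stacked convolutions. It is illustrative only: the helper names are hypothetical, the kernels are hand-picked rather than learned, and the use of dilation (with rates 1, 2, 4, 8) is an assumption, since this digest does not state how RTNet enlarges its receptive field.

```python
import math


def conv1d_same(signal, kernel, dilation=1):
    """Zero-padded 1-D cross-correlation that keeps the input length.

    Assumes an odd kernel length so the window is centred on each sample.
    """
    half = (len(kernel) - 1) * dilation // 2
    out = []
    for t in range(len(signal)):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = t + j * dilation - half
            if 0 <= idx < len(signal):
                acc += w * signal[idx]
        out.append(acc)
    return out


def glu(signal, value_kernel, gate_kernel, dilation=1):
    """Gated linear unit: value branch times a sigmoid gate, element-wise."""
    values = conv1d_same(signal, value_kernel, dilation)
    gates = conv1d_same(signal, gate_kernel, dilation)
    return [v * (1.0 / (1.0 + math.exp(-g))) for v, g in zip(values, gates)]


def receptive_field(kernel_size, dilations):
    """Receptive field of stacked convolutions: 1 + sum((k - 1) * d)."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


# Identity value kernel and an all-zero gate kernel: every gate is
# sigmoid(0) = 0.5, so the unit passes exactly half of the input through.
signal = [0.0, 0.0, 1.0, 0.0, 0.0]
out = glu(signal, value_kernel=[0.0, 1.0, 0.0], gate_kernel=[0.0, 0.0, 0.0])

rf_plain = receptive_field(3, [1, 1, 1, 1])    # four undilated layers
rf_dilated = receptive_field(3, [1, 2, 4, 8])  # assumed doubling dilations
```

With kernel size 3, four undilated layers see 9 samples, while the assumed doubling dilations see 31, which is one common way a stack of gated units can widen its context without adding parameters.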


This digest was generated by AI and reviewed by a human editor.