QUICK REVIEW

[Paper Review] RTF-Based Binaural MVDR Beamformer Exploiting an External Microphone in a Diffuse Noise Field

N. Gößling, S. Doclo|arXiv (Cornell University)|Jul 11, 2018

Speech and Audio Processing | Cited by 3

One-Sentence Summary

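This paper proposes a computationally efficient way to steer the RTF-based binaural MVDR beamformer: an external microphone is used to estimate the relative transfer functions (RTFs) of the desired source in a diffuse noise field. By assuming zero spatial coherence between the noise components in the head-mounted microphones and in the external microphone, the method derives an unbiased RTF estimator that improves noise reduction and binaural cue preservation over the covariance whitening and biased estimators. Its performance is close to that of an ideal estimator, particularly at long reverberation times and low signal-to-noise ratio (SNR).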

ABSTRACT

Besides suppressing all undesired sound sources, an important objective of a binaural noise reduction algorithm for hearing devices is the preservation of the binaural cues, aiming at preserving the spatial perception of the acoustic scene. A well-known binaural noise reduction algorithm is the binaural minimum variance distortionless response beamformer, which can be steered using the relative transfer function (RTF) vector of the desired source, relating the acoustic transfer functions between the desired source and all microphones to a reference microphone. In this paper, we propose a computationally efficient method to estimate the RTF vector in a diffuse noise field, requiring an additional microphone that is spatially separated from the head-mounted microphones. Assuming that the spatial coherence between the noise components in the head-mounted microphone signals and the additional microphone signal is zero, we show that an unbiased estimate of the RTF vector can be obtained. Based on real-world recordings, experimental results for several reverberation times show that the proposed RTF estimator outperforms the widely used RTF estimator based on covariance whitening and a simple biased RTF estimator in terms of noise reduction and binaural cue preservation performance.
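
The unbiasedness claim in the abstract can be illustrated with a short derivation for one frequency bin. This is a sketch under stated assumptions, not the paper's own derivation: the symbols below ($y_m$, $a_m$, $n_m$, $\phi_s$, the index $E$ for the external microphone) are notation introduced here, and speech and noise are assumed to be uncorrelated.

```latex
% Assumed notation: M head-mounted microphones, one external microphone E,
% desired source s with PSD \phi_s = \mathcal{E}\{|s|^2\}, noise n uncorrelated with s.
\begin{align}
y_m &= a_m s + n_m, \quad m = 1,\dots,M, \qquad y_E = a_E s + n_E, \\
\mathcal{E}\{y_m y_E^*\} &= \phi_s\, a_m a_E^* + \mathcal{E}\{n_m n_E^*\}
  = \phi_s\, a_m a_E^* \quad \text{if } \mathcal{E}\{n_m n_E^*\} = 0, \\
\frac{\mathcal{E}\{y_m y_E^*\}}{\mathcal{E}\{y_{\mathrm{ref}}\, y_E^*\}}
  &= \frac{\phi_s\, a_m a_E^*}{\phi_s\, a_{\mathrm{ref}} a_E^*}
  = \frac{a_m}{a_{\mathrm{ref}}}.
\end{align}
```

The last line is the RTF of microphone $m$ with respect to the reference microphone, with no noise term left over. Correlating with the reference microphone itself instead of the external microphone would leave $\mathcal{E}\{n_m n_{\mathrm{ref}}^*\}$ in the numerator and the noise PSD in the denominator, which is where a bias comes from when the head-mounted noise components are coherent.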

Motivation and Objectives

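Improve binaural noise reduction in hearing devices while preserving spatial perception cues such as the interaural level difference (ILD) and the interaural time difference (ITD).
Address the difficulty of RTF estimation in strongly reverberant, diffuse-noise environments, where conventional methods degrade.
Develop a computationally efficient RTF estimator that uses an external microphone spatially separated from the head-mounted microphones.
Evaluate the estimator on real-world recordings against the widely used covariance whitening estimator and a simple biased estimator.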

Proposed Method

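Use an external microphone that is spatially separated from the head-mounted microphones, so that its noise component is decorrelated from theirs.
Assume zero spatial coherence between the noise in the head-mounted microphone signals and the noise in the external microphone signal, which makes an unbiased RTF estimate possible.
Derive an RTF estimator based on spatial coherence (SC) that uses the cross-spectral matrix and noise covariance estimates.
Estimate the time-varying covariance matrices recursively with a forgetting factor, to track noise and speech power robustly.
Implement the BMVDR beamformer from the estimated RTF vector and the noise covariance matrix, to suppress interference while preserving binaural cues.
Compare the proposed SC estimator with a biased RTF estimator, covariance whitening (CW), and an ideal SCopt estimator on real recordings.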
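
The steps listed above can be sketched for a single frequency bin as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the microphone count, the forgetting factor of 0.95, and the synthetic covariance matrices are all hypothetical, and the noise covariance is constructed so that the external microphone is exactly uncorrelated with the head-mounted ones.

```python
import numpy as np

def update_covariance(R_prev, y, alpha=0.95):
    """One recursive-smoothing step: R <- alpha * R + (1 - alpha) * y y^H.
    alpha is the forgetting factor (the value here is an assumption)."""
    return alpha * R_prev + (1.0 - alpha) * np.outer(y, y.conj())

def rtf_sc(R_y, head_idx, ref, ext):
    """RTF estimate for the head-mounted microphones from the cross-PSDs
    with the external microphone, normalized by the reference microphone."""
    return R_y[head_idx, ext] / R_y[ref, ext]

def mvdr_weights(R_n, rtf):
    """MVDR filter w = R_n^{-1} a / (a^H R_n^{-1} a) for RTF vector a."""
    x = np.linalg.solve(R_n, rtf)
    return x / (rtf.conj() @ x)

# --- Synthetic example (all values hypothetical) ---
rng = np.random.default_rng(0)
M = 4                      # head-mounted microphones; external mic has index M
a = rng.standard_normal(M + 1) + 1j * rng.standard_normal(M + 1)  # transfer functions
phi_s = 2.0                # PSD of the desired source

# Noise covariance: coherent among head-mounted microphones,
# zero cross terms between head-mounted and external microphone.
B = rng.standard_normal((M, M)) + 1j * rng.standard_normal((M, M))
R_n = np.zeros((M + 1, M + 1), dtype=complex)
R_n[:M, :M] = B @ B.conj().T + np.eye(M)
R_n[M, M] = 0.5

# Noisy-signal covariance under the model y = a s + n.
R_y = phi_s * np.outer(a, a.conj()) + R_n

head = np.arange(M)
rtf_hat = rtf_sc(R_y, head, ref=0, ext=M)   # equals a[:M] / a[0] in this setup
w_ref0 = mvdr_weights(R_n[:M, :M], rtf_hat) # filter for reference microphone 0

# One smoothing step starting from an all-zero estimate.
R_step = update_covariance(np.zeros((2, 2), dtype=complex),
                           np.array([1.0, 1.0j]), alpha=0.9)
```

In a binaural setup the same `mvdr_weights` call would be made twice, once with the RTF vector referenced to the left reference microphone and once with the one referenced to the right. With covariance matrices estimated from data rather than constructed exactly, the RTF estimate is only as good as the zero-coherence assumption and the smoothing allow.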

Experimental Results

Research Questions

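RQ1: Does an external microphone improve the accuracy of RTF estimation in a diffuse noise field when the spatial coherence between the noise components is negligible?
RQ2: Does the proposed spatial-coherence-based RTF estimator outperform the covariance whitening and biased estimators in noise reduction and binaural cue preservation?
RQ3: How does the proposed RTF estimator perform relative to the ideal estimator, which uses clean speech as the external signal?
RQ4: To what extent do the reverberation time and the input SNR affect the performance of the proposed RTF estimator?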

Key Findings

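Across all reverberation times and input SNRs, the proposed spatial-coherence-based RTF estimator (SC) consistently outperforms the covariance whitening (CW) and biased RTF estimators in SNR improvement.
At reverberation times of 250 ms, 500 ms, and 750 ms, the SC estimator outperforms the CW and biased estimators in intelligibility-weighted SNR improvement.
The SC estimator markedly reduces the binaural cue errors (ILD and ITD), especially at long reverberation times, and performs close to the ideal SCopt estimator.
The SC estimator yields binaural cues that are more consistent across frequency, reducing distortions common with the CW and biased estimators, such as directional confusion or a diffuse sound impression.
The performance gap between the SC estimator and the ideal SCopt estimator is very small, which supports the zero-spatial-coherence assumption in the experimental scenario.
Informal listening tests indicate that with the SC estimator the target source is perceived as a point source with less reverberation, which supports natural spatial perception.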
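
For readers unfamiliar with the cues named in the findings, the following sketch shows one common way to compute the ILD and ITD of a source from its transfer functions at the two ears. It is an illustration under assumptions, not the error measure used in the paper: the function names are hypothetical, and the ITD formula is only meaningful at frequencies where the interaural phase does not wrap.

```python
import numpy as np

def ild_db(a_left, a_right):
    """Interaural level difference in dB from the left/right transfer functions."""
    return 20.0 * np.log10(np.abs(a_left / a_right))

def itd_seconds(a_left, a_right, freq_hz):
    """Interaural time difference from the interaural phase at freq_hz.
    Sign follows the phase of a_left / a_right; valid only without phase wrapping."""
    return np.angle(a_left / a_right) / (2.0 * np.pi * freq_hz)

# Hypothetical example: left ear twice the amplitude, 0.3 rad phase lead at 500 Hz.
a_l = 2.0 * np.exp(0.3j)
a_r = 1.0 + 0.0j
ild = ild_db(a_l, a_r)
itd = itd_seconds(a_l, a_r, 500.0)
```

A cue error can then be formed as the difference between these values at the beamformer input and output, averaged over frequency.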

This review was generated by AI and reviewed by a human editor.