QUICK REVIEW

[Paper Review] Supervised Speech Separation Based on Deep Learning: An Overview

DeLiang Wang, Jitong Chen|arXiv (Cornell University)|Aug 24, 2017
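Speech and Audio Processing | References: 134 | Cited by: 58
One-Sentence Summary

A comprehensive overview of deep-learning-based supervised speech separation, covering its background, its main components, and methods grouped by monaural and multi-microphone settings.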

ABSTRACT

Speech separation is the task of separating target speech from background interference. Traditionally, speech separation is studied as a signal processing problem. A more recent approach formulates speech separation as a supervised learning problem, where the discriminative patterns of speech, speakers, and background noise are learned from training data. Over the past decade, many supervised separation algorithms have been put forward. In particular, the recent introduction of deep learning to supervised speech separation has dramatically accelerated progress and boosted separation performance. This article provides a comprehensive overview of the research on deep learning based supervised speech separation in the last several years. We first introduce the background of speech separation and the formulation of supervised separation. Then we discuss three main components of supervised separation: learning machines, training targets, and acoustic features. Much of the overview is on separation algorithms where we review monaural methods, including speech enhancement (speech-nonspeech separation), speaker separation (multi-talker separation), and speech dereverberation, as well as multi-microphone techniques. The important issue of generalization, unique to supervised learning, is discussed. This overview provides a historical perspective on how advances are made. In addition, we discuss a number of conceptual issues, including what constitutes the target source.

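Research Motivation and Goals

  • Introduce the background of speech separation and the formulation of supervised separation.
  • Summarize the learning machines, training targets, and acoustic features used in the field.
  • Review monaural methods (speech enhancement, speaker separation, and dereverberation) and multi-microphone techniques.
  • Discuss the generalization challenge unique to supervised learning and provide historical context.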

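Proposed Method

  • Discuss the three main components of supervised separation: learning machines, training targets, and acoustic features.
  • Review monaural separation methods, including speech enhancement, speaker separation, and speech dereverberation.
  • Review multi-microphone techniques for supervised speech separation.
  • Address the generalization issue inherent to supervised learning and provide a historical perspective.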
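
To make the notion of a "training target" concrete, here is a minimal sketch of one widely used target, the ideal ratio mask (IRM), computed per time-frequency unit as (S / (S + N))^β from speech energy S and noise energy N. This summary does not name any specific target, so the choice of the IRM, the exponent β = 0.5, the function names, and the toy energy values are all assumptions made for illustration, not the paper's prescribed method.

```python
def ideal_ratio_mask(speech_energy, noise_energy, beta=0.5):
    """Compute an ideal ratio mask over a grid of time-frequency units.

    speech_energy, noise_energy: lists of frames, each frame a list of
    non-negative per-frequency energies (hypothetical toy inputs).
    """
    mask = []
    for s_frame, n_frame in zip(speech_energy, noise_energy):
        row = []
        for s, n in zip(s_frame, n_frame):
            total = s + n
            # A unit with no energy at all gets a mask value of 0.
            row.append((s / total) ** beta if total > 0 else 0.0)
        mask.append(row)
    return mask


def apply_mask(mixture_magnitude, mask):
    """Scale each time-frequency unit of the mixture by its mask value."""
    return [[m * g for m, g in zip(m_frame, g_frame)]
            for m_frame, g_frame in zip(mixture_magnitude, mask)]


# Toy example: 2 frames x 2 frequency bins (illustrative values only).
speech = [[4.0, 0.0], [1.0, 9.0]]
noise = [[0.0, 4.0], [3.0, 0.0]]
irm = ideal_ratio_mask(speech, noise)
masked = apply_mask([[2.0, 2.0], [2.0, 3.0]], irm)
```

In a supervised setup, a learning machine would be trained to predict a mask like `irm` from acoustic features of the mixture alone, since the clean speech and noise energies are available only at training time.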

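Experimental Results

Research Questions

  • RQ1: What are the main learning machines, training targets, and features in deep-learning-based supervised speech separation?
  • RQ2: How do monaural and multi-microphone methods compare with and complement each other?
  • RQ3: What are the key generalization challenges in supervised speech separation, and how have they been addressed?
  • RQ4: Which historical trends and developments have shaped the progress of supervised deep learning in speech separation?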

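Key Findings

  • Deep learning has dramatically accelerated progress in supervised speech separation and boosted separation performance.
  • The field can be organized around learning machines, training targets, and acoustic features, with a clear division between monaural and multi-microphone methods.
  • Generalization is a key issue unique to supervised learning and requires careful consideration in model design and evaluation.
  • The overview offers a historical perspective on how advances in supervised speech separation were made and how they can be categorized.
  • The paper discusses conceptual issues, including what constitutes the target source, and covers a range of separation problems such as speech enhancement, speaker separation, and dereverberation.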

This review was generated by AI and reviewed by human editors.