QUICK REVIEW

[Paper Review] Supervised Speech Separation Based on Deep Learning: An Overview

DeLiang Wang, Jitong Chen|arXiv (Cornell University)|Aug 24, 2017

Speech and Audio Processing | References: 134 | Citations: 58

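One-Line Summary

A comprehensive overview of deep-learning-based supervised speech separation, covering its background, its core components, and the approaches it categorizes for monaural and multi-microphone settings.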

ABSTRACT

Speech separation is the task of separating target speech from background interference. Traditionally, speech separation is studied as a signal processing problem. A more recent approach formulates speech separation as a supervised learning problem, where the discriminative patterns of speech, speakers, and background noise are learned from training data. Over the past decade, many supervised separation algorithms have been put forward. In particular, the recent introduction of deep learning to supervised speech separation has dramatically accelerated progress and boosted separation performance. This article provides a comprehensive overview of the research on deep learning based supervised speech separation in the last several years. We first introduce the background of speech separation and the formulation of supervised separation. Then we discuss three main components of supervised separation: learning machines, training targets, and acoustic features. Much of the overview is on separation algorithms where we review monaural methods, including speech enhancement (speech-nonspeech separation), speaker separation (multi-talker separation), and speech dereverberation, as well as multi-microphone techniques. The important issue of generalization, unique to supervised learning, is discussed. This overview provides a historical perspective on how advances are made. In addition, we discuss a number of conceptual issues, including what constitutes the target source.

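Research Motivation and Objectives

Introduce the background and formulation of supervised speech separation.
Summarize the learning machines, training targets, and acoustic features used in the field.
Review monaural methods (speech enhancement, speaker separation, speech dereverberation) and multi-microphone techniques.
Discuss the generalization issue unique to supervised learning and provide historical context.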

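Proposed Method

Discusses the three main components of supervised separation: learning machines, training targets, and acoustic features.
Reviews monaural separation approaches, including speech enhancement, speaker separation, and speech dereverberation.
Reviews multi-microphone techniques for supervised speech separation.
Addresses the generalization problem inherent in supervised learning and provides a historical perspective.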
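To make the notion of a training target concrete, the following is a minimal sketch of one widely used target in this literature, the ideal ratio mask (IRM). The review text above does not name a specific target, so the choice of the IRM, the exponent `beta=0.5`, the function names, and the toy energy values are all assumptions made for illustration. In supervised separation, a learning machine would be trained to predict such a mask from acoustic features of the mixture; here the mask is computed directly from known speech and noise energies.

```python
def ideal_ratio_mask(speech_energy, noise_energy, beta=0.5):
    """Return (S / (S + N)) ** beta for each time-frequency unit.

    speech_energy and noise_energy are lists of frames, each frame a list
    of per-channel energies. Units with no energy at all get a mask of 0.
    """
    mask = []
    for s_row, n_row in zip(speech_energy, noise_energy):
        row = []
        for s, n in zip(s_row, n_row):
            total = s + n
            row.append((s / total) ** beta if total > 0 else 0.0)
        mask.append(row)
    return mask


def apply_mask(mixture_magnitude, mask):
    """Scale each time-frequency unit of the mixture by its mask value."""
    return [
        [m * g for m, g in zip(m_row, g_row)]
        for m_row, g_row in zip(mixture_magnitude, mask)
    ]


# Toy example (hypothetical values): 2 frames x 3 frequency channels.
speech = [[4.0, 0.0, 1.0], [9.0, 1.0, 0.0]]
noise = [[0.0, 2.0, 1.0], [0.0, 3.0, 0.0]]
irm = ideal_ratio_mask(speech, noise)
```

Mask values close to 1 mark units dominated by speech and values close to 0 mark units dominated by noise. At test time the clean signals are unavailable, so the mask would be estimated by the trained model rather than computed this way.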

Experimental Results

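Research Questions

RQ1: What are the main learning machines, training targets, and features used in deep-learning-based supervised speech separation?
RQ2: How do monaural and multi-microphone approaches compare with and complement each other?
RQ3: What are the main generalization challenges in supervised speech separation, and how have they been addressed?
RQ4: What historical trends and developments have shaped the progress of deep learning for supervised speech separation?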

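Key Findings

Deep learning has dramatically accelerated progress in supervised speech separation and boosted separation performance.
The field can be organized around learning machines, training targets, and acoustic features, and monaural and multi-microphone approaches are each categorized in their own way.
Generalization is an important issue unique to supervised learning and requires careful attention in model design and evaluation.
The overview provides a historical perspective on how advances in supervised speech separation have been made and categorized.
The paper discusses conceptual issues, including what constitutes the target source, and covers a broad range of separation problems such as speech enhancement, speaker separation, and dereverberation.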


This review was generated by AI and checked by a human editor.