QUICK REVIEW

[Paper Review] Supervised Speech Separation Based on Deep Learning: An Overview

DeLiang Wang, Jitong Chen | arXiv (Cornell University) | 2017-08-24

Speech and Audio Processing | References: 134 | Citations: 58

One-Line Summary

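A comprehensive overview of deep learning based supervised speech separation, covering its background, its main components, and a taxonomy of approaches for single-microphone (monaural) and multi-microphone settings.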

ABSTRACT

Speech separation is the task of separating target speech from background interference. Traditionally, speech separation is studied as a signal processing problem. A more recent approach formulates speech separation as a supervised learning problem, where the discriminative patterns of speech, speakers, and background noise are learned from training data. Over the past decade, many supervised separation algorithms have been put forward. In particular, the recent introduction of deep learning to supervised speech separation has dramatically accelerated progress and boosted separation performance. This article provides a comprehensive overview of the research on deep learning based supervised speech separation in the last several years. We first introduce the background of speech separation and the formulation of supervised separation. Then we discuss three main components of supervised separation: learning machines, training targets, and acoustic features. Much of the overview is on separation algorithms where we review monaural methods, including speech enhancement (speech-nonspeech separation), speaker separation (multi-talker separation), and speech dereverberation, as well as multi-microphone techniques. The important issue of generalization, unique to supervised learning, is discussed. This overview provides a historical perspective on how advances are made. In addition, we discuss a number of conceptual issues, including what constitutes the target source.

Motivation and Goals

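Introduce the background of speech separation and the formulation of supervised separation.
Summarize the learning machines, training targets, and acoustic features used in the field.
Review monaural methods (speech enhancement, speaker separation, and dereverberation) and multi-microphone techniques.
Discuss the generalization issue unique to supervised learning and place the work in historical context.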
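The review does not name specific acoustic features. As a minimal sketch, assuming 16 kHz audio and a log-magnitude spectrogram (a common input feature in this literature, used here only as an example), the code below frames a waveform with a Hann window and returns one feature vector per frame. The function name `log_magnitude_features` and the 20 ms frame / 10 ms hop defaults are illustrative assumptions, not the paper's configuration.

```python
import numpy as np

def log_magnitude_features(signal, frame_len=320, hop=160, eps=1e-8):
    """Return log-magnitude STFT features with shape (n_frames, frame_len // 2 + 1).

    Illustrative defaults assume 16 kHz audio: 320 samples = 20 ms frames,
    160 samples = 10 ms hop.
    """
    signal = np.asarray(signal, dtype=np.float64)
    # Zero-pad very short inputs so at least one full frame exists.
    if len(signal) < frame_len:
        signal = np.pad(signal, (0, frame_len - len(signal)))
    n_frames = 1 + (len(signal) - frame_len) // hop
    window = np.hanning(frame_len)
    # Slice overlapping frames and taper each one with a Hann window.
    frames = np.stack([signal[i * hop:i * hop + frame_len] * window
                       for i in range(n_frames)])
    # One-sided spectrum per frame; eps avoids log(0) on silent bins.
    spectrum = np.fft.rfft(frames, axis=1)
    return np.log(np.abs(spectrum) + eps)

# Usage: one second of a 440 Hz tone sampled at 16 kHz.
sr = 16000
t = np.arange(sr) / sr
features = log_magnitude_features(np.sin(2 * np.pi * 440 * t))
print(features.shape)  # (99, 161)
```

In a supervised separation system, vectors like these (often with neighboring frames concatenated for context) would serve as the network input.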

Proposed Method

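Discusses the three main components of supervised separation: learning machines, training targets, and acoustic features.
Reviews monaural separation approaches, including speech enhancement, speaker separation, and speech dereverberation.
Reviews multi-microphone techniques for supervised speech separation.
Addresses the generalization issue unique to supervised learning and offers a historical perspective.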
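As a hedged illustration of the "training targets" component, the sketch below computes two widely used masking targets, the ideal binary mask (IBM) and the ideal ratio mask (IRM), from known speech and noise STFTs, then applies a mask to the mixture while keeping the mixture phase. The review itself does not list specific targets; the function names, the 0 dB local criterion `lc_db`, and the exponent `beta = 0.5` are assumptions chosen for illustration.

```python
import numpy as np

def ideal_binary_mask(speech_mag, noise_mag, lc_db=0.0, eps=1e-12):
    """IBM: 1 where the local SNR in dB exceeds the local criterion lc_db, else 0."""
    local_snr_db = 10.0 * np.log10((speech_mag ** 2 + eps) / (noise_mag ** 2 + eps))
    return (local_snr_db > lc_db).astype(np.float64)

def ideal_ratio_mask(speech_mag, noise_mag, beta=0.5, eps=1e-12):
    """IRM: (S^2 / (S^2 + N^2)) ** beta, a soft mask with values in [0, 1]."""
    return (speech_mag ** 2 / (speech_mag ** 2 + noise_mag ** 2 + eps)) ** beta

def apply_mask(mixture_stft, mask):
    """Scale each time-frequency unit; a real, nonnegative mask keeps the mixture phase."""
    return mask * mixture_stft

# Usage with toy complex STFTs (frequency x time); a real system would use STFTs of audio.
rng = np.random.default_rng(0)
speech = rng.normal(size=(4, 6)) + 1j * rng.normal(size=(4, 6))
noise = 0.5 * (rng.normal(size=(4, 6)) + 1j * rng.normal(size=(4, 6)))
mixture = speech + noise
ibm = ideal_binary_mask(np.abs(speech), np.abs(noise))
irm = ideal_ratio_mask(np.abs(speech), np.abs(noise))
enhanced = apply_mask(mixture, irm)
```

These masks are "ideal" because they are computed from the clean speech and noise, which are available only for training data; a supervised model would learn to estimate such a mask from features of the mixture alone.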

Results

Research Questions

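RQ1: What are the main learning machines, training targets, and features used in deep learning based supervised speech separation?
RQ2: How do monaural and multi-microphone approaches compare, and how do they complement each other?
RQ3: What are the main generalization issues in supervised speech separation, and how have they been addressed?
RQ4: What historical trends and developments have shaped the progress of supervised deep learning for speech separation?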

Key Findings

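Deep learning has dramatically accelerated progress in supervised speech separation and boosted separation performance.
The field can be organized around learning machines, training targets, and acoustic features, with a distinction between monaural and multi-microphone approaches.
Generalization is an important issue unique to supervised learning and requires careful consideration in both model design and evaluation.
The overview gives a historical account of how advances in supervised speech separation have been made and how they can be categorized.
The paper discusses conceptual issues, including what constitutes the target source, and covers a range of separation problems such as speech enhancement, speaker separation, and dereverberation.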

This review was generated by AI and reviewed by a human editor.