QUICK REVIEW

[Paper Review] Deep Complex Networks

Chiheb Trabelsi, Olexa Bilaniuk | PolyPublie (École Polytechnique de Montréal) | May 27, 2017

Music and Audio Processing | References: 29 | Citations: 166

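One-Line Summary

This paper presents a set of key building blocks for complex-valued deep neural networks, including complex convolution, complex batch normalization, and complex activations, and demonstrates competitive performance on vision and audio tasks such as CIFAR, MusicNet, and TIMIT.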

ABSTRACT

At present, the vast majority of building blocks, techniques, and architectures for deep learning are based on real-valued operations and representations. However, recent work on recurrent neural networks and older fundamental theoretical analysis suggests that complex numbers could have a richer representational capacity and could also facilitate noise-robust memory retrieval mechanisms. Despite their attractive properties and potential for opening up entirely new neural architectures, complex-valued deep neural networks have been marginalized due to the absence of the building blocks required to design such models. In this work, we provide the key atomic components for complex-valued deep neural networks and apply them to convolutional feed-forward networks and convolutional LSTMs. More precisely, we rely on complex convolutions and present algorithms for complex batch-normalization, complex weight initialization strategies for complex-valued neural nets and we use them in experiments with end-to-end training schemes. We demonstrate that such complex-valued models are competitive with their real-valued counterparts. We test deep complex models on several computer vision tasks, on music transcription using the MusicNet dataset and on Speech Spectrum Prediction using the TIMIT dataset. We achieve state-of-the-art performance on these audio-related tasks.

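Motivation and Goals

Provide a general formulation of complex-valued deep neural networks and their building blocks.
Apply complex-valued operations to convolutional networks and LSTMs.
Demonstrate competitive performance on real-world tasks across vision and audio datasets.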

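Proposed Method

Represent complex numbers as paired real and imaginary feature maps.
Derive complex convolution from real-valued operations on the separate real and imaginary components.
Introduce complex batch normalization by whitening the 2D (real, imaginary) vector of each feature.
Propose complex weight initialization that draws magnitudes from a Rayleigh distribution and randomizes the phase.
Evaluate activation functions, including C-ReLU, modReLU, and z-ReLU, across tasks.
Compare against real-valued networks on CIFAR-10/100, SVHN*, MusicNet, and TIMIT.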
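The complex convolution listed above can be sketched with four real convolutions. Writing a complex filter as W = A + iB and an input as h = x + iy, linearity gives W * h = (A * x − B * y) + i(B * x + A * y). This is a minimal NumPy sketch that assumes 1D signals and uses `np.convolve` in `valid` mode for each real convolution; the function name `complex_conv1d` is hypothetical, and an actual layer would use a deep learning framework's 2D convolution instead.

```python
import numpy as np


def complex_conv1d(x_re, x_im, w_re, w_im):
    """Complex 1D convolution built only from real-valued convolutions.

    Input h = x_re + i*x_im, filter W = w_re + i*w_im (A = w_re, B = w_im):
        W * h = (A * x - B * y) + i (B * x + A * y)
    Returns the real and imaginary parts as two separate real arrays.
    """
    real = np.convolve(x_re, w_re, mode="valid") - np.convolve(x_im, w_im, mode="valid")
    imag = np.convolve(x_re, w_im, mode="valid") + np.convolve(x_im, w_re, mode="valid")
    return real, imag


# Usage: a random complex signal and filter, stored as real/imag pairs.
rng = np.random.default_rng(0)
x = rng.normal(size=16) + 1j * rng.normal(size=16)
w = rng.normal(size=3) + 1j * rng.normal(size=3)
re, im = complex_conv1d(x.real, x.imag, w.real, w.imag)
```

Because the same kernels A and B are reused in all four real convolutions, a complex layer with M complex inputs and N complex outputs holds 2MN kernels, half of the 4MN needed by a real layer mapping 2M real channels to 2N.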
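The whitening-based complex batch normalization can be sketched as follows, assuming each complex feature is stored as separate real and imaginary arrays of shape (N, C) with statistics taken over the batch axis. Each feature is centered and multiplied by the inverse square root of its 2×2 covariance V = [[Vrr, Vri], [Vri, Vii]], which has the closed form V^(-1/2) = (1/(st)) [[Vii + s, −Vri], [−Vri, Vrr + s]] with s = √det V and t = √(Vrr + Vii + 2s); a learnable symmetric 2×2 scaling γ and a complex shift β follow. The function names and the `eps` value are illustrative assumptions, and the running averages needed at inference time are omitted.

```python
import numpy as np


def complex_whiten(x_re, x_im, eps=1e-5):
    """Center each complex feature and whiten its (real, imag) 2D vector.

    x_re, x_im: arrays of shape (N, C); statistics are taken over axis 0.
    The output real/imag parts have ~identity 2x2 covariance per feature.
    """
    xr = x_re - x_re.mean(axis=0)
    xi = x_im - x_im.mean(axis=0)
    # Entries of the 2x2 covariance; eps on the diagonal keeps V invertible.
    vrr = (xr * xr).mean(axis=0) + eps
    vii = (xi * xi).mean(axis=0) + eps
    vri = (xr * xi).mean(axis=0)
    # Closed-form inverse square root of [[vrr, vri], [vri, vii]].
    s = np.sqrt(vrr * vii - vri ** 2)   # sqrt(det V)
    t = np.sqrt(vrr + vii + 2.0 * s)    # sqrt(trace V + 2 sqrt(det V))
    inv_st = 1.0 / (s * t)
    w_rr = (vii + s) * inv_st
    w_ii = (vrr + s) * inv_st
    w_ri = -vri * inv_st
    return w_rr * xr + w_ri * xi, w_ri * xr + w_ii * xi


def complex_batch_norm(x_re, x_im, gamma_rr, gamma_ri, gamma_ii,
                       beta_re, beta_im, eps=1e-5):
    """Whitening followed by a symmetric 2x2 scale gamma and complex shift beta."""
    zr, zi = complex_whiten(x_re, x_im, eps)
    y_re = gamma_rr * zr + gamma_ri * zi + beta_re
    y_im = gamma_ri * zr + gamma_ii * zi + beta_im
    return y_re, y_im


# Usage: strongly correlated real and imaginary parts with a nonzero mean.
rng = np.random.default_rng(0)
a = rng.normal(size=(4096, 3))
b = 0.5 * a + 0.3 * rng.normal(size=(4096, 3)) + 2.0
zr, zi = complex_whiten(a, b)
```

With `gamma_rr = gamma_ii = 1/np.sqrt(2)`, `gamma_ri = 0`, and zero `beta`, each output has E|z|² ≈ 1. To our reading this matches how the paper initializes γ, but treat that detail as an assumption.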
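The Rayleigh-magnitude initialization can be sketched by drawing |W| from a Rayleigh distribution with scale σ and the phase uniformly from [−π, π). Since E|W|² = 2σ² for a Rayleigh magnitude, matching the Glorot variance 2/(n_in + n_out) gives σ = 1/√(n_in + n_out), and matching the He variance 2/n_in gives σ = 1/√n_in. The function name, its arguments, and the dense (n_in, n_out) weight shape are hypothetical simplifications; a convolutional layer would fold the kernel size into the fan-in and fan-out.

```python
import numpy as np


def complex_rayleigh_init(n_in, n_out, criterion="glorot", rng=None):
    """Return (real, imag) weights with Rayleigh magnitude and uniform phase.

    criterion="glorot": E|W|^2 = 2 / (n_in + n_out)  ->  sigma = 1/sqrt(n_in + n_out)
    criterion="he":     E|W|^2 = 2 / n_in            ->  sigma = 1/sqrt(n_in)
    """
    rng = np.random.default_rng() if rng is None else rng
    if criterion == "glorot":
        sigma = 1.0 / np.sqrt(n_in + n_out)
    elif criterion == "he":
        sigma = 1.0 / np.sqrt(n_in)
    else:
        raise ValueError(f"unknown criterion: {criterion!r}")
    magnitude = rng.rayleigh(scale=sigma, size=(n_in, n_out))
    phase = rng.uniform(-np.pi, np.pi, size=(n_in, n_out))
    return magnitude * np.cos(phase), magnitude * np.sin(phase)


# Usage: a 500 -> 500 complex dense layer with the Glorot criterion.
w_re, w_im = complex_rayleigh_init(500, 500, criterion="glorot",
                                   rng=np.random.default_rng(0))
```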

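Experimental Results

Research Questions

RQ1: Can complex-valued networks match or outperform real-valued architectures on standard vision benchmarks?
RQ2: Do the complex building blocks (convolution, BN, activation) enable competitive performance once initialization and training stability are taken into account?
RQ3: Are complex networks particularly advantageous for audio-related tasks such as music transcription and speech spectrum prediction?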

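Key Findings

Complex networks achieve results competitive with real-valued models on CIFAR-10, CIFAR-100, and SVHN*.
On CIFAR-100, under the presented settings, the complex representation outperforms its real-valued counterpart.
Complex batch normalization based on 2D whitening avoids NaNs and stabilizes training across the experiments.
In the image recognition experiments, C-ReLU outperforms modReLU and z-ReLU.
An analysis of model variants highlights the importance of complex batch normalization and phase-preserving activations for both performance and stability.
The experiments indicate state-of-the-art performance on MusicNet transcription and TIMIT speech spectrum prediction, within the scope of the reported comparisons.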
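The three activation functions compared above can be written in a few lines, assuming NumPy complex arrays. C-ReLU applies ReLU separately to the real and imaginary parts; modReLU applies ReLU(|z| + b) to the magnitude with a bias b (learnable in a network) and keeps the phase; z-ReLU passes z through only when its phase lies in [0, π/2], that is, when both parts are non-negative. The function names and the `eps` guard against division by zero in `modrelu` are our own assumptions.

```python
import numpy as np


def crelu(z):
    """C-ReLU: ReLU on the real and imaginary parts independently."""
    return np.maximum(z.real, 0) + 1j * np.maximum(z.imag, 0)


def modrelu(z, b, eps=1e-9):
    """modReLU: ReLU(|z| + b) * exp(i*angle(z)); the phase is preserved."""
    mag = np.abs(z)
    scale = np.maximum(mag + b, 0) / (mag + eps)  # eps avoids 0/0 at z = 0
    return scale * z


def zrelu(z):
    """z-ReLU: keep z only if its phase is in [0, pi/2] (first quadrant)."""
    keep = (z.real >= 0) & (z.imag >= 0)
    return np.where(keep, z, 0)


# Usage: one point in each quadrant.
z = np.array([1 + 2j, -1 + 2j, 1 - 2j, -1 - 2j])
outputs = {"crelu": crelu(z), "modrelu": modrelu(z, b=-1.0), "zrelu": zrelu(z)}
```

C-ReLU and z-ReLU are piecewise linear in the real and imaginary parts, while modReLU rescales only the magnitude, which is why it is the phase-preserving option of the three.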

This review was generated by AI and reviewed by a human editor.