QUICK REVIEW

[Paper Review] Deep Complex Networks

Chiheb Trabelsi, Olexa Bilaniuk | PolyPublie (École Polytechnique de Montréal) | May 27, 2017

Music and Audio Processing | References: 29 | Citations: 166

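One-Line Summary

This paper establishes a comprehensive set of building blocks for complex-valued deep neural networks, including complex convolution, complex batch normalization, and complex activations, and shows competitive performance on vision and audio tasks such as CIFAR, MusicNet, and TIMIT.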

ABSTRACT

At present, the vast majority of building blocks, techniques, and architectures for deep learning are based on real-valued operations and representations. However, recent work on recurrent neural networks and older fundamental theoretical analysis suggests that complex numbers could have a richer representational capacity and could also facilitate noise-robust memory retrieval mechanisms. Despite their attractive properties and potential for opening up entirely new neural architectures, complex-valued deep neural networks have been marginalized due to the absence of the building blocks required to design such models. In this work, we provide the key atomic components for complex-valued deep neural networks and apply them to convolutional feed-forward networks and convolutional LSTMs. More precisely, we rely on complex convolutions and present algorithms for complex batch-normalization, complex weight initialization strategies for complex-valued neural nets and we use them in experiments with end-to-end training schemes. We demonstrate that such complex-valued models are competitive with their real-valued counterparts. We test deep complex models on several computer vision tasks, on music transcription using the MusicNet dataset and on Speech Spectrum Prediction using the TIMIT dataset. We achieve state-of-the-art performance on these audio-related tasks.

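Motivation and Objectives

Provide a general formulation of complex-valued deep neural networks and their building blocks.
Apply complex-valued operations to convolutional networks and LSTMs.
Demonstrate competitive performance on real-world tasks across vision and audio datasets.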

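Proposed Method

Represent complex numbers as paired real/imaginary feature maps.
Derive complex convolution as real-valued operations on the separated real and imaginary components.
Introduce complex batch normalization by whitening the 2D real-imaginary vector.
Propose complex weight initialization using a Rayleigh-distributed magnitude and a randomized phase.
Evaluate activation functions including C-ReLU, modReLU, and z-ReLU across tasks.
Compare against real-valued counterparts on CIFAR-10/100, SVHN*, MusicNet, and TIMIT.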
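
The representation, convolution, and initialization steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: it works on 1D sequences rather than 2D feature maps, it computes cross-correlation (the operation deep learning libraries usually call convolution), and the function names and the scale parameter `sigma` are hypothetical. For a complex kernel W = A + iB and input h = x + iy, it shows that W * h = (A * x - B * y) + i(B * x + A * y) needs only four real-valued convolutions.

```python
import math
import random


def rayleigh_phase_init(n, sigma, rng):
    """Draw n complex weights as (real, imag) pairs.

    Magnitude follows a Rayleigh(sigma) distribution and the phase is
    uniform between -pi and pi. The choice of sigma is left to the caller
    (hypothetical parameter for this sketch).
    """
    weights = []
    for _ in range(n):
        # Inverse-CDF sampling of a Rayleigh magnitude; 1 - u lies in (0, 1].
        magnitude = sigma * math.sqrt(-2.0 * math.log(1.0 - rng.random()))
        phase = rng.uniform(-math.pi, math.pi)
        weights.append((magnitude * math.cos(phase), magnitude * math.sin(phase)))
    return weights


def real_conv1d(x, w):
    """Valid 1D cross-correlation of sequence x with kernel w."""
    k = len(w)
    return [sum(x[i + j] * w[j] for j in range(k)) for i in range(len(x) - k + 1)]


def complex_conv1d(x_re, x_im, w_re, w_im):
    """Complex convolution built from four real-valued convolutions."""
    rr = real_conv1d(x_re, w_re)  # A * x
    ii = real_conv1d(x_im, w_im)  # B * y
    ri = real_conv1d(x_re, w_im)  # B * x
    ir = real_conv1d(x_im, w_re)  # A * y
    out_re = [a - b for a, b in zip(rr, ii)]
    out_im = [a + b for a, b in zip(ri, ir)]
    return out_re, out_im


if __name__ == "__main__":
    rng = random.Random(0)
    kernel = rayleigh_phase_init(3, sigma=0.5, rng=rng)
    x_re = [rng.uniform(-1.0, 1.0) for _ in range(8)]
    x_im = [rng.uniform(-1.0, 1.0) for _ in range(8)]
    out_re, out_im = complex_conv1d(
        x_re, x_im, [a for a, _ in kernel], [b for _, b in kernel]
    )
    print(list(zip(out_re, out_im)))
```

Because `real_conv1d` only multiplies and adds, passing it Python `complex` values directly gives a reference result that the split real/imaginary version should match up to rounding.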

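Experimental Results

Research Questions

RQ1: Can complex-valued networks match or outperform real-valued architectures on standard vision benchmarks?
RQ2: Do the complex building blocks (convolution, BN, activations) enable competitive performance with sound initialization and stable training?
RQ3: Are complex networks particularly advantageous for audio-related tasks such as music transcription and speech spectrum prediction?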

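Key Findings

Complex networks achieve results competitive with real-valued models on CIFAR-10, CIFAR-100, and SVHN*.
On CIFAR-100, the complex representation outperforms its real-valued counterpart in the reported settings.
Complex batch normalization based on 2D whitening avoids NaNs and stabilizes training across the experiments.
C-ReLU outperforms modReLU and z-ReLU in the reported image recognition experiments.
Ablations indicate that complex batch normalization and phase-preserving activations are important for performance and stability.
Within the reported scope, the experiments suggest state-of-the-art performance on MusicNet transcription and TIMIT spectrum prediction.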
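
The findings on normalization and activations above are easier to follow with the operations written out. The sketch below is an illustration under assumptions, not the authors' code: the whitening step treats each complex value as a 2D (real, imaginary) vector and multiplies the centered data by the inverse square root of its 2x2 covariance matrix, using the closed form for 2x2 matrices; the learnable scale and shift of a full batch normalization layer are omitted. The activations follow their commonly stated definitions, and the function names, `eps`, and the bias `b` are hypothetical.

```python
import math


def whiten_complex_batch(batch, eps=1e-5):
    """Center a batch of complex numbers and whiten its 2D (re, im) covariance."""
    n = len(batch)
    mean_re = sum(z.real for z in batch) / n
    mean_im = sum(z.imag for z in batch) / n
    centered = [(z.real - mean_re, z.imag - mean_im) for z in batch]

    # Covariance matrix V = [[v_rr, v_ri], [v_ri, v_ii]]; eps keeps it invertible.
    v_rr = sum(r * r for r, _ in centered) / n + eps
    v_ii = sum(i * i for _, i in centered) / n + eps
    v_ri = sum(r * i for r, i in centered) / n

    # Closed-form inverse square root of a 2x2 symmetric positive definite matrix.
    s = math.sqrt(v_rr * v_ii - v_ri * v_ri)  # sqrt(det V)
    t = math.sqrt(v_rr + v_ii + 2.0 * s)      # sqrt(trace V + 2 sqrt(det V))
    scale = 1.0 / (s * t)
    w_rr = (v_ii + s) * scale
    w_ii = (v_rr + s) * scale
    w_ri = -v_ri * scale

    return [complex(w_rr * r + w_ri * i, w_ri * r + w_ii * i) for r, i in centered]


def crelu(z):
    """C-ReLU: apply ReLU to the real and imaginary parts separately."""
    return complex(max(z.real, 0.0), max(z.imag, 0.0))


def modrelu(z, b):
    """modReLU: shift the magnitude by bias b, keep the phase, zero if inactive."""
    mag = abs(z)
    if mag == 0.0 or mag + b <= 0.0:
        return 0j
    return z * ((mag + b) / mag)


def zrelu(z):
    """z-ReLU: pass z only when its phase lies in the first quadrant."""
    return z if (z.real >= 0.0 and z.imag >= 0.0) else 0j
```

Dividing real and imaginary parts by their own standard deviations would leave any correlation between them untouched. The 2x2 whitening removes that correlation as well, which is the distinction the "2D whitening" wording refers to.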


This review was generated by AI and checked by a human editor.