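[Paper Walkthrough] Deep Complex Networks
This paper develops a complete set of basic building blocks for complex-valued deep neural networks, including complex convolution, complex batch normalization, and complex activations, and demonstrates competitive performance on vision and audio tasks such as CIFAR, MusicNet, and TIMIT.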
At present, the vast majority of building blocks, techniques, and architectures for deep learning are based on real-valued operations and representations. However, recent work on recurrent neural networks and older fundamental theoretical analysis suggests that complex numbers could have a richer representational capacity and could also facilitate noise-robust memory retrieval mechanisms. Despite their attractive properties and potential for opening up entirely new neural architectures, complex-valued deep neural networks have been marginalized due to the absence of the building blocks required to design such models. In this work, we provide the key atomic components for complex-valued deep neural networks and apply them to convolutional feed-forward networks and convolutional LSTMs. More precisely, we rely on complex convolutions and present algorithms for complex batch-normalization, complex weight initialization strategies for complex-valued neural nets and we use them in experiments with end-to-end training schemes. We demonstrate that such complex-valued models are competitive with their real-valued counterparts. We test deep complex models on several computer vision tasks, on music transcription using the MusicNet dataset and on Speech Spectrum Prediction using the TIMIT dataset. We achieve state-of-the-art performance on these audio-related tasks.
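Motivation and Objectives
- Provide a general formulation of complex-valued deep neural networks and their basic components.
- Apply complex-valued operations to convolutional feed-forward networks and convolutional long short-term memory networks (LSTMs).
- Demonstrate competitive performance on real-world tasks across vision and audio datasets.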
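Proposed Method
- Represent complex numbers as paired real and imaginary feature maps.
- Derive complex convolution as real-valued operations on the separate real and imaginary components.
- Introduce complex batch normalization by whitening the 2D vector of real and imaginary parts.
- Propose a complex weight initialization that draws magnitudes from a Rayleigh distribution and randomizes phases.
- Evaluate activation functions including C-ReLU, modReLU, and z-ReLU across the tasks.
- Compare complex networks against real-valued baselines on CIFAR-10/100, SVHN*, MusicNet, and TIMIT.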
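The representation, convolution, initialization, and C-ReLU items in the list above can be illustrated with a minimal standard-library Python sketch. It is not the authors' implementation. The function names (`conv1d_real`, `complex_conv1d`, `rayleigh_phase_init`, `crelu`) are hypothetical, the 1D list layout is chosen only for brevity, `sigma` is a caller-supplied scale that this summary does not specify, and drawing the phase uniformly from -π to π is an assumption, since the summary only says that phases are randomized.

```python
import cmath
import math
import random


def conv1d_real(x, w):
    """Valid-mode 1D cross-correlation (the deep-learning 'convolution')."""
    k = len(w)
    return [sum(x[i + j] * w[j] for j in range(k))
            for i in range(len(x) - k + 1)]


def complex_conv1d(x_re, x_im, w_re, w_im):
    """Complex convolution assembled from four real convolutions.

    For a filter W = A + iB and an input h = x + iy:
        W * h = (A * x - B * y) + i (B * x + A * y)
    """
    ax = conv1d_real(x_re, w_re)
    by = conv1d_real(x_im, w_im)
    bx = conv1d_real(x_re, w_im)
    ay = conv1d_real(x_im, w_re)
    out_re = [p - q for p, q in zip(ax, by)]
    out_im = [p + q for p, q in zip(bx, ay)]
    return out_re, out_im


def rayleigh_phase_init(n, sigma, rng=random):
    """Draw n complex weights: Rayleigh magnitude, uniform random phase.

    Assumption: sigma is chosen by the caller; this sketch does not
    reproduce any particular variance-scaling rule.
    """
    weights = []
    for _ in range(n):
        # Inverse-CDF sampling of a Rayleigh(sigma) magnitude.
        magnitude = sigma * math.sqrt(-2.0 * math.log(1.0 - rng.random()))
        phase = rng.uniform(-math.pi, math.pi)
        weights.append(cmath.rect(magnitude, phase))
    return weights


def crelu(re, im):
    """Apply ReLU separately to the real and imaginary feature maps."""
    return [max(0.0, v) for v in re], [max(0.0, v) for v in im]
```

A layer in this style would split each complex weight into its real and imaginary parts, call `complex_conv1d` on the paired input maps, and pass the two outputs through `crelu`. The same four-convolution decomposition carries over to 2D feature maps when `conv1d_real` is replaced by a real 2D convolution.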
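Experimental Results
Research Questions
- RQ1: Can complex-valued networks match or exceed real-valued architectures on standard vision benchmarks?
- RQ2: Do the complex-valued blocks (convolution, BN, activations) achieve competitive performance with sound initialization and stable training?
- RQ3: Do complex-valued networks offer a particular advantage on audio-related tasks such as music transcription and speech spectrum prediction?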
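Key Findings
- On CIFAR-10, CIFAR-100, and SVHN*, complex-valued networks achieve results competitive with real-valued models.
- In the reported settings, the complex representation outperforms its real-valued counterpart on CIFAR-100.
- Complex batch normalization based on 2D whitening avoids NaNs and stabilizes training across multiple runs.
- In the reported image recognition experiments, C-ReLU outperforms modReLU and z-ReLU.
- Ablation experiments show that complex batch normalization and phase-preserving activations matter for both performance and stability.
- The experiments reach state-of-the-art performance, within the reported comparisons, on MusicNet transcription and TIMIT spectrum prediction.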
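The 2D whitening behind the complex batch normalization finding above can be sketched in a few lines of standard-library Python. This is an illustration under stated assumptions rather than the paper's code: `complex_whiten` is a hypothetical name, the `eps` default is arbitrary, statistics are taken over one flat list of samples, and the learned scale and shift that would normally follow the whitening step are left out.

```python
import math


def complex_whiten(re, im, eps=1e-5):
    """Whiten paired real/imaginary samples with the inverse square root
    of their 2x2 covariance matrix (no learned scale or shift)."""
    n = len(re)
    mean_re = sum(re) / n
    mean_im = sum(im) / n
    cr = [v - mean_re for v in re]
    ci = [v - mean_im for v in im]

    # Covariance entries; eps on the diagonal keeps the matrix invertible.
    vrr = sum(a * a for a in cr) / n + eps
    vii = sum(b * b for b in ci) / n + eps
    vri = sum(a * b for a, b in zip(cr, ci)) / n

    # Closed-form inverse square root of a symmetric positive-definite
    # 2x2 matrix V: with s = sqrt(det V) and t = sqrt(trace V + 2s),
    # V^(-1/2) = [[vii + s, -vri], [-vri, vrr + s]] / (s * t).
    s = math.sqrt(vrr * vii - vri * vri)
    t = math.sqrt(vrr + vii + 2.0 * s)
    scale = 1.0 / (s * t)
    wrr = (vii + s) * scale
    wii = (vrr + s) * scale
    wri = -vri * scale

    out_re = [wrr * a + wri * b for a, b in zip(cr, ci)]
    out_im = [wri * a + wii * b for a, b in zip(cr, ci)]
    return out_re, out_im
```

After this step the real and imaginary parts have zero mean, approximately unit variance, and approximately zero covariance. Normalizing each part independently would not remove the correlation between them, which is why the 2x2 covariance matrix is used here.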
This walkthrough was generated by AI and reviewed by human editors.