QUICK REVIEW

[Paper Review] Why are deep nets reversible: A simple theory, with implications for training

Sanjeev Arora, Yingyu Liang|arXiv (Cornell University)|Nov 18, 2015

Generative Adversarial Networks and Image Synthesis | References: 15 | Cited by: 41

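One-Sentence Summary

This paper offers a simple theoretical explanation of why deep ReLU networks are reversible: under the assumption that the weights behave like random numbers, the generative model is just the feedforward network run in reverse with transposed weights. It also introduces the SHADOW method, which improves training by adding synthetic data generated from the network's hidden layers; the paper reports improved generalization and shows that error on the synthetic data tracks error on real data, with validation on CIFAR-10, CIFAR-100, and MNIST both with and without dropout.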

ABSTRACT

Generative models for deep learning are promising both to improve understanding of the model, and yield training methods requiring fewer labeled samples. Recent works use generative model approaches to produce the deep net's input given the value of a hidden layer several levels above. However, there is no accompanying "proof of correctness" for the generative model, showing that the feedforward deep net is the correct inference method for recovering the hidden layer given the input. Furthermore, these models are complicated. The current paper takes a more theoretical tack. It presents a very simple generative model for RELU deep nets, with the following characteristics: (i) The generative model is just the reverse of the feedforward net: if the forward transformation at a layer is $A$ then the reverse transformation is $A^T$. (This can be seen as an explanation of the old weight tying idea for denoising autoencoders.) (ii) Its correctness can be proven under a clean theoretical assumption: the edge weights in real-life deep nets behave like random numbers. Under this assumption ---which is experimentally tested on real-life nets like AlexNet--- it is formally proved that feed forward net is a correct inference method for recovering the hidden layer. The generative model suggests a simple modification for training: use the generative model to produce synthetic data with labels and include it in the training set. Experiments are shown to support this theory of random-like deep nets; and that it helps the training.
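Point (i) of the abstract can be illustrated with a small numerical sketch. This is a toy setup, not the paper's construction: the Gaussian weights, the variance `2 / n_input`, the layer sizes, and the sparse nonnegative hidden vector are all assumptions chosen for illustration, and biases and dropout are left out. It generates an input `x` from a hidden vector `h` with the transposed weights, then checks how well the feedforward pass recovers `h`.

```python
import numpy as np

def relu(v):
    """Elementwise ReLU."""
    return np.maximum(v, 0.0)

rng = np.random.default_rng(0)
n_hidden, n_input = 50, 4000  # toy sizes (assumption)

# Random-like forward weights A (hidden x input), entries ~ N(0, 2 / n_input).
# The variance is an illustrative choice that makes A @ relu(A.T @ h)
# equal to h on average.
A = rng.standard_normal((n_hidden, n_input)) * np.sqrt(2.0 / n_input)

# A sparse, nonnegative hidden vector with 10 active units (assumption).
h = np.zeros(n_hidden)
active = rng.choice(n_hidden, size=10, replace=False)
h[active] = rng.uniform(0.5, 1.5, size=10)

x = relu(A.T @ h)    # generative (reverse) direction: transposed weights
h_hat = relu(A @ x)  # feedforward direction: infer the hidden layer from x

cosine = float(h @ h_hat / (np.linalg.norm(h) * np.linalg.norm(h_hat)))
print(f"cosine(h, h_hat) = {cosine:.3f}")
```

In this toy setting the recovered vector stays closely aligned with the original one because the input layer is much wider than the hidden layer, so the random fluctuations average out.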

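Research Motivation and Goals

Provide a theoretical basis for the reversibility of deep ReLU networks in generative modeling.
Address the lack of a formal proof of correctness for generative models in deep learning, in particular for recovering the hidden representation from the input.
Propose a simple, theoretically grounded method of synthetic data augmentation that uses samples generated from the network's own hidden layers.
Test empirically whether synthetic data generated by the reverse process improves training performance and generalization.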

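Proposed Method

Posit the random-like weights assumption: the connection weights of real-world deep networks have aggregate statistics similar to those of random matrices.
Define the generative model as the reverse of the feedforward network, using the transposed weight matrices to compute p(x|z) from a hidden representation z.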
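Introduce the SHADOW method: for each labeled input x, compute its hidden representation z with a forward pass, generate a synthetic input x̃ from z with the reverse (generative) pass, and add x̃ to the training set with the label of x.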
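Apply the method in several variants: generating from different hidden layers (h₂ or h₃), adding dropout noise during generation, and smoothing the generated images for robustness.
Treat the shadow distribution (the distribution of the generated inputs x̃) as a principled way to produce realistic synthetic data consistent with the network's internal representations.
Evaluate the method empirically on CIFAR-10, CIFAR-100, and MNIST, comparing performance with and without dropout and across synthetic-data generation strategies.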
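The augmentation step described above might look roughly like the following sketch. It assumes a single ReLU layer with a hypothetical weight matrix `W`, and it assumes, following the abstract's phrase "synthetic data with labels", that each synthetic input keeps the label of the input it came from. The function names `shadow_inputs` and `augment_with_shadow` are made up for this example; generating from a deeper layer such as h₂ or h₃ would chain several such layers and is not shown.

```python
import numpy as np

def relu(v):
    """Elementwise ReLU."""
    return np.maximum(v, 0.0)

def shadow_inputs(X, W, drop_prob=0.0, rng=None):
    """Forward to the hidden layer, then back through the transposed weights.

    X: (batch, n_input) inputs; W: (n_hidden, n_input) forward weights.
    """
    H = relu(X @ W.T)  # forward pass: one hidden vector per row
    if drop_prob > 0.0:
        # Optional dropout noise during generation: zero random hidden units.
        rng = rng if rng is not None else np.random.default_rng()
        H = H * (rng.random(H.shape) >= drop_prob)
    return relu(H @ W)  # reverse pass: each row is relu(W^T h)

def augment_with_shadow(X, y, W, drop_prob=0.0, rng=None):
    """Append synthetic inputs, reusing the labels of their source inputs."""
    X_tilde = shadow_inputs(X, W, drop_prob, rng)
    return np.concatenate([X, X_tilde]), np.concatenate([y, y])

# Toy data and weights (assumptions); in practice W would be the current
# weights of the network being trained.
rng = np.random.default_rng(0)
X = rng.random((8, 20))
y = rng.integers(0, 3, size=8)
W = 0.1 * rng.standard_normal((12, 20))

X_aug, y_aug = augment_with_shadow(X, y, W, drop_prob=0.5, rng=rng)
print(X_aug.shape, y_aug.shape)  # (16, 20) (16,)
```

Regenerating the synthetic half from the current weights at each training step, rather than once up front, would be one plausible design choice; the summary does not say which the authors use.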

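Experimental Results

Research Questions

RQ1: Under what conditions is the feedforward network a correct inference method for recovering the hidden representation from the input?
RQ2: Can a generative model for deep ReLU networks, based on a simple reverse transformation with transposed weights, be formally proved correct?
RQ3: Does synthetic data generated by the reverse network pass improve generalization and training performance in practice?
RQ4: How do synthetic data generated from different hidden layers (h₂ versus h₃) compare in practice?
RQ5: Can additional regularization during synthetic-data generation (for example sampling or smoothing) further improve robustness?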

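Main Findings

SHADOW markedly speeds up the decrease in error during training and maintains its advantage over standard backpropagation with dropout on CIFAR-10, CIFAR-100, and MNIST.
Test error on the synthetic data closely tracks test error on the real data, which supports the theoretical prediction that the shadow distribution is a valid proxy for the real data distribution.
Generating synthetic data from h₃ instead of h₂ gives comparable or better performance, indicating that higher-level representations can effectively generate realistic inputs.
Adding sampling (such as dropout) during generation increases variance but leaves the final error similar, which supports the method's robustness.
Smoothing the synthetic images lowers the final test error, suggesting that an added inductive bias such as smoothness improves generalization.
Empirical checks confirm that real-world deep networks such as AlexNet exhibit random-like behavior: weight entries are approximately i.i.d., and the singular values follow the quarter-circle law. This supports the theoretical assumption.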
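The quarter-circle law mentioned in the last finding can be seen on a purely random matrix. The sketch below does not touch any trained network: an i.i.d. Gaussian matrix stands in for a weight matrix (an assumption), and the size `n = 400` is arbitrary. For an n × n matrix with i.i.d. standard normal entries, the singular values of the matrix divided by √n approach the density √(4 − s²)/π on [0, 2], whose mean is 8/(3π).

```python
import numpy as np

def quarter_circle_density(s):
    """Limiting singular-value density sqrt(4 - s^2) / pi on [0, 2]."""
    s = np.asarray(s, dtype=float)
    return np.sqrt(np.clip(4.0 - s**2, 0.0, None)) / np.pi

rng = np.random.default_rng(0)
n = 400  # arbitrary toy size
G = rng.standard_normal((n, n))  # stand-in for a weight matrix (assumption)
singular_values = np.linalg.svd(G / np.sqrt(n), compute_uv=False)

# Compare a normalized histogram with the limiting density at bin centers.
hist, edges = np.histogram(singular_values, bins=10, range=(0.0, 2.0), density=True)
centers = 0.5 * (edges[:-1] + edges[1:])
for c, emp, ref in zip(centers, hist, quarter_circle_density(centers)):
    print(f"s = {c:.1f}  empirical = {emp:.3f}  quarter-circle = {ref:.3f}")

expected_mean = 8.0 / (3.0 * np.pi)  # mean of the quarter-circle density
print(f"mean singular value = {singular_values.mean():.3f}, "
      f"expected about {expected_mean:.3f}")
```

To run the same comparison on a real network, one would replace `G` with a trained weight matrix and rescale it by its entry standard deviation; for non-square matrices the limiting shape differs from the quarter circle.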

This review was generated by AI and reviewed by a human editor.