QUICK REVIEW

[Paper Review] CapProNet: Deep Feature Learning via Orthogonal Projections onto Capsule Subspaces

Liheng Zhang, Marzieh Edraki|arXiv (Cornell University)|May 19, 2018
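Advanced Neural Network Applications | References: 13 | Cited by: 38
One-Sentence Summary

CapProNet proposes a deep learning framework that strengthens feature representations by orthogonally projecting input features onto learned capsule subspaces and classifying by capsule length. On the CIFAR and SVHN benchmarks it reaches competitive state-of-the-art performance, reducing the test error of ResNet backbones by 10–20% and that of DenseNet by 5–7% (relative), at very low computational overhead.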

ABSTRACT

In this paper, we formalize the idea behind capsule nets of using a capsule vector rather than a neuron activation to predict the label of samples. To this end, we propose to learn a group of capsule subspaces onto which an input feature vector is projected. Then the lengths of resultant capsules are used to score the probability of belonging to different classes. We train such a Capsule Projection Network (CapProNet) by learning an orthogonal projection matrix for each capsule subspace, and show that each capsule subspace is updated until it contains input feature vectors corresponding to the associated class. We will also show that the capsule projection can be viewed as normalizing the multiple columns of the weight matrix simultaneously to form an orthogonal basis, which makes it more effective in incorporating novel components of input features to update capsule representations. In other words, the capsule projection can be viewed as a multi-dimensional weight normalization in capsule subspaces, where the conventional weight normalization is simply a special case of the capsule projection onto 1D lines. Only a small negligible computing overhead is incurred to train the network in low-dimensional capsule subspaces or through an alternative hyper-power iteration to estimate the normalization matrix. Experiment results on image datasets show the presented model can greatly improve the performance of the state-of-the-art ResNet backbones by $10-20\%$ and that of the Densenet by $5-7\%$ respectively at the same level of computing and memory expenses. The CapProNet establishes the competitive state-of-the-art performance for the family of capsule nets by significantly reducing test errors on the benchmark datasets.

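Research Motivation and Goals

  • Formalize and improve the capsule network architecture by classifying with orthogonal projections onto capsule subspaces rather than with neuron activations.
  • Address the limited performance gains of existing capsule networks despite their notable architectural novelty.
  • Demonstrate that capsule projection, rather than simply grouping neurons, is the key to significant performance gains.
  • Show that the capsule projection mechanism can be trained efficiently end to end with very low computational and memory overhead.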

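Proposed Method

  • The model learns a set of orthogonal projection matrices, one per class, each projecting the input feature vector onto that class's capsule subspace.
  • The input feature is decomposed orthogonally into a capsule component (its projection onto the subspace) and a complement component (perpendicular to the subspace).
  • The capsule length, obtained from the projection, serves as the score for the presence of a class, while the capsule direction encodes instantiation parameters such as pose and scale.
  • Backpropagation updates the projection matrices using gradients that depend on the complement component, so each capsule subspace is refined iteratively until it contains the feature vectors of its associated class.
  • When the subspace is one-dimensional, the method reduces to conventional weight normalization; capsule projection generalizes it to learning a multi-dimensional orthogonal basis.
  • An efficient hyper-power iteration can be used to estimate the normalization matrix, keeping the computational cost low.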
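
The projection step described above can be sketched in a few lines of NumPy. This is a minimal illustration rather than the authors' implementation: it assumes each class has a weight matrix `W` of shape `d x c` whose columns span the capsule subspace, so the projection is `P = W (W^T W)^{-1} W^T`. The function names, the sizes, the softmax over capsule lengths, and the starting point `I / trace(W^T W)` of the hyper-power iteration are all assumptions made here for illustration; the iteration shown is the order-2 (Newton-Schulz) member of the hyper-power family.

```python
import numpy as np


def normalization_matrix(W):
    """Exact normalization matrix (W^T W)^{-1} for a d x c weight matrix W."""
    return np.linalg.inv(W.T @ W)


def hyper_power_normalization(W, num_iters=20):
    """Estimate (W^T W)^{-1} with the order-2 hyper-power (Newton-Schulz) iteration.

    The starting point I / trace(A) is an assumption made here; it makes the
    iteration converge for any symmetric positive-definite A = W^T W.
    """
    A = W.T @ W
    c = A.shape[0]
    sigma = np.eye(c) / np.trace(A)
    for _ in range(num_iters):
        sigma = sigma @ (2.0 * np.eye(c) - A @ sigma)
    return sigma


def capsule_projection(x, W, sigma=None):
    """Split x into its capsule component (inside span(W)) and its complement."""
    if sigma is None:
        sigma = normalization_matrix(W)
    capsule = W @ (sigma @ (W.T @ x))  # P x with P = W (W^T W)^{-1} W^T
    complement = x - capsule           # orthogonal to every column of W
    return capsule, complement


def class_scores(x, weights):
    """Capsule length per class, followed by a softmax over the lengths."""
    lengths = np.array(
        [np.linalg.norm(capsule_projection(x, W)[0]) for W in weights]
    )
    exp = np.exp(lengths - lengths.max())
    return lengths, exp / exp.sum()


rng = np.random.default_rng(0)
d, c, num_classes = 64, 4, 10  # illustrative sizes, not the paper's settings
weights = [rng.standard_normal((d, c)) for _ in range(num_classes)]
x = rng.standard_normal(d)

capsule, complement = capsule_projection(x, weights[0])
lengths, probs = class_scores(x, weights)
```

Because only the small `c x c` matrix is inverted or iterated on, the extra cost stays low when the capsule dimension `c` is much smaller than the feature dimension `d`, which is consistent with the low-overhead claim in the abstract.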

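Experimental Results

Research Questions

  • RQ1: Compared with standard capsule layers or fully connected layers, does orthogonally projecting input features onto learned capsule subspaces significantly improve the classification accuracy of deep neural networks?
  • RQ2: Does the capsule projection mechanism provide better invariance to appearance variations than simple neuron grouping?
  • RQ3: How does the method compare with state-of-the-art backbones such as ResNet and DenseNet in accuracy and computational efficiency?
  • RQ4: To what extent does the orthogonal structure of the capsule subspaces contribute to faster convergence and better generalization?
  • RQ5: Can capsule projection be integrated seamlessly into existing network architectures without introducing significant overhead?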

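Key Findings

  • With the same backbone, CapProNet reduces the test error on CIFAR10 from 10.3% to 3.64% and on SVHN from 4.3% to 1.54%.
  • The model reduces test error by a relative 10–20% on ResNet-110 and by 5–7% on DenseNet, with less than 1% additional training time and very low memory overhead.
  • Merely grouping neurons into capsules (GroupNeuron) does not improve performance, indicating that the orthogonal projection is the key to the method's success.
  • The capsule projection can be viewed as a multi-dimensional weight normalization, with conventional weight normalization as its one-dimensional special case.
  • Visualizations show that correctly classified samples have longer capsule projections in their corresponding subspace, supporting capsule length as a reliable classification indicator.
  • The computational overhead is negligible: with ResNet-110 on CIFAR10 it is under 0.01 seconds per iteration, making the method suitable for practical deployment.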
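
The weight-normalization finding above can be checked numerically. The sketch below assumes a 1D capsule subspace spanned by a single weight column `w` and compares the capsule length from the general projection with the magnitude of a weight-normalized inner product. The variable names and sizes are illustrative, and the learnable gain of weight normalization is fixed to 1 here as a simplifying assumption.

```python
import numpy as np

rng = np.random.default_rng(1)
d = 32                           # illustrative feature dimension
w = rng.standard_normal((d, 1))  # a 1D capsule subspace: a single weight column
x = rng.standard_normal(d)

# Capsule length from the general projection P = W (W^T W)^{-1} W^T.
projection = w @ np.linalg.inv(w.T @ w) @ w.T
capsule_length = np.linalg.norm(projection @ x)

# Weight normalization rescales w to unit length before the inner product
# (the learnable gain of weight normalization is fixed to 1 here).
weight_norm_output = abs((w[:, 0] / np.linalg.norm(w)) @ x)
```

In this 1D case the two quantities agree up to floating-point error, since projecting onto a line leaves `|w^T x| / ||w||` as the length of the result.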


This review was generated by AI and checked by human editors.