[Paper Review] Generalized Capsule Networks with Trainable Routing Procedure
This paper proposes Generalized Capsule Networks (G-CapsNet), where the routing procedure is fully differentiable and trained end-to-end by making coupling coefficients learnable, eliminating the need for manual routing iteration settings. The method achieves comparable MNIST performance to prior CapsNets with significantly fewer parameters, and experiments show capsule packaging strategy has minimal impact on accuracy, though deeper architectures face saturation issues.
CapsNet (Capsule Network) was first proposed by~\citet{capsule}, and another version was later proposed by~\citet{emrouting}. CapsNet has proven effective at modeling spatial features with far fewer parameters. However, the routing procedures in both papers are not well incorporated into the overall training process: the optimal number of routing iterations is a mystery and has to be found manually. To overcome this disadvantage of current routing procedures in CapsNet, we embed the routing procedure into the optimization procedure together with all other parameters of the neural network; that is, we make the coupling coefficients in the routing procedure completely trainable. We call the result Generalized CapsNet (G-CapsNet). We implement both a "fully connected" version and a "convolutional" version of G-CapsNet. G-CapsNet achieves performance on the MNIST dataset similar to that reported in the original papers. We also test two capsule packing methods (across feature maps or within feature maps) applied to the output of the preceding convolutional layers and see no evident difference. In addition, we explore the possibility of stacking multiple capsule layers. The code is shared on \hyperlink{https://github.com/chenzhenhua986/CAFFE-CapsNet}{CAFFE-CapsNet}.
Motivation & Objective
- To address the limitation of fixed, non-learnable routing iterations in CapsNets that require manual tuning.
- To integrate the capsule routing procedure into the overall optimization process, making coupling coefficients trainable.
- To evaluate the impact of different capsule packaging strategies (across vs. within feature maps) on performance.
- To investigate the scalability of CapsNets by stacking multiple capsule layers.
- To explore whether capsule networks can be extended beyond single-layer architectures without performance degradation.
Proposed method
- Embed the routing procedure into the optimization process by making coupling coefficients $ c^{(l)}_{ji} $ trainable parameters alongside weights $ W^{(l)}_{ji} $, enabling end-to-end backpropagation.
- Formulate a joint loss function that includes both transformation matrix weights and coupling coefficients, regularized via L2 penalty.
- Use the squash function from Sabour et al. (2017) and Edgar et al. (2017) to normalize capsule outputs and introduce non-linearity.
- Implement both fully connected and convolutional variants of G-CapsNet, with shared transformation matrices in the convolutional version.
- Design a capsule version of ReLU to improve training stability in deeper architectures.
- Apply margin loss for classification, as in the original CapsNet, to train the network for object recognition.
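The forward pass implied by the bullets above can be sketched as follows. This is a minimal, forward-only illustration in plain Python, not the authors' Caffe implementation. The function names (`squash`, `capsule_layer`, `margin_loss`) and the nested-list data layout are assumptions made for this example, and the coupling coefficients are assumed to be unconstrained scalars with no softmax normalization. The squash function and the margin-loss constants (0.9, 0.1, 0.5) follow Sabour et al. (2017).

```python
import math

def squash(s, eps=1e-9):
    """Squash nonlinearity: keeps the direction of s, maps its length into [0, 1)."""
    norm_sq = sum(x * x for x in s)
    scale = norm_sq / (1.0 + norm_sq) / math.sqrt(norm_sq + eps)
    return [scale * x for x in s]

def matvec(W, u):
    """Multiply matrix W (list of rows) by vector u."""
    return [sum(w * x for w, x in zip(row, u)) for row in W]

def capsule_layer(u, W, c):
    """Forward pass of a fully connected capsule layer (hypothetical layout).

    u[i]    : input capsule i (list of floats)
    W[j][i] : transformation matrix from input capsule i to output capsule j
    c[j][i] : coupling coefficient, treated as an ordinary trainable scalar,
              so there is no iterative routing loop
    """
    outputs = []
    for W_j, c_j in zip(W, c):
        s_j = [0.0] * len(W_j[0])            # output capsule dimension
        for W_ji, c_ji, u_i in zip(W_j, c_j, u):
            u_hat = matvec(W_ji, u_i)        # prediction vector
            s_j = [s + c_ji * h for s, h in zip(s_j, u_hat)]
        outputs.append(squash(s_j))
    return outputs

def margin_loss(v, target, m_plus=0.9, m_minus=0.1, lam=0.5):
    """Margin loss over output capsules v; target is the true class index."""
    loss = 0.0
    for k, v_k in enumerate(v):
        length = math.sqrt(sum(x * x for x in v_k))
        if k == target:
            loss += max(0.0, m_plus - length) ** 2
        else:
            loss += lam * max(0.0, length - m_minus) ** 2
    return loss
```

In an automatic-differentiation framework, `W` and `c` would both be registered as parameters so that one backward pass updates them together, which is what replaces the hand-set number of routing iterations.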
Experimental results
Research questions
- RQ1: Can the routing procedure in CapsNets be made fully trainable by learning coupling coefficients during backpropagation?
- RQ2: Does the choice of capsule packaging strategy (across feature maps or within feature maps) affect model performance?
- RQ3: Can deeper capsule networks be successfully trained, and what are the challenges in scaling CapsNets beyond a single capsule layer?
- RQ4: How does the performance of G-CapsNet compare to baseline CapsNets in terms of error rate and parameter efficiency?
- RQ5: Does end-to-end training of routing eliminate the need for manual setting of routing iterations?
Key findings
- G-CapsNet achieves a test error rate of 0.66% on MNIST using only 8.2 million parameters, outperforming the baseline CapsNet (0.83% error with 35.4M parameters) when reconstruction is used.
- The fully connected G-CapsNet variant without reconstruction achieves 0.66% error using just 6.8 million parameters, demonstrating high parameter efficiency.
- The convolutional G-CapsNet variant achieves 0.70% error with 5.5 million parameters, showing that parameter efficiency is preserved in the convolutional setting.
- There is no significant performance difference between packaging capsules across feature maps and within feature maps, with error rates of 0.68% and 0.66% respectively.
- Multi-layer G-CapsNets tend to saturate during training, even with a capsule version of ReLU, indicating scalability remains a major challenge.
- The proposed end-to-end trainable routing procedure removes the need for manual routing iteration tuning and ensures convergence through optimization.
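The two packaging strategies compared in the findings above can be illustrated with a small sketch. The review does not specify the exact grouping layout, so the following is one plausible interpretation: "across" collects values at the same spatial position from consecutive feature maps, while "within" collects consecutive values from a single feature map. The function names and the `[channel][row][column]` list layout are assumptions made for this example.

```python
def pack_across_maps(fmaps, dim):
    """Build capsules from `dim` consecutive feature maps at each (row, col)."""
    C, H, W = len(fmaps), len(fmaps[0]), len(fmaps[0][0])
    assert C % dim == 0, "channel count must be divisible by capsule dimension"
    return [[fmaps[g * dim + d][h][w] for d in range(dim)]
            for g in range(C // dim) for h in range(H) for w in range(W)]

def pack_within_maps(fmaps, dim):
    """Build capsules from `dim` consecutive values (row-major) of one feature map."""
    capsules = []
    for fmap in fmaps:
        flat = [x for row in fmap for x in row]
        assert len(flat) % dim == 0, "map size must be divisible by capsule dimension"
        capsules.extend([flat[k:k + dim] for k in range(0, len(flat), dim)])
    return capsules

# Two 2x2 feature maps packed into 2-dimensional capsules.
fmaps = [[[0, 1], [2, 3]],
         [[10, 11], [12, 13]]]
across = pack_across_maps(fmaps, 2)   # [[0, 10], [1, 11], [2, 12], [3, 13]]
within = pack_within_maps(fmaps, 2)   # [[0, 1], [2, 3], [10, 11], [12, 13]]
```

Both strategies yield the same number of capsules of the same dimension from the same activations; only the grouping of values differs.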
This review was created by AI and reviewed by human editors.