QUICK REVIEW

[Paper Review] Training Deep Networks with Structured Layers by Matrix Backpropagation

Catalin Ionescu, Orestis Vantzos|arXiv (Cornell University)|Sep 25, 2015

Advanced Neural Network Applications | References: 48 | Cited by: 47

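One-Sentence Summary

This paper proposes matrix backpropagation, a mathematical framework that enables end-to-end training of deep networks containing structured global layers such as normalized cuts and second-order pooling. By generalizing backpropagation to the calculus of adjoint matrix variations, the method computes gradients through matrix functions efficiently, and the resulting networks outperform counterparts without such global layers on image segmentation benchmarks such as BSDS and MSCOCO.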

ABSTRACT

Deep neural network architectures have recently produced excellent results in a variety of areas in artificial intelligence and visual recognition, well surpassing traditional shallow architectures trained using hand-designed features. The power of deep networks stems both from their ability to perform local computations followed by pointwise non-linearities over increasingly larger receptive fields, and from the simplicity and scalability of the gradient-descent training procedure based on backpropagation. An open problem is the inclusion of layers that perform global, structured matrix computations like segmentation (e.g. normalized cuts) or higher-order pooling (e.g. log-tangent space metrics defined over the manifold of symmetric positive definite matrices) while preserving the validity and efficiency of an end-to-end deep training framework. In this paper we propose a sound mathematical apparatus to formally integrate global structured computation into deep computation architectures. At the heart of our methodology is the development of the theory and practice of backpropagation that generalizes to the calculus of adjoint matrix variations. The proposed matrix backpropagation methodology applies broadly to a variety of problems in machine learning or computational perception. Here we illustrate it by performing visual segmentation experiments using the BSDS and MSCOCO benchmarks, where we show that deep networks relying on second-order pooling and normalized cuts layers, trained end-to-end using matrix backpropagation, outperform counterparts that do not take advantage of such global layers.

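Motivation and Objectives

Enable end-to-end training of deep neural networks that include global structured matrix computations such as segmentation and higher-order pooling.
Formalize the generalization of backpropagation to matrix-valued functions using the calculus of adjoint matrix variations.
Demonstrate that integrating structured layers into deep architectures is feasible for visual recognition tasks and improves performance.
Provide a rigorous mathematical foundation for differentiating through spectral and nonlinear matrix operations in deep learning.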
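
The core identity behind "adjoint matrix variations" can be written compactly. The block below is a sketch in standard matrix-calculus notation. The symbols $f$, $\mathcal{L}$, $G$ and the worked example $Y = X^\top X$ are illustrative choices and are not taken from this review.

```latex
% Inner product on matrices:  A : B = \operatorname{Tr}(A^\top B).
% A layer Y = f(X) maps a variation dX to a variation dY = \mathcal{L}(dX),
% where \mathcal{L} is linear in dX. Matching first-order terms of the loss L gives
\[
  \frac{\partial (L \circ f)}{\partial X} : dX
  \;=\; \frac{\partial L}{\partial Y} : \mathcal{L}(dX)
  \;=\; \mathcal{L}^{*}\!\left(\frac{\partial L}{\partial Y}\right) : dX ,
\]
% so the backward pass applies the adjoint operator \mathcal{L}^{*}.
%
% Illustrative example: Y = X^\top X, with G = \partial L / \partial Y.
\[
  dY = dX^\top X + X^\top dX
  \quad\Longrightarrow\quad
  G : dY = \bigl( X (G + G^\top) \bigr) : dX
  \quad\Longrightarrow\quad
  \frac{\partial (L \circ f)}{\partial X} = X\,(G + G^\top).
\]
```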

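Proposed Method

Propose a matrix generalization of backpropagation, based on the calculus of adjoint matrix variations, that computes gradients through structured matrix functions.
Apply the framework to two key structured layers: second-order pooling based on log-covariance descriptors, and normalized cuts for image segmentation.
Derive analytic gradients for spectral and nonlinear operations using matrix inner products and identities (e.g., the Frobenius norm and the Hadamard product).
Derive closed-form expressions for the gradients of matrix functions such as the matrix logarithm and the eigenvalue decomposition.
Implement the method in MATLAB and evaluate it on a Titan Z GPU, reaching real-time inference at 2–3 images per second.
Integrate the structured layers into deep network architectures so that local convolutional layers and global matrix layers are optimized jointly.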
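
To make the matrix-logarithm step concrete, here is a minimal NumPy sketch of a forward and backward pass through the logarithm of a symmetric positive definite matrix. It is not the authors' MATLAB implementation. It uses the standard divided-difference (Daleckii-Krein) form of the derivative, which may differ in presentation from the expressions in the paper. The function names `logm_spd_forward`, `logm_spd_backward`, and `gradient_check` are hypothetical.

```python
import numpy as np


def logm_spd_forward(X):
    """Matrix logarithm of a symmetric positive definite matrix X."""
    lam, U = np.linalg.eigh(X)        # X = U diag(lam) U^T
    F = (U * np.log(lam)) @ U.T       # scales column j of U by log(lam[j])
    return F, (lam, U)


def logm_spd_backward(G, cache):
    """Given G = dL/dF, return dL/dX for symmetric X (assumed formula:
    U ((U^T G U) * Gamma) U^T with Gamma the divided differences of log)."""
    lam, U = cache
    log_lam = np.log(lam)
    diff = lam[:, None] - lam[None, :]
    close = np.abs(diff) < 1e-12
    # Divided differences (log a - log b) / (a - b); on the diagonal and for
    # (near-)repeated eigenvalues fall back to the derivative 1 / a.
    ratio = (log_lam[:, None] - log_lam[None, :]) / np.where(close, 1.0, diff)
    gamma = np.where(close, 1.0 / lam[:, None], ratio)
    G_sym = 0.5 * (G + G.T)           # X is symmetric, so only sym(G) matters
    return U @ ((U.T @ G_sym @ U) * gamma) @ U.T


def gradient_check(seed=0, n=4, h=1e-6):
    """Compare the analytic directional derivative with central differences
    for the toy loss L(X) = sum(W * logm(X)), W fixed and random."""
    rng = np.random.default_rng(seed)
    A = rng.standard_normal((n, n + 2))
    X = A @ A.T + 0.5 * np.eye(n)     # symmetric positive definite input
    W = rng.standard_normal((n, n))
    E = rng.standard_normal((n, n))
    E = 0.5 * (E + E.T)               # symmetric perturbation direction
    _, cache = logm_spd_forward(X)
    analytic = np.sum(logm_spd_backward(W, cache) * E)
    loss_plus = np.sum(W * logm_spd_forward(X + h * E)[0])
    loss_minus = np.sum(W * logm_spd_forward(X - h * E)[0])
    numeric = (loss_plus - loss_minus) / (2 * h)
    return analytic, numeric


if __name__ == "__main__":
    analytic, numeric = gradient_check()
    print("analytic:", analytic, "numeric:", numeric)
```

In a second-order pooling layer, `X` would be a regularized covariance-style matrix built from the convolutional features, and the gradient returned by `logm_spd_backward` would be passed further down to those features.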

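Experimental Results

Research Questions

RQ1: Can global structured matrix operations such as normalized cuts and second-order pooling be integrated into an end-to-end deep learning framework?
RQ2: How can backpropagation be generalized to handle matrix-valued functions and their variations so that the computation remains differentiable?
RQ3: How does learning the structured layers affect segmentation performance compared with fixed or hand-designed alternatives?
RQ4: Can the proposed matrix backpropagation method efficiently compute gradients of nonlinear and spectral matrix operations?

Key Findings

Deep networks trained with matrix backpropagation outperform standard networks on the BSDS and MSCOCO image segmentation benchmarks.
Integrating second-order pooling and normalized cuts layers markedly improves segmentation results, both quantitatively and qualitatively.
A decrease in the rank of the similarity matrix during training correlates with improved performance, suggesting that global structure is being learned effectively.
The method achieves real-time training and inference at about 2–3 images per second on a Titan Z GPU, demonstrating its practical feasibility.
The proposed framework supports analytic gradient computation for both local and global layers, preserving end-to-end differentiability.
Experiments show that when the initial rank of the predicted similarity matrix is close to the ground-truth rank, the rank decreases during training and segmentation accuracy improves.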


This review was generated by AI and reviewed by human editors.