QUICK REVIEW

[Paper Review] Fenchel Lifted Networks: A Lagrange Relaxation of Neural Network Training

Fangda Gu, Armin Askari|arXiv (Cornell University)|Nov 1, 2018

Advanced Neural Network Applications | Cited by 19

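ONE-SENTENCE SUMMARY

Fenchel lifted networks are a training framework for neural networks that represents each activation function as an equivalent biconvex constraint and applies Lagrangian relaxation to obtain a rigorous lower bound on the standard training objective. The model is trained with block-coordinate descent, which parallelizes across data points and/or layers, and it matches or beats standard fully connected and convolutional networks in the authors' comparisons.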

ABSTRACT

Despite the recent successes of deep neural networks, the corresponding training problem remains highly non-convex and difficult to optimize. Classes of models have been proposed that introduce greater structure to the objective function at the cost of lifting the dimension of the problem. However, these lifted methods sometimes perform poorly compared to traditional neural networks. In this paper, we introduce a new class of lifted models, Fenchel lifted networks, that enjoy the same benefits as previous lifted models, without suffering a degradation in performance over classical networks. Our model represents activation functions as equivalent biconvex constraints and uses Lagrange Multipliers to arrive at a rigorous lower bound of the traditional neural network training problem. This model is efficiently trained using block-coordinate descent and is parallelizable across data points and/or layers. We compare our model against standard fully connected and convolutional networks and show that we are able to match or beat their performance.
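
To make the phrase "equivalent biconvex constraints" concrete, here is a minimal numerical sketch for the ReLU case. It assumes the penalty B(v, u) = 1/2 v^2 + 1/2 max(0, u)^2 - u v with v >= 0; the function name `relu_gap` is hypothetical and chosen only for this illustration. For v >= 0 the penalty is nonnegative and vanishes exactly when v = max(0, u), so the constraint B(v, u) <= 0 encodes the activation.

```python
import numpy as np

def relu_gap(v, u):
    """Elementwise B(v, u) = 1/2 v^2 + 1/2 max(0, u)^2 - u v.

    Assumed form of the ReLU penalty; convex in v for fixed u and
    convex in u for fixed v, but not jointly convex (the -u*v term).
    """
    return 0.5 * v**2 + 0.5 * np.maximum(u, 0.0) ** 2 - u * v

# Pre-activations on a grid, and the matching ReLU outputs.
u = np.linspace(-3.0, 3.0, 13)
v_relu = np.maximum(u, 0.0)

# At v = ReLU(u) the penalty vanishes, so the constraint B <= 0 holds.
print(np.allclose(relu_gap(v_relu, u), 0.0))

# Any other nonnegative v gives a strictly positive penalty.
print(np.all(relu_gap(v_relu + 0.5, u) > 0.0))
```

The check covers both signs of u: for u >= 0 the penalty reduces to 1/2 (v - u)^2, and for u < 0 it reduces to 1/2 v^2 + |u| v, which is zero only at v = 0.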

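MOTIVATION AND OBJECTIVES

Address the non-convexity and optimization difficulty of training deep neural networks.
Develop a lifted neural network model that keeps the added structure of earlier lifted models without giving up performance.
Provide a rigorous lower bound on the standard neural network training objective via Lagrange multipliers.
Enable efficient, parallelizable training by applying block-coordinate descent to the lifted problem.
Show that the lifted model can match or beat classical networks, without the degradation seen in earlier lifted models.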

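PROPOSED METHOD

Represent each activation function as an equivalent biconvex constraint, which gives the training problem additional structure.
Apply Lagrangian relaxation to these constraints to derive a lower bound on the original non-convex training problem.
Optimize the relaxed problem with block-coordinate descent, which parallelizes across data points and/or layers.
Build the constraint functions from Fenchel conjugates, so that each block subproblem is convex when the other blocks are held fixed.
Preserve end-to-end differentiability and compatibility with standard network architectures within the lifted framework.
Solve the relaxed problem iteratively; its optimal value bounds the original training loss from below.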
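
The steps above can be written out for the ReLU case as a short derivation. This is a sketch under stated assumptions, not a transcription of the paper: the symbols $X_l$ (layer-$l$ activations, with $X_0 = X$ the data), $W_l$ (weights), $\lambda_l$ (multipliers), and $B$ are notation chosen here, biases are omitted, and $B$ applied to matrices means the sum of the elementwise values.

```latex
% ReLU as a biconvex constraint. With f(v) = \tfrac12 v^2 + \iota_{\{v \ge 0\}}(v),
% the Fenchel conjugate is f^*(u) = \tfrac12 \max(0,u)^2, and the
% Fenchel--Young inequality gives B(v,u) = f(v) + f^*(u) - uv \ge 0.
\[
  v = \max(0, u)
  \;\iff\;
  B(v,u) := \tfrac12 v^2 + \tfrac12 \max(0,u)^2 - uv \;\le\; 0,
  \quad v \ge 0 .
\]

% Standard training problem with L layers.
\[
  p^\star = \min_{W,\,X_1,\dots,X_L} \; \mathcal{L}(Y, W_L X_L)
  \quad \text{s.t.} \quad
  X_{l+1} = \max(0, W_l X_l), \;\; l = 0,\dots,L-1 .
\]

% Lagrangian relaxation: the constraints B <= 0 move into the objective.
\[
  G(\lambda) = \min_{W,\; X_1,\dots,X_L \ge 0} \;
  \mathcal{L}(Y, W_L X_L) + \sum_{l=0}^{L-1} \lambda_l \, B(X_{l+1}, W_l X_l)
  \;\le\; p^\star
  \qquad \text{for every } \lambda \ge 0 .
\]
```

The inequality holds because every point that is feasible for the original problem has $B = 0$ in each layer, so the relaxed objective equals the loss there, while the relaxed problem minimizes over a larger set.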

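EXPERIMENTAL RESULTS

Research Questions

RQ1: Can a lifted neural network framework improve the structure of the optimization problem while matching or exceeding the performance of standard networks?
RQ2: Do Lagrangian relaxation and biconvex constraints yield a tighter and more tractable lower bound on the original training objective?
RQ3: Can the proposed method be optimized efficiently with block-coordinate descent parallelized across data points and layers?
RQ4: How does the performance of Fenchel lifted networks on benchmark tasks compare with that of standard fully connected and convolutional networks?
RQ5: Does the lifting approach eliminate the performance degradation common to earlier lifted models?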

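Key Findings

Fenchel lifted networks match or beat standard fully connected and convolutional networks on the benchmark tasks.
The Lagrangian relaxation of the activation constraints gives a rigorous lower bound on the original non-convex training objective.
Block-coordinate descent makes the optimization efficient and scalable, and supports parallelization across data points and/or layers.
The framework avoids the performance degradation typically observed in earlier lifted models and keeps accuracy competitive.
The method balances the structural advantages for optimization with high model capacity and generalization ability.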
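
As an illustration of how block-coordinate descent can exploit this structure, the following is a minimal sketch for a single hidden ReLU layer with squared loss. It is not the authors' implementation: the function names (`bcd_train`, `objective`, `biconvex_penalty`), the ridge term on the output weights, the fixed multiplier `lam`, and the use of a few (projected) gradient steps per block are all assumptions made for this example. Each block update solves or descends a convex subproblem, and the hidden-activation update separates over columns, that is, over data points.

```python
import numpy as np

def biconvex_penalty(V, U):
    """Sum over entries of 1/2 v^2 + 1/2 max(0, u)^2 - u v (assumed ReLU penalty)."""
    return np.sum(0.5 * V**2 + 0.5 * np.maximum(U, 0.0) ** 2 - U * V)

def objective(Y, X0, X1, W0, W1, lam, ridge):
    """Relaxed objective: squared loss + lam * penalty + ridge on W1."""
    fit = 0.5 * np.sum((Y - W1 @ X1) ** 2)
    return fit + lam * biconvex_penalty(X1, W0 @ X0) + 0.5 * ridge * np.sum(W1**2)

def bcd_train(X0, Y, hidden=8, lam=1.0, ridge=1e-3, sweeps=30, inner=5, seed=0):
    """Hypothetical block-coordinate descent over the blocks (W1, X1, W0).

    X0: inputs, shape (d, m), one data point per column.
    Y:  targets, shape (k, m).
    """
    rng = np.random.default_rng(seed)
    W0 = 0.1 * rng.standard_normal((hidden, X0.shape[0]))
    W1 = 0.1 * rng.standard_normal((Y.shape[0], hidden))
    X1 = np.maximum(W0 @ X0, 0.0)  # start from the feed-forward activations
    history = [objective(Y, X0, X1, W0, W1, lam, ridge)]

    for _ in range(sweeps):
        # Block 1: output weights, ridge-regularized least squares in closed form.
        A = X1 @ X1.T + ridge * np.eye(hidden)
        W1 = np.linalg.solve(A, X1 @ Y.T).T

        # Block 2: hidden activations X1 >= 0, projected gradient steps.
        # The subproblem is a convex quadratic and separates over columns.
        U = W0 @ X0
        step = 1.0 / (np.linalg.norm(W1, 2) ** 2 + lam)
        for _ in range(inner):
            grad = W1.T @ (W1 @ X1 - Y) + lam * (X1 - U)
            X1 = np.maximum(X1 - step * grad, 0.0)

        # Block 3: first-layer weights, gradient steps on the convex penalty.
        step = 1.0 / (lam * np.linalg.norm(X0, 2) ** 2)
        for _ in range(inner):
            grad = lam * (np.maximum(W0 @ X0, 0.0) - X1) @ X0.T
            W0 = W0 - step * grad

        history.append(objective(Y, X0, X1, W0, W1, lam, ridge))
    return W0, W1, X1, history
```

Calling `bcd_train(X0, Y)` on a small regression data set returns the weights, the lifted activations, and the objective value after each sweep. With the step sizes chosen here (the reciprocal of each subproblem's gradient Lipschitz constant), no block update can increase the relaxed objective, so the recorded history should be non-increasing.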


This review was generated by AI and reviewed by a human editor.