QUICK REVIEW

[Paper Review] Deep Learning as a Mixed Convex-Combinatorial Optimization Problem

Abram L. Friesen, Pedro Domingos | arXiv (Cornell University) | Oct 31, 2017

Machine Learning and Algorithms | Cited by 2

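One-Sentence Summary

This paper formulates deep learning with hard-threshold activations as a mixed convex-combinatorial optimization problem: it sets discrete targets for the hidden units so that the network decomposes into perceptrons with linearly separable problems, and trains them with a recursive mini-batch algorithm. In experiments with AlexNet and ResNet-18 on ImageNet, the method achieves higher classification accuracy than the straight-through estimator.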

ABSTRACT

As neural networks grow deeper and wider, learning networks with hard-threshold activations is becoming increasingly important, both for network quantization, which can drastically reduce time and energy requirements, and for creating large integrated systems of deep networks, which may have non-differentiable components and must avoid vanishing and exploding gradients for effective learning. However, since gradient descent is not applicable to hard-threshold functions, it is not clear how to learn networks of them in a principled way. We address this problem by observing that setting targets for hard-threshold hidden units in order to minimize loss is a discrete optimization problem, and can be solved as such. The discrete optimization goal is to find a set of targets such that each unit, including the output, has a linearly separable problem to solve. Given these targets, the network decomposes into individual perceptrons, which can then be learned with standard convex approaches. Based on this, we develop a recursive mini-batch algorithm for learning deep hard-threshold networks that includes the popular but poorly justified straight-through estimator as a special case. Empirically, we show that our algorithm improves classification accuracy in a number of settings, including for AlexNet and ResNet-18 on ImageNet, when compared to the straight-through estimator.

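Motivation and Objectives

Address the problem of training deep neural networks with hard-threshold activations, which are non-differentiable and therefore incompatible with standard gradient descent.
Overcome the limitations of the straight-through estimator, which is popular but poorly justified and often yields suboptimal performance.
Enable effective learning in large deep networks that contain non-differentiable components by recasting the problem as discrete optimization with convex subproblems.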
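Develop a recursive mini-batch algorithm that sets targets for hard-threshold units in a principled way, so that training does not depend on gradients flowing through the non-differentiable activations.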

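Proposed Method

Reformulates learning in a hard-threshold network as a discrete optimization problem: choose targets for the hidden units such that every unit, including the output, faces a linearly separable problem.
Given these targets, decomposes the network into individual perceptrons, each of which solves its own convex problem.
Uses a recursive mini-batch algorithm that alternates between updating targets and updating weights, with each perceptron's weights learned by standard convex methods.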
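Contains the straight-through estimator as a special case of the proposed algorithm, arising when targets are set by a simple local rule rather than by explicit optimization.
Operates on mini-batches so that training remains efficient and scalable on large datasets such as ImageNet.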
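Propagates targets recursively from the output layer back through the hidden layers, keeping each layer's targets consistent with those of the layer above; this target assignment takes the place of gradient propagation through the thresholds.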
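The decomposition step described above can be illustrated on a toy problem. The following is a minimal sketch, not the paper's algorithm: it assumes a two-layer network with sign activations, uses XOR as the task, and takes hand-picked hidden targets (h1 = OR, h2 = AND) instead of searching for them. The helper names `sign`, `train_perceptron`, and `predict` are hypothetical, and the classic perceptron rule stands in for the "standard convex approaches" mentioned in the abstract.

```python
import numpy as np

def sign(z):
    # Hard-threshold activation with outputs in {-1, +1}; sign(0) maps to +1.
    return np.where(z >= 0, 1, -1)

def train_perceptron(X, t, max_epochs=100):
    """Classic perceptron rule; stops early once every example is classified correctly."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])  # append a bias column
    w = np.zeros(Xb.shape[1])
    for _ in range(max_epochs):
        mistakes = 0
        for x, target in zip(Xb, t):
            if target * (w @ x) <= 0:  # misclassified or on the boundary
                w += target * x
                mistakes += 1
        if mistakes == 0:
            return w, True
    return w, False

def predict(w, X):
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    return sign(Xb @ w)

# XOR in {-1, +1} encoding: not linearly separable as a single-unit problem.
X = np.array([[-1, -1], [-1, 1], [1, -1], [1, 1]])
y = np.array([-1, 1, 1, -1])

# Hand-picked hidden targets (an assumption for illustration): h1 = OR, h2 = AND.
# With these targets, each hidden unit and the output unit is linearly separable.
T_hidden = np.array([[-1, -1], [1, -1], [1, -1], [1, 1]])

# Each hidden unit is now an independent perceptron problem on the inputs.
hidden = [train_perceptron(X, T_hidden[:, j]) for j in range(T_hidden.shape[1])]
# The output unit is a perceptron problem on the hidden targets.
w_out, out_ok = train_perceptron(T_hidden, y)
all_separable = out_ok and all(ok for _, ok in hidden)

# Run the assembled hard-threshold network end to end.
H = np.column_stack([predict(w, X) for w, _ in hidden])
y_hat = predict(w_out, H)
print(all_separable, y_hat)
```

With a poor choice of targets, for example hidden targets that are themselves XOR-like, `train_perceptron` would return `False` for that unit; finding targets that avoid this is the combinatorial part of the problem.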

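Experimental Results

Research Questions

RQ1: Can hard-threshold neural networks be trained effectively even though their activations are non-differentiable?
RQ2: Can a principled optimization framework explain or improve on the straight-through estimator?
RQ3: Does modeling hard-threshold learning as a mixed convex-combinatorial problem outperform existing heuristic methods?
RQ4: Does recursive mini-batch optimization based on target assignment improve generalization and accuracy on standard benchmarks such as ImageNet?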

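Key Findings

When training AlexNet and ResNet-18 on ImageNet, the proposed algorithm achieves higher classification accuracy than the straight-through estimator.
By reducing the problem to a series of convex optimizations, the method gives hard-threshold network training a principled basis.
The straight-through estimator is shown to be a special case of the proposed algorithm, which supplies a justification for its use.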
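Because each unit learns from assigned targets chosen with linear separability in mind, rather than from gradients passed through the thresholds, the method sidesteps vanishing and exploding gradients in deep networks.
The experiments show accuracy improvements over the straight-through estimator in a number of settings, supporting the mixed convex-combinatorial approach.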
The recursive mini-batch algorithm scales the training of deep hard-threshold networks while retaining these accuracy gains.
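For context on the baseline in these comparisons, the straight-through estimator is commonly implemented by applying the hard threshold in the forward pass and treating it as the identity in the backward pass. The sketch below shows that general idea in NumPy; the function names are hypothetical, and the `saturate` option (passing gradients only where |z| <= 1) is one common variant, included here as an assumption and not taken from this summary.

```python
import numpy as np

def hard_threshold_forward(z):
    # Forward pass: outputs in {-1, +1}; the true derivative is zero almost everywhere.
    return np.where(z >= 0, 1.0, -1.0)

def ste_backward(z, grad_output, saturate=True):
    # Backward pass: pretend the threshold was the identity and pass the gradient through.
    # Saturated variant (assumption): block the gradient where the pre-activation is far from 0.
    if saturate:
        return grad_output * (np.abs(z) <= 1.0)
    return grad_output

z = np.array([-2.0, -0.5, 0.3, 1.5])
upstream = np.ones_like(z)

activations = hard_threshold_forward(z)
grad_saturated = ste_backward(z, upstream)
grad_plain = ste_backward(z, upstream, saturate=False)
print(activations, grad_saturated, grad_plain)
```

The backward rule here is a surrogate, not a derivative of the forward function, which is why the abstract describes the estimator as popular but poorly justified.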


This review was generated by AI and checked by a human editor.