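[Paper Summary] Sobolev Training for Neural Networks
This article introduces Sobolev Training, which incorporates target derivatives into neural network training to improve data efficiency and generalization, covering regression, distillation, and synthetic-gradient settings.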
At the heart of deep learning we aim to use neural networks as function approximators, training them to produce outputs from inputs in emulation of a ground truth function or data creation process. In many cases we only have access to input-output pairs from the ground truth; however, it is becoming more common to have access to derivatives of the target output with respect to the input, for example when the ground truth function is itself a neural network, as in network compression or distillation. Generally these target derivatives are not computed, or are ignored. This paper introduces Sobolev Training for neural networks, a method for incorporating these target derivatives in addition to the target values while training. By optimising neural networks to approximate not only the function's outputs but also the function's derivatives, we encode additional information about the target function within the parameters of the neural network. Thereby we can improve the quality of our predictors, as well as the data-efficiency and generalization capabilities of our learned function approximation. We provide theoretical justifications for such an approach as well as empirical evidence in three distinct domains: regression on classical optimisation datasets, distilling policies of an agent playing Atari, and large-scale applications of synthetic gradients. In all three domains the use of Sobolev Training, employing target derivatives in addition to target values, results in models with higher accuracy and stronger generalisation.
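The objective described above adds derivative-matching terms to the usual value loss. A sketch in this summary's notation, assuming a model m(x | θ), a target function f, training inputs x_1 to x_N, per-term losses ℓ and ℓ_j (for example squared error), and D_x^j denoting the j-th derivative with respect to the input, matched up to order K:

```latex
\min_{\theta} \sum_{i=1}^{N} \Big[ \ell\big(m(x_i \mid \theta), f(x_i)\big)
  + \sum_{j=1}^{K} \ell_j\big(D_x^{j} m(x_i \mid \theta),\, D_x^{j} f(x_i)\big) \Big]
```

With K = 0 the inner sum is empty and this reduces to ordinary value-only regression; K = 1 additionally matches first derivatives (input gradients, or Jacobians for vector-valued outputs).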
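Motivation and Goals
- Bring Sobolev Training into the neural network training paradigm.
- Show theoretically that matching derivatives alongside values is both feasible and beneficial.
- Provide empirical evidence on regression benchmarks, policy distillation, and synthetic gradients.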
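Proposed Method
- Extend the standard loss with derivative-matching terms for first-order (and higher-order) derivatives, giving the Sobolev loss.
- Give a stochastic variant that projects derivatives onto random vectors to reduce computational cost.
- Establish theoretically that ReLU networks are universal approximators in Sobolev spaces, and discuss the resulting sample-complexity benefits.
- Demonstrate applications in three domains: regression on optimization benchmarks, Atari policy distillation, and large-scale synthetic-gradient training.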
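The Sobolev loss and its stochastic variant can be sketched in plain Python for K = 1 (first derivatives only). This is an illustration under stated assumptions, not the paper's implementation: model derivatives are estimated by central finite differences rather than automatic differentiation, both terms are squared errors combined with a single weight `deriv_weight`, and the random projection direction is a Gaussian sample normalized to unit length. All function names are hypothetical.

```python
import math
import random

# Illustrative sketch (K = 1). Assumption: central finite differences stand in
# for the automatic differentiation a real training loop would use.


def directional_derivative(fn, x, v, h=1e-5):
    """Estimate <grad fn(x), v> with a central finite difference."""
    xp = [xi + h * vi for xi, vi in zip(x, v)]
    xm = [xi - h * vi for xi, vi in zip(x, v)]
    return (fn(xp) - fn(xm)) / (2.0 * h)


def gradient(fn, x, h=1e-5):
    """Full input gradient: one directional derivative per coordinate axis."""
    n = len(x)
    return [directional_derivative(fn, x, [1.0 if j == i else 0.0 for j in range(n)], h)
            for i in range(n)]


def sobolev_loss(model, target, target_grad, xs, deriv_weight=1.0):
    """Mean over xs of squared value error + deriv_weight * squared gradient error."""
    total = 0.0
    for x in xs:
        value_err = (model(x) - target(x)) ** 2
        grad_err = sum((gm - gf) ** 2 for gm, gf in zip(gradient(model, x), target_grad(x)))
        total += value_err + deriv_weight * grad_err
    return total / len(xs)


def stochastic_sobolev_loss(model, target, target_grad, xs, rng, deriv_weight=1.0):
    """Like sobolev_loss, but matches derivatives along one random direction per point.

    Assumption: the direction is a Gaussian sample normalized to unit length.
    """
    total = 0.0
    for x in xs:
        v = [rng.gauss(0.0, 1.0) for _ in x]
        norm = math.sqrt(sum(vi * vi for vi in v))
        v = [vi / norm for vi in v]
        value_err = (model(x) - target(x)) ** 2
        # Two model evaluations, regardless of the input dimension.
        proj_model = directional_derivative(model, x, v)
        # The target gradient is assumed to be available; project it onto v.
        proj_target = sum(gf * vi for gf, vi in zip(target_grad(x), v))
        total += value_err + deriv_weight * (proj_model - proj_target) ** 2
    return total / len(xs)


# Toy target f(x) = x0^2 + 3*x1 with known gradient [2*x0, 3].
def target(x):
    return x[0] ** 2 + 3.0 * x[1]


def target_grad(x):
    return [2.0 * x[0], 3.0]


def good_model(x):
    return x[0] ** 2 + 3.0 * x[1]


def wrong_slope(x):
    return x[0] ** 2 + 2.0 * x[1]


xs = [[0.0, 0.0], [1.0, 2.0]]
print(sobolev_loss(good_model, target, target_grad, xs))   # ~0
print(sobolev_loss(wrong_slope, target, target_grad, xs))  # ~3.0 (value term 2.0 + gradient term 1.0)
print(stochastic_sobolev_loss(wrong_slope, target, target_grad, xs, random.Random(0)))
```

Both losses share the value term. The stochastic version replaces the n-coordinate gradient comparison with a single directional derivative per point, which is where the saving comes from when inputs are high-dimensional; for the toy model its derivative term is the squared component of the random direction along x1, so the result lies between 2.0 and 3.0.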
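Experimental Results
Research Questions
- RQ1: Compared with standard value-only training, does Sobolev Training reduce sample complexity?
- RQ2: In practice, can neural networks trained with Sobolev Training accurately approximate both function values and derivatives?
- RQ3: How does Sobolev Training affect performance in distillation and synthetic-gradient settings?
- RQ4: What approximation guarantees hold in Sobolev spaces when using modern activation functions such as ReLU?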
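Key Findings
- On benchmark functions in the low-data regime, Sobolev Training achieves lower approximation error and better generalization.
- In distillation, Sobolev Training yields student policies closer to the target policy (lower KL divergence at test time).
- For synthetic gradients, Sobolev Training improves final accuracy and stability compared with the baseline synthetic-gradient methods, including in ImageNet-scale experiments.
- The paper gives theoretical results showing that ReLU networks are universal approximators in Sobolev spaces (for K=1, i.e. matching up to first derivatives), and discusses the sample-complexity advantages.
- The empirical results cover regression, Atari policy distillation, and large-scale synthetic-gradient applications, where Sobolev Training generally outperforms the baselines.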
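The distillation finding is reported as KL divergence between the target (teacher) policy and the distilled (student) policy. A minimal sketch of that metric over a discrete action set, assuming both policies are given as action logits per state and using KL(teacher || student); the function names are hypothetical, and the KL direction used in the paper is not stated in this summary.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of action logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]


def policy_kl(teacher_logits, student_logits):
    """KL(teacher || student) in nats for a single state (assumed direction)."""
    p = softmax(teacher_logits)
    q = softmax(student_logits)
    return sum(pi * (math.log(pi) - math.log(qi)) for pi, qi in zip(p, q) if pi > 0.0)


def mean_policy_kl(teacher_batch, student_batch):
    """Average per-state KL over a batch of held-out test states."""
    kls = [policy_kl(t, s) for t, s in zip(teacher_batch, student_batch)]
    return sum(kls) / len(kls)


# Teacher is uniform over two actions; student puts 0.75 on the first action.
print(policy_kl([0.0, 0.0], [math.log(3.0), 0.0]))  # 0.5 * ln(4/3), about 0.1438
```

Lower values mean the student's action distribution is closer to the teacher's; because softmax ignores a constant shift of the logits, a student whose logits differ from the teacher's only by a constant scores zero.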
This summary was generated by AI and reviewed by human editors.