QUICK REVIEW

[Paper Review] On Loss Functions for Deep Neural Networks in Classification

Katarzyna Janocha, Wojciech Marian Czarnecki|arXiv (Cornell University)|Feb 18, 2017

Anomaly Detection Techniques and Applications | References: 7 | Cited by: 57

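ONE-SENTENCE SUMMARY

This paper analyzes how loss functions other than the standard log loss affect the training dynamics, robustness, and performance of deep classifiers, offering both theoretical justification and empirical comparisons.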

ABSTRACT

Deep neural networks are currently among the most commonly used classifiers. Despite easily achieving very good performance, one of the best selling points of these models is their modular design - one can conveniently adapt their architecture to specific needs, change connectivity patterns, attach specialised layers, experiment with a large amount of activation functions, normalisation schemes and many others. While one can find impressively wide spread of various configurations of almost every aspect of the deep nets, one element is, in authors' opinion, underrepresented - while solving classification problems, vast majority of papers and applications simply use log loss. In this paper we try to investigate how particular choices of loss functions affect deep models and their learning dynamics, as well as resulting classifiers robustness to various effects. We perform experiments on classical datasets, as well as provide some additional, theoretical insights into the problem. In particular we show that L1 and L2 losses are, quite surprisingly, justified classification objectives for deep nets, by providing probabilistic interpretation in terms of expected misclassification. We also introduce two losses which are not typically used as deep nets objectives and show that they are viable alternatives to the existing ones.

MOTIVATION AND OBJECTIVES

Investigate how alternative loss functions influence training dynamics of deep classifiers.
Provide probabilistic interpretations for non-traditional losses like L1 and L2 in classification.
Evaluate robustness to input and label noise across various losses through experiments on standard datasets.
Offer guidance on when to prefer margin, expectation, or other losses over log loss in deep nets.

PROPOSED METHOD

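Analyze twelve loss functions: L1, L2, L1∘σ (expectation loss), L2∘σ (regularised expectation loss), L∞∘σ (Chebyshev loss), hinge and its squared and cubed variants, log loss (cross-entropy), squared log loss, Tanimoto loss, and Cauchy-Schwarz Divergence. Here σ denotes the softmax applied to the network output.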
Provide theoretical propositions linking L1 and L2 to expected misclassification and regularised expectations.
Examine derivative properties and piecewise linearity of losses, especially in relation to final-layer activations.
Empirically compare losses on toy datasets and standard benchmarks (MNIST, CIFAR-10) using deep networks with varying depth and architectures.
Assess learning speed, final accuracy, and noise robustness under input and label perturbations.
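Several of the losses listed above can be sketched in a few lines of NumPy. This is an illustrative sketch, not the authors' code: the function names are hypothetical, `o` is assumed to be a batch of raw network outputs (logits), `y` a batch of one-hot labels, and the formulas follow the usual definitions of these losses, so they should be checked against the paper's own table before reuse.

```python
import numpy as np

def log_softmax(o):
    """Numerically stable log-softmax over the last axis."""
    z = o - o.max(axis=-1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))

def softmax(o):
    """Softmax over the last axis (sigma in the L1∘σ / L2∘σ notation)."""
    return np.exp(log_softmax(o))

def expectation_loss(o, y):
    """L1∘σ: L1 distance between the one-hot label and softmax(o)."""
    return np.abs(y - softmax(o)).sum(axis=-1)

def regularised_expectation_loss(o, y):
    """L2∘σ: squared L2 distance between the one-hot label and softmax(o)."""
    return ((y - softmax(o)) ** 2).sum(axis=-1)

def chebyshev_loss(o, y):
    """L∞∘σ: largest absolute per-class difference."""
    return np.abs(y - softmax(o)).max(axis=-1)

def log_loss(o, y):
    """Standard cross-entropy (log loss)."""
    return -(y * log_softmax(o)).sum(axis=-1)

def tanimoto_loss(o, y):
    """Negative Tanimoto similarity between softmax(o) and y."""
    p = softmax(o)
    dot = (p * y).sum(axis=-1)
    return -dot / ((p ** 2).sum(axis=-1) + (y ** 2).sum(axis=-1) - dot)

def cauchy_schwarz_divergence(o, y):
    """Negative log of the cosine similarity between softmax(o) and y."""
    p = softmax(o)
    dot = (p * y).sum(axis=-1)
    norms = np.linalg.norm(p, axis=-1) * np.linalg.norm(y, axis=-1)
    return -np.log(dot / norms)
```

For a one-hot label, `expectation_loss` reduces to 2(1 - p_y), where p_y is the softmax probability of the true class. That is twice the probability of error when the predicted label is sampled from the softmax distribution, which is the sense in which L1∘σ relates to expected misclassification.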

EXPERIMENTAL RESULTS

Research Questions

RQ1: How do different loss functions influence learning dynamics and convergence in deep nets for classification?
RQ2: Do regression-oriented losses like L1 and L2 have meaningful probabilistic interpretations as classification objectives?
RQ3: Which losses offer faster convergence, better generalisation, or greater robustness to input and label noise in deep architectures?
RQ4: How do non-traditional losses (Tanimoto, Cauchy-Schwarz Divergence) compare to standard cross-entropy in practice?
RQ5: Under what circumstances should practitioners prefer margin-based, expectation-based, or log loss in classification tasks?

Key Findings

L1 and L2 losses admit probabilistic interpretations in terms of expected misclassification, which justifies them as classification objectives.
When L1 and L2 are applied to softmax probabilities, the resulting losses are non-convex and their derivatives are non-monotonic; gradients shrink for heavily misclassified examples, which slows learning.
Margin-based losses (hinge and its variants) often yield faster training and strong generalisation on deep nets, particularly with deeper architectures.
Expectation losses (L1∘σ and L2∘σ) tend to be slower to train but can offer robustness to input and label noise.
Cauchy-Schwarz Divergence behaves competitively, sometimes outperforming log loss in speed and final performance on MNIST and CIFAR-10 in the reported setups.
Tanimoto loss shows strong robustness to noise in certain experiments, indicating potential for further study.
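The slowdown of expectation losses on heavily misclassified examples can be illustrated numerically. The sketch below rests on a simplifying assumption that is not from the paper: the true class has logit `margin` and all other logits are zero. Under that setup the L1∘σ loss is 2(1 - p_y), its gradient with respect to the true-class logit is -2 p_y (1 - p_y), and the log loss gradient is p_y - 1. All function names are hypothetical.

```python
import numpy as np

def true_class_prob(margin, num_classes=10):
    """Softmax probability of the true class when its logit is `margin`
    and the remaining (num_classes - 1) logits are all zero."""
    return np.exp(margin) / (np.exp(margin) + (num_classes - 1))

def grad_expectation(p):
    """Gradient of L1∘σ = 2(1 - p_y) w.r.t. the true-class logit,
    using dp_y/do_y = p_y (1 - p_y)."""
    return -2.0 * p * (1.0 - p)

def grad_log(p):
    """Gradient of log loss = -log(p_y) w.r.t. the true-class logit."""
    return p - 1.0

if __name__ == "__main__":
    # Negative margins correspond to confidently wrong predictions.
    for margin in (-8.0, -4.0, 0.0, 4.0, 8.0):
        p = true_class_prob(margin)
        print(f"margin={margin:+.1f}  p_y={p:.5f}  "
              f"grad L1∘σ={grad_expectation(p):+.5f}  "
              f"grad log={grad_log(p):+.5f}")
```

As p_y approaches 0, the expectation-loss gradient vanishes while the log loss gradient approaches -1, so a confidently wrong example receives almost no corrective signal under L1∘σ. The gradient magnitude of L1∘σ peaks at p_y = 0.5 and falls off on both sides, which is the non-monotonic behaviour noted in the findings.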

This review was generated by AI and checked by a human editor.