QUICK REVIEW

[Paper Review] Sobolev Training for Neural Networks

Wojciech Marian Czarnecki, Simon Osindero|arXiv (Cornell University)|Jun 15, 2017

Adversarial Robustness in Machine Learning | References: 18 | Cited by: 60

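One-Line Summary

This paper introduces Sobolev Training, which incorporates target derivatives into neural network training. This improves data efficiency and generalization in regression, distillation, and synthetic-gradient settings.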

ABSTRACT

At the heart of deep learning we aim to use neural networks as function approximators, training them to produce outputs from inputs in emulation of a ground truth function or data creation process. In many cases we only have access to input-output pairs from the ground truth; however, it is becoming more common to have access to derivatives of the target output with respect to the input, for example when the ground truth function is itself a neural network, as in network compression or distillation. Generally these target derivatives are not computed, or are ignored. This paper introduces Sobolev Training for neural networks, a method for incorporating these target derivatives in addition to the target values while training. By optimising neural networks to approximate not only the function's outputs but also the function's derivatives, we encode additional information about the target function within the parameters of the neural network. Thereby we can improve the quality of our predictors, as well as the data efficiency and generalisation capabilities of our learned function approximation. We provide theoretical justifications for such an approach as well as empirical evidence in three distinct domains: regression on classical optimisation datasets, distilling policies of an agent playing Atari, and large-scale applications of synthetic gradients. In all three domains the use of Sobolev Training, employing target derivatives in addition to target values, results in models with higher accuracy and stronger generalisation.
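The idea in the abstract can be written compactly as a training objective. The notation here is an illustrative assumption: m(x | θ) is the network with parameters θ, f is the target function, D_x^j is the j-th derivative with respect to the input, and ℓ, ℓ_j are per-order losses. The first line is the general form over N training points and derivative orders up to K. The second line is one concrete instance with squared errors and K = 1, where λ ≥ 0 is an assumed weighting hyperparameter.

```latex
\mathcal{L}(\theta) = \sum_{i=1}^{N} \Big[ \ell\big(m(x_i \mid \theta), f(x_i)\big)
  + \sum_{j=1}^{K} \ell_j\big(D_x^{j} m(x_i \mid \theta), D_x^{j} f(x_i)\big) \Big]

\mathcal{L}_{K=1}(\theta) = \sum_{i=1}^{N} \Big[ \big(m(x_i \mid \theta) - f(x_i)\big)^2
  + \lambda \,\big\| \nabla_x m(x_i \mid \theta) - \nabla_x f(x_i) \big\|^2 \Big]
```

Setting K = 0 (or λ = 0) recovers ordinary value-only training. The derivative terms are what carry the extra information about the target.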

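Motivation and Objectives

Introduce Sobolev Training as a new paradigm for training neural networks.
Justify theoretically that matching derivatives alongside values is both feasible and beneficial.
Provide empirical evidence across regression benchmarks, policy distillation, and synthetic gradients.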

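Proposed Method

Extend the standard loss with derivative-matching terms for first- and higher-order derivatives (the Sobolev loss).
Provide a stochastic variant that reduces computational cost by projecting derivatives onto random vectors.
Establish theoretically that ReLU networks are universal approximators in Sobolev spaces, and discuss the resulting sample-complexity benefits.
Demonstrate the approach in three domains: regression on optimization benchmarks, policy distillation on Atari, and large-scale synthetic-gradient training.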
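The Sobolev loss and its stochastic variant described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation. The target `target`, its gradient `target_grad`, the tiny tanh network `mlp`, and the weight `lam` are all hypothetical. Only first-order (K = 1) matching is shown, and the network's input gradient is computed analytically rather than by automatic differentiation. For the stochastic variant we assume a Gaussian projection vector v and match the directional derivatives v·∇m and v·∇f. Because E[vvᵀ] = I for v ~ N(0, I), that term equals the full gradient term in expectation.

```python
import math
import random


def target(x):
    """Hypothetical target f(x) = sin(x0) + x0 * x1 (stands in for, e.g., a teacher network)."""
    return math.sin(x[0]) + x[0] * x[1]


def target_grad(x):
    """Analytic input gradient of the hypothetical target."""
    return [math.cos(x[0]) + x[1], x[0]]


def init_params(rng, n_in, n_hidden):
    """Random weights for a one-hidden-layer tanh network (illustrative sizes)."""
    W = [[rng.gauss(0.0, 1.0) for _ in range(n_in)] for _ in range(n_hidden)]
    b = [rng.gauss(0.0, 0.1) for _ in range(n_hidden)]
    a = [rng.gauss(0.0, 0.5) for _ in range(n_hidden)]
    return W, b, a, 0.0


def mlp(params, x):
    """m(x) = c + sum_k a_k * tanh(w_k . x + b_k); returns (m(x), grad_x m(x))."""
    W, b, a, c = params
    out = c
    grad = [0.0] * len(x)
    for w_k, b_k, a_k in zip(W, b, a):
        h = math.tanh(sum(wi * xi for wi, xi in zip(w_k, x)) + b_k)
        out += a_k * h
        scale = a_k * (1.0 - h * h)  # chain rule: d tanh(z)/dz = 1 - tanh(z)^2
        for i, wi in enumerate(w_k):
            grad[i] += scale * wi
    return out, grad


def sobolev_loss(params, xs, lam=1.0):
    """Mean squared value error plus lam * squared input-gradient error (K = 1)."""
    total = 0.0
    for x in xs:
        m, dm = mlp(params, x)
        total += (m - target(x)) ** 2
        total += lam * sum((g - t) ** 2 for g, t in zip(dm, target_grad(x)))
    return total / len(xs)


def stochastic_sobolev_loss(params, xs, rng, lam=1.0):
    """Same objective, but the gradient error is seen through one random projection per point.

    v . (grad m - grad f) equals the difference of directional derivatives along v.
    With v ~ N(0, I), E[(v . e)^2] = ||e||^2, so in expectation this equals sobolev_loss.
    """
    total = 0.0
    for x in xs:
        m, dm = mlp(params, x)
        v = [rng.gauss(0.0, 1.0) for _ in x]
        err = [g - t for g, t in zip(dm, target_grad(x))]
        proj = sum(vi * ei for vi, ei in zip(v, err))
        total += (m - target(x)) ** 2 + lam * proj ** 2
    return total / len(xs)


rng = random.Random(0)
params = init_params(rng, n_in=2, n_hidden=8)
xs = [[rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)] for _ in range(16)]
print("value-only loss:", sobolev_loss(params, xs, lam=0.0))
print("Sobolev loss:   ", sobolev_loss(params, xs, lam=1.0))
print("stochastic est.:", stochastic_sobolev_loss(params, xs, rng, lam=1.0))
```

Either loss could be minimized over the network parameters with any gradient-based optimizer; the training loop is omitted here. The random projection matters most when the full derivative is a large Jacobian or a higher-order tensor. In that case a single projection (e.g. a Jacobian-vector product) is much cheaper than materializing the whole object.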

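Experimental Results

Research Questions

RQ1: Can Sobolev Training reduce sample complexity compared with standard value-only training?
RQ2: Can neural networks trained with Sobolev Training accurately approximate both function values and derivatives in practice?
RQ3: How does Sobolev Training affect performance in distillation and synthetic-gradient settings?
RQ4: What theoretical guarantees exist for approximation in Sobolev spaces with modern activation functions such as ReLU?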

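Key Findings

On benchmark functions, Sobolev Training reduces approximation error and improves generalization in the low-data regime.
In distillation, Sobolev Training brings the distilled policy closer to the target policy (lower KL divergence at test time).
For synthetic gradients, Sobolev Training improves final accuracy and stability, outperforming the baseline gradient methods it is compared against, including in ImageNet-scale experiments.
The paper provides theoretical results showing that ReLU networks are universal approximators in the Sobolev space with K=1 (approximating values and first derivatives), and discusses the resulting sample-complexity benefits.
The empirical results span regression, Atari policy distillation, and large-scale synthetic-gradient applications, and show consistent improvements over the baselines.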
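The K=1 guarantee in the findings above can be stated informally as follows. The specific norm is our illustrative choice: a sup-type first-order Sobolev norm on a compact set X. The paper's exact norm and conditions may differ. For a ReLU network the input gradient exists only almost everywhere, so the second supremum is taken over points where it is defined.

```latex
\|g\|_{S^1(X)} = \sup_{x \in X} |g(x)| + \sup_{x \in X} \|\nabla_x g(x)\|

\forall f \in C^1(X),\ \forall \varepsilon > 0,\ \exists\, m \text{ (a ReLU network)} :\ \|f - m\|_{S^1(X)} < \varepsilon
```

In words, ReLU networks can match values and first derivatives simultaneously to any accuracy. Adding the K = 1 derivative term to the loss therefore does not demand something the model class cannot represent.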


This review was generated by AI and checked by a human editor.