QUICK REVIEW

[論文レビュー] Gradients without Backpropagation

Atılım Güneş Baydin, Barak A. Pearlmutter|arXiv (Cornell University)|Feb 17, 2022

Machine Learning and ELM被引用数 20

ひとこと要約

本論文は forward gradient を導入する。forward-mode automatic differentiation によって計算される偏りのない勾配推定量で、backpropagation を用いず勾配降下を実現し、いくつかの ML タスクで速度向上を達成する。

ABSTRACT

Using backpropagation to compute gradients of objective functions for optimization has remained a mainstay of machine learning. Backpropagation, or reverse-mode differentiation, is a special case within the general family of automatic differentiation algorithms that also includes the forward mode. We present a method to compute gradients based solely on the directional derivative that one can compute exactly and efficiently via the forward mode. We call this formulation the forward gradient, an unbiased estimate of the gradient that can be evaluated in a single forward run of the function, entirely eliminating the need for backpropagation in gradient descent. We demonstrate forward gradient descent in a range of problems, showing substantial savings in computation and enabling training up to twice as fast in some cases.

研究の動機と目的

勾配推定法を、backpropagation のみならず forward-mode AD のみを用いて定義・動機づける。
forward gradient が真の勾配の unbiased な推定量であることを証明する。
様々な ML アーキテクチャで実用的な forward gradient descent を示す。
PyTorch における forward-mode AD システムを実装し、その性能を逆モード AD と比較する。

提案手法

forward gradient を g(θ) = (∇f(θ)·v) v、v ~ p(v) が独立した平均0、分散1 の成分を持つ、として定義する。
unbiasedness を証明：E[g(θ)] = ∇f(θ)。
PyTorch の reverse-mode autograd とは独立した、PyTorch の外で動く scratch の forward-mode AD エンジンを実装する。
SGD における forward gradient descent を適用し、ロジスティック回帰、MLP、CNN で backpropagation と同等の反復予算で実行する。
R_f と R_b を用いて実行時コストを分析し、各アーキテクチャでの収束時間 T_f と T_b を比較する。

実験結果

リサーチクエスチョン

RQ1forward-mode AD は SGD の最適化に十分な unbiased な勾配推定を提供できるか？
RQ2単純なアーキテクチャと深いアーキテクチャにおける forward gradient descent の実行時間と収束特性は？
RQ3backpropagation と比較して前向き勾配は訓練品質を犠牲にせず実用的な速度向上を提供するか？
RQ4モデルの深さとデータサイズが大きくなると forward gradient descent はどのようにスケールするか？

主な発見

Forward gradient は勾配の unbiased な推定量であり、backpropagation を用いず SGD を可能にする。
MNIST のロジスティック回帰、MLP、CNN において、forward gradient はウォールクロック時間で速くなることがあり、しばしば同等かそれ以上の loss per time を達成する。
MNIST のロジスティック回帰では、forward gradient は runtime と loss の性能の両方で backpropagation の約 2 倍の速さを示す。
CNN では forward gradient は実行時間を大幅に節約し、目標損失に到達するまでの時間は約 2 倍の範囲内の損失改善を示す。
forward gradient 法はネットワーク深さとともに reasonably スケールし、より大きなレイヤ数まで有利な実行時間特性を維持する。

より良い研究を、今すぐ始めましょう

論文設計から論文執筆まで、研究時間を劇的に削減しましょう。

クレジットカード登録不要

このレビューはAIが作成し、人間の編集者が確認しました。