QUICK REVIEW

[Paper Review] Symbolic Discovery of Optimization Algorithms

Xiangning Chen, Liang Chen|arXiv (Cornell University)|Feb 13, 2023

Machine Learning and Data Classification|Cited by 167

One-Line Summary

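The paper formulates the discovery of optimization algorithms as program search. Using this approach, it identifies Lion, a simple, memory-efficient optimizer that tracks only momentum and applies sign-based updates, and shows that Lion improves performance across vision, language, and diffusion tasks.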

ABSTRACT

We present a method to formulate algorithm discovery as program search, and apply it to discover optimization algorithms for deep neural network training. We leverage efficient search techniques to explore an infinite and sparse program space. To bridge the large generalization gap between proxy and target tasks, we also introduce program selection and simplification strategies. Our method discovers a simple and effective optimization algorithm, Lion (EvoLved Sign Momentum). It is more memory-efficient than Adam as it only keeps track of the momentum. Different from adaptive optimizers, its update has the same magnitude for each parameter calculated through the sign operation. We compare Lion with widely used optimizers, such as Adam and Adafactor, for training a variety of models on different tasks. On image classification, Lion boosts the accuracy of ViT by up to 2% on ImageNet and saves up to 5x the pre-training compute on JFT. On vision-language contrastive learning, we achieve 88.3% zero-shot and 91.1% fine-tuning accuracy on ImageNet, surpassing the previous best results by 2% and 0.1%, respectively. On diffusion models, Lion outperforms Adam by achieving a better FID score and reducing the training compute by up to 2.3x. For autoregressive, masked language modeling, and fine-tuning, Lion exhibits a similar or better performance compared to Adam. Our analysis of Lion reveals that its performance gain grows with the training batch size. It also requires a smaller learning rate than Adam due to the larger norm of the update produced by the sign function. Additionally, we examine the limitations of Lion and identify scenarios where its improvements are small or not statistically significant. Lion is also successfully deployed in production systems such as Google search ads CTR model.
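
The update described in the abstract can be sketched in NumPy as follows. The sketch assumes this rule: interpolate gradient and momentum with β1, take the sign, apply decoupled weight decay, then refresh the momentum with a different β2. The function name `lion_step` and the defaults β1 = 0.9 and β2 = 0.99 are illustrative assumptions, not an official implementation. The sketch keeps a single state buffer `m`, whereas AdamW keeps two (`m` and `v`); that difference is where the memory saving comes from.

```python
import numpy as np


def lion_step(w, g, m, lr=1e-4, beta1=0.9, beta2=0.99, wd=0.0):
    """One Lion-style step on NumPy arrays; returns (new_weights, new_momentum)."""
    # Blend the current gradient with the stored momentum using beta1.
    c = beta1 * m + (1.0 - beta1) * g
    # Sign update: every coordinate with nonzero c moves by exactly lr.
    update = np.sign(c)
    # Decoupled weight decay, applied the same way as in AdamW.
    w = w - lr * (update + wd * w)
    # The stored momentum is refreshed with a different coefficient, beta2.
    m = beta2 * m + (1.0 - beta2) * g
    return w, m


# Usage: gradients of very different sizes still produce steps of identical size.
w_new, m_new = lion_step(
    np.array([1.0, -2.0, 0.5]), np.array([0.3, -0.001, 0.0]), np.zeros(3), lr=0.1
)
# w_new is [0.9, -1.9, 0.5]: both nonzero-gradient coordinates move by exactly lr.
```

In this sketch, the gradient 0.3 and the gradient -0.001 move their weights by the same amount. This illustrates why Lion, unlike adaptive optimizers, gives every parameter an update of equal magnitude.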

Motivation and Objectives

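Motivate the discovery of optimization algorithms that are not hand-crafted, with the aim of improving the training efficiency and generalization of deep neural networks.
Formulate optimization-algorithm discovery as a search over executable programs in order to explore new algorithm designs.
Develop techniques for navigating an infinite, sparse search space and for narrowing it down to solutions that generalize across large-scale tasks.
Demonstrate the practical performance of the discovered optimizer on vision, language, and diffusion models.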

Proposed Method

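Represent an optimization algorithm as an imperative program with a fixed signature that operates on the weights, the gradient, and auxiliary state such as momentum.
Encode candidate algorithms in a large space of program statements built from a set of 45 mathematical functions.
Explore this space with evolutionary search, warm-started from AdamW and using restarts.
Use abstract execution and caching to prune invalid or semantically equivalent programs, which speeds up evaluation.
Select algorithms that generalize to the target tasks through funnel selection, which re-evaluates candidates on progressively larger proxy tasks, combined with meta-validation.
Simplify the discovered program into Lion by removing redundant statements and reducing the update to a sign-based momentum update.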
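
The search loop above can be illustrated with a heavily reduced toy. This is not the paper's search system. The five-op instruction set, the quadratic fitness task, and every name in the code (`OPS`, `MOMENTUM_SGD`, `run`, `fitness`, `signature`, `mutate`, `evolve`) are hypothetical. The warm start is momentum SGD rather than AdamW, to keep the program short. The evolutionary search is regularized evolution (tournament selection with removal of the oldest member), chosen here as one common form of evolutionary search. The cache keys each program on its outputs for fixed random inputs, in the spirit of the caching step, so that functionally equivalent programs are not re-evaluated. Funnel selection, abstract execution, restarts, and simplification are omitted.

```python
import collections
import random

import numpy as np

# Hypothetical toy instruction set (the paper's space uses 45 functions).
# A statement is (dst, op, src1, src2); unary ops ignore src2.
OPS = {
    "add": lambda a, b: a + b,
    "sub": lambda a, b: a - b,
    "mul": lambda a, b: a * b,
    "sign": lambda a, b: np.sign(a),
    "interp": lambda a, b: 0.1 * a + 0.9 * b,  # fixed 0.9 coefficient (assumption)
}
READ = ["g", "m", "u"]  # gradient, momentum, update
WRITE = ["m", "u"]      # statements may only overwrite momentum or update

# Hand-written warm start: momentum SGD (stands in for the paper's AdamW warm start).
MOMENTUM_SGD = [
    ("m", "interp", "g", "m"),  # m = 0.1 * g + 0.9 * m
    ("u", "add", "m", "u"),     # u = m (u starts at zero)
]


def run(program, g, m):
    """Execute a program on gradient g and momentum m; return (update, new momentum)."""
    env = {"g": g, "m": m, "u": np.zeros_like(g)}
    for dst, op, a, b in program:
        env[dst] = OPS[op](env[a], env[b])
    return env["u"], env["m"]


SCALES = np.array([1.0, 10.0, 100.0])  # toy ill-conditioned quadratic (assumption)


def fitness(program, steps=50, lr=0.01):
    """Negative final loss of 0.5 * sum(SCALES * w**2) after training with the program."""
    w, m = np.ones(3), np.zeros(3)
    with np.errstate(all="ignore"):  # diverging programs simply score -inf
        for _ in range(steps):
            u, m = run(program, SCALES * w, m)
            w = w - lr * u
        loss = 0.5 * np.sum(SCALES * w**2)
    return float(-loss) if np.isfinite(loss) else -np.inf


_probe = np.random.default_rng(0)
PROBE_G, PROBE_M = _probe.normal(size=4), _probe.normal(size=4)


def signature(program):
    """Hash of outputs on fixed random inputs: a cheap functional-equivalence key."""
    with np.errstate(all="ignore"):
        u, m = run(program, PROBE_G, PROBE_M)
    return hash(np.round(np.concatenate([u, m]), 6).tobytes())


def mutate(program, rng):
    """Replace one randomly chosen statement with a random new statement."""
    child = list(program)
    i = rng.randrange(len(child))
    child[i] = (rng.choice(WRITE), rng.choice(list(OPS)), rng.choice(READ), rng.choice(READ))
    return child


def evolve(warm_start, cycles=300, pop_size=20, sample_size=5, seed=0):
    """Regularized evolution: tournament selection, oldest member removed each cycle."""
    rng = random.Random(seed)
    cache = {}

    def evaluate(program):
        key = signature(program)
        if key not in cache:  # equivalent programs reuse the cached fitness
            cache[key] = fitness(program)
        return cache[key]

    population = collections.deque((warm_start, evaluate(warm_start)) for _ in range(pop_size))
    best = population[0]
    for _ in range(cycles):
        parent = max(rng.sample(list(population), sample_size), key=lambda e: e[1])
        child = mutate(parent[0], rng)
        entry = (child, evaluate(child))
        population.append(entry)
        population.popleft()  # age-based removal is what makes it "regularized"
        best = max(best, entry, key=lambda e: e[1])
    return best, len(cache)


best, n_unique = evolve(MOMENTUM_SGD)
print(best[1], n_unique)
```

The cache size `n_unique` counts distinct behaviours rather than distinct program texts. For example, `("u", "add", "g", "u")` and `("u", "add", "u", "g")` hash to the same key and are trained only once.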

Experimental Results

Research Questions

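RQ1: Can program search effectively discover optimization algorithms in an infinite and sparse space?
RQ2: Does an automatically discovered optimizer generalize from proxy tasks to large-scale, state-of-the-art training setups?
RQ3: What are the properties of the discovered optimizer, and what are its practical limitations across architectures and tasks?
RQ4: How does a simple sign-based update with momentum compare with AdamW and Adafactor in real-world training?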

Key Findings

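Lion is a simple, memory-efficient optimizer that tracks only momentum and applies sign-based updates. In vision-language contrastive learning it reaches 88.3% zero-shot and 91.1% fine-tuning accuracy on ImageNet, surpassing the previous best results by 2 and 0.1 points, respectively.
Lion saves up to 5x the pre-training compute on JFT and reduces diffusion-model training compute by up to 2.3x.
On ImageNet, Lion outperforms AdamW across multiple models, with larger gains for higher-capacity models and larger batch sizes.
In vision-language training (LiT and BASIC setups), Lion improves zero-shot ImageNet accuracy over AdamW by roughly 1.0 to 1.7 points and also improves retrieval performance.
Lion's advantage grows with training batch size. Because the sign function produces updates with a larger norm, Lion needs a smaller learning rate than AdamW. It also needs a correspondingly larger weight decay to keep the effective regularization strength comparable.
The search process singles out Lion among the candidate optimizers it finds. Search runs in which meta-overfitting to the proxy tasks sets in later tend to yield algorithms that generalize better to the target tasks.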
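
The learning-rate and weight-decay adjustment in the findings above follows from short arithmetic. This derivation is a sketch under the following assumptions:

- Weight decay is decoupled, as in AdamW.
- $\eta$ is the learning rate and $\lambda$ the weight decay.
- $c_t = \beta_1 m_{t-1} + (1-\beta_1) g_t$ is the interpolated momentum.
- $d$ is the number of parameters.

The numbers ($\eta_{\mathrm{AdamW}} = 10^{-3}$, $\lambda_{\mathrm{AdamW}} = 0.1$, and a 10x smaller Lion learning rate) are illustrative choices, not settings prescribed by the paper.

```latex
\begin{aligned}
w_{t+1} &= w_t - \eta\,\bigl(\operatorname{sign}(c_t) + \lambda w_t\bigr)
         = (1 - \eta\lambda)\,w_t - \eta\,\operatorname{sign}(c_t), \\
\lVert \operatorname{sign}(c_t) \rVert_2 &= \sqrt{d} \quad \text{if no coordinate of } c_t \text{ is zero}, \\
\eta_{\mathrm{Lion}}\,\lambda_{\mathrm{Lion}} &= \eta_{\mathrm{AdamW}}\,\lambda_{\mathrm{AdamW}}
  \;\Rightarrow\; \lambda_{\mathrm{Lion}} = \frac{10^{-3} \times 0.1}{10^{-4}} = 1.0
  \quad (\eta_{\mathrm{AdamW}} = 10^{-3},\ \lambda_{\mathrm{AdamW}} = 0.1,\ \eta_{\mathrm{Lion}} = 10^{-4}).
\end{aligned}
```

The first line shows that each step shrinks the weights by the factor $(1 - \eta\lambda)$, so the regularization strength depends on the product $\eta\lambda$. The second line shows that the sign update has norm $\sqrt{d}$, which motivates the smaller $\eta$. The third line shows that once $\eta$ is shrunk 10x, $\lambda$ must grow 10x to keep the same decay per step.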

This review was generated by AI and checked by a human editor.