QUICK REVIEW

[Paper Review] Learning to Teach with Dynamic Loss Functions

Lijun Wu, Fei Tian | arXiv (Cornell University) | Oct 29, 2018

Machine Learning and Algorithms | References: 42 | Citations: 42

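One-Sentence Summary

This paper proposes L2T-DLF, in which a neural teacher outputs dynamic loss functions that guide a student model during training; the teacher is optimized with gradients obtained by reverse-mode differentiation through the student's training, and the method improves performance on image classification and neural machine translation.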

ABSTRACT

Teaching is critical to human society: it is with teaching that prospective students are educated and human civilization can be inherited and advanced. A good teacher not only provides his/her students with qualified teaching materials (e.g., textbooks), but also sets up appropriate learning objectives (e.g., course projects and exams) considering different situations of a student. When it comes to artificial intelligence, treating machine learning models as students, the loss functions that are optimized act as perfect counterparts of the learning objective set by the teacher. In this work, we explore the possibility of imitating human teaching behaviors by dynamically and automatically outputting appropriate loss functions to train machine learning models. Different from typical learning settings in which the loss function of a machine learning model is predefined and fixed, in our framework, the loss function of a machine learning model (we call it student) is defined by another machine learning model (we call it teacher). The ultimate goal of teacher model is cultivating the student to have better performance measured on development dataset. Towards that end, similar to human teaching, the teacher, a parametric model, dynamically outputs different loss functions that will be used and optimized by its student model at different training stages. We develop an efficient learning method for the teacher model that makes gradient based optimization possible, exempt of the ineffective solutions such as policy optimization. We name our method as "learning to teach with dynamic loss functions" (L2T-DLF for short). Extensive experiments on real world tasks including image classification and neural machine translation demonstrate that our method significantly improves the quality of various student models.

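Motivation and Objectives

Motivate the idea of teaching through loss functions in AI and formalize it by analogy with human teaching and examinations.
Develop a gradient-based optimization framework for jointly training a teacher (the loss-function generator) and a student (the learner).
Demonstrate that dynamically learned loss functions improve student performance on real-world tasks.
Provide an efficient algorithm that uses reverse-mode differentiation to backpropagate through the entire training process.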
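The last objective, backpropagating through a whole training run, can be illustrated on a scalar toy problem. The sketch below is not the paper's algorithm: the quadratic loss `0.5*(w - a)**2 + 0.5*phi*w**2`, the single teacher coefficient `phi`, the targets `a` and `b`, and all function names are assumptions chosen so the reverse pass can be written by hand. It stores the student's SGD trajectory, then walks it backward to accumulate the gradient of the development loss with respect to `phi`.

```python
def train_student(phi, w0=0.0, a=1.0, lr=0.1, steps=20):
    """SGD on the toy loss l_phi(w) = 0.5*(w - a)**2 + 0.5*phi*w**2.

    Returns every iterate w_0..w_T so the reverse pass can revisit them.
    """
    ws = [w0]
    for _ in range(steps):
        w = ws[-1]
        grad = (w - a) + phi * w          # d l_phi / d w
        ws.append(w - lr * grad)          # w_{t+1} = w_t - lr * grad
    return ws


def dev_loss(w, b=0.6):
    """Toy development-set objective: squared distance to a held-out target b."""
    return 0.5 * (w - b) ** 2


def teacher_gradient(phi, a=1.0, b=0.6, lr=0.1, steps=20):
    """Reverse-mode pass over the stored trajectory: d dev_loss(w_T) / d phi."""
    ws = train_student(phi, a=a, lr=lr, steps=steps)
    dw = ws[-1] - b                       # d dev_loss / d w_T
    dphi = 0.0
    for t in reversed(range(steps)):
        # Forward step was w_{t+1} = w_t - lr * ((1 + phi) * w_t - a).
        dphi += dw * (-lr * ws[t])            # direct effect of phi on w_{t+1}
        dw = dw * (1.0 - lr * (1.0 + phi))    # effect of w_t on w_{t+1}
    return dphi


def fit_teacher(phi=0.0, teacher_lr=0.5, rounds=100):
    """Plain gradient descent on phi (the paper mentions Adam; this is simpler)."""
    for _ in range(rounds):
        phi -= teacher_lr * teacher_gradient(phi)
    return phi


if __name__ == "__main__":
    phi_star = fit_teacher()
    print("dev loss before:", dev_loss(train_student(0.0)[-1]))
    print("dev loss after: ", dev_loss(train_student(phi_star)[-1]))
```

In a real setting the student has many parameters and the teacher gradient would come from an automatic differentiation library rather than hand-written derivatives; the toy only shows the structure of the backward sweep over training steps.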

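Proposed Method

Define a student model f_ω and a learnable loss l_Φ that guides its training via SGD.
Introduce a teacher model μ_θ that outputs the loss-function coefficients Φ_t from the student's state s_t, so that the loss changes dynamically during training.
Relax the non-differentiable task-specific metric m into a continuous surrogate built from the student's probabilistic output p_ω, yielding a differentiable objective.
Apply reverse-mode differentiation (RMD) to backpropagate through the entire training process and derive the teacher-parameter gradient dθ.
Update the teacher with gradient-based optimization (e.g., Adam) and iterate to maximize the resulting student's development-set performance.
Give concrete implementations for image classification and neural machine translation (NMT), including a loss of the form l_Φ(p, y) = -σ(y^T Φ log p) and an attention-based output of Φ_t.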
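The pieces listed above can be sketched in plain Python for a single example. This is a minimal illustration under stated assumptions, not the authors' implementation: σ is taken to be the logistic sigmoid, the attention teacher is assumed to mix N candidate coefficient matrices with weights softmax(V s_t), expected accuracy stands in for the continuous surrogate of the metric, and the names `dynamic_loss`, `teacher_phi`, `expected_accuracy`, `V`, and `W` are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def softmax(v):
    m = max(v)                               # subtract the max for stability
    exps = [math.exp(x - m) for x in v]
    total = sum(exps)
    return [e / total for e in exps]


def dynamic_loss(p, y, phi):
    """l_Phi(p, y) = -sigmoid(y^T Phi log p) for one example.

    p   : predicted class probabilities (length K, all > 0)
    y   : one-hot label (length K)
    phi : K x K coefficient matrix supplied by the teacher
    """
    k = len(p)
    log_p = [math.log(pk) for pk in p]
    phi_log_p = [sum(phi[i][j] * log_p[j] for j in range(k)) for i in range(k)]
    return -sigmoid(sum(y[i] * phi_log_p[i] for i in range(k)))


def teacher_phi(state, V, W):
    """Assumed attention teacher: Phi_t = sum_n softmax(V s_t)[n] * W[n].

    state : student state features s_t (length d)
    V     : N x d scoring matrix
    W     : list of N candidate K x K coefficient matrices
    """
    scores = [sum(vn[d] * state[d] for d in range(len(state))) for vn in V]
    attn = softmax(scores)
    k = len(W[0])
    return [[sum(attn[n] * W[n][i][j] for n in range(len(W))) for j in range(k)]
            for i in range(k)]


def expected_accuracy(probs, labels):
    """Continuous stand-in for accuracy: mean probability of the true class."""
    return sum(p[c] for p, c in zip(probs, labels)) / len(labels)


if __name__ == "__main__":
    identity = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]
    p, y = [0.7, 0.2, 0.1], [1, 0, 0]
    # With Phi = I the loss depends only on the true-class probability p_y.
    print(dynamic_loss(p, y, identity))
```

With Φ set to the identity, y^T Φ log p reduces to log p_y, so the loss is a monotone function of the usual cross-entropy term; off-diagonal entries of Φ let the teacher reward or penalize probability mass placed on other classes.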

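Experimental Results

Research Questions

RQ1: Can a neural teacher learn to output loss functions that improve the student's development-set performance over a fixed loss?
RQ2: How can the teacher be optimized efficiently so that its loss functions adapt to the student's different training stages?
RQ3: Do dynamically learned loss functions generalize across tasks such as image classification and neural machine translation?
RQ4: What insights can be gained about the structure of the loss functions learned during training?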

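Key Findings

Dynamic loss functions learned by the teacher improve performance across multiple student architectures and tasks.
On CIFAR-10, the teacher-learned loss lowers the error rate for a range of models; for example, WRN improves to 3.42% and DenseNet-BC to 3.08% on CIFAR-10.
On MNIST, training with L2T-DLF lowers the error rate for models such as MLP and LeNet.
On the NMT task (IWSLT-14 German→English), L2T-DLF improves BLEU for the LSTM-1, LSTM-2, and Transformer students (e.g., the Transformer goes from 34.01 to 34.80 BLEU).
The learned loss coefficients Φ_t show a phase-dependent focusing effect, for example encouraging similarity among easy classes early in training and later sharpening the discrimination between similar classes.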


This review was generated by AI and checked by a human editor.