QUICK REVIEW

[Paper Review] Multi-Task Learning as Multi-Objective Optimization

Ozan Şener|arXiv (Cornell University)|Oct 10, 2018

Domain Adaptation and Few-Shot Learning | Citations: 340

One-Line Summary

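This paper recasts multi-task learning (MTL) as a multi-objective problem whose goal is a Pareto-optimal solution. It introduces a scalable gradient-based optimizer for deep networks (MGDA-UB, which uses a Frank-Wolfe solver) and reports better performance than the baselines on MultiMNIST, CelebA, and Cityscapes.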

ABSTRACT

In multi-task learning, multiple tasks are solved jointly, sharing inductive bias between them. Multi-task learning is inherently a multi-objective problem because different tasks may conflict, necessitating a trade-off. A common compromise is to optimize a proxy objective that minimizes a weighted linear combination of per-task losses. However, this workaround is only valid when the tasks do not compete, which is rarely the case. In this paper, we explicitly cast multi-task learning as multi-objective optimization, with the overall objective of finding a Pareto optimal solution. To this end, we use algorithms developed in the gradient-based multi-objective optimization literature. These algorithms are not directly applicable to large-scale learning problems since they scale poorly with the dimensionality of the gradients and the number of tasks. We therefore propose an upper bound for the multi-objective loss and show that it can be optimized efficiently. We further prove that optimizing this upper bound yields a Pareto optimal solution under realistic assumptions. We apply our method to a variety of multi-task deep learning problems including digit classification, scene understanding (joint semantic segmentation, instance segmentation, and depth estimation), and multi-label classification. Our method produces higher-performing models than recent multi-task learning formulations or per-task training.

Motivation and Objectives

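Motivate MTL as a multi-objective problem, because tasks can conflict and their objectives compete.
Formulate MTL so that it seeks Pareto-optimal solutions instead of minimizing a single weighted sum of task losses.
Develop an optimization algorithm that scales to the high-dimensional gradients and large numbers of tasks found in deep networks.
Prove that, under reasonable assumptions, optimizing an upper bound still yields Pareto optimality.
Demonstrate effectiveness across diverse datasets and task sets (2 to 40 tasks).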
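To make the Pareto notion concrete, here is a minimal Python sketch, assuming NumPy. The helper names dominates and pareto_front and the loss values are hypothetical illustrations, not taken from the paper. A candidate is Pareto optimal if no other candidate is at least as good on every task and strictly better on at least one. A fixed weighted sum, by contrast, selects one point, and which point it selects depends on the chosen weights.

```python
import numpy as np


def dominates(a, b):
    """True if loss vector a Pareto-dominates b: no worse on any task, strictly better on one."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return bool(np.all(a <= b) and np.any(a < b))


def pareto_front(losses):
    """Indices of candidates that no other candidate dominates."""
    return [i for i, li in enumerate(losses)
            if not any(dominates(lj, li) for j, lj in enumerate(losses) if j != i)]


# Hypothetical (task 1 loss, task 2 loss) pairs for four candidate models.
candidates = [(0.20, 0.90), (0.50, 0.50), (0.90, 0.20), (0.60, 0.60)]
print(pareto_front(candidates))          # -> [0, 1, 2]; (0.60, 0.60) is dominated by (0.50, 0.50)

# A fixed weighted sum picks a single trade-off point, determined by the weights.
weights = np.array([0.8, 0.2])
scores = [float(weights @ np.asarray(c)) for c in candidates]
print(int(np.argmin(scores)))            # -> 0
```

Changing the weights to (0.2, 0.8) would select candidate 2. Each of candidates 0, 1, and 2 is an equally valid Pareto-optimal trade-off.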

Proposed Method

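Collect the per-task losses into a vector-valued objective L(θ^sh, θ^1, ..., θ^T), where θ^sh are the shared parameters and θ^t are the parameters specific to task t.
Apply gradient-based multi-objective optimization (the MGDA/KKT framework) to find a descent direction common to all tasks.
Solve a minimum-norm problem over the convex hull of the task gradients, min ||Σt αt ∇θ^sh L̂t||² subject to Σt αt = 1 and αt ≥ 0, to obtain the task weights α1, ..., αT (Eq. 3).
Compute α scalably with a Frank-Wolfe-based solver, using an analytical line search derived for the two-task case (Eq. 4).
Introduce MGDA-UB: replace the min-norm objective over shared-parameter gradients with an upper bound expressed through gradients with respect to the shared representation Z. This requires only a single backward pass through the shared encoder (Section 3.3).
Prove (Theorem 1) that, if ∂Z/∂θ^sh is full rank, the MGDA-UB solution either gives a direction that decreases every task loss or shows that the current point is Pareto stationary.
Apply the method to encoder-decoder architectures, computing the update with a single backward pass through the shared representation g(·; θ^sh).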
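The min-norm step (Eq. 3) and the two-task line search (Eq. 4) can be sketched in NumPy as below. For two gradients g1 and g2, the minimizer of ||a·g1 + (1 − a)·g2||² over a in [0, 1] is a = clip((g2 − g1)·g2 / ||g1 − g2||², 0, 1). The Frank-Wolfe loop reuses that line search between the current combination and the best simplex vertex. The function names, the stopping rule, and the toy linear encoder with squared-error heads in mgda_ub_step are illustrative assumptions, not the authors' code. The toy also keeps the task heads fixed and only shows how weights computed from gradients with respect to Z are combined and backpropagated once into the shared encoder.

```python
import numpy as np


def two_task_alpha(g1, g2):
    """Closed-form minimiser of ||a*g1 + (1 - a)*g2||^2 over a in [0, 1]."""
    diff = g1 - g2
    denom = float(diff @ diff)
    if denom == 0.0:  # identical gradients: every a gives the same vector
        return 0.5
    return float(np.clip((g2 - g1) @ g2 / denom, 0.0, 1.0))


def min_norm_weights(grads, max_iter=250, tol=1e-6):
    """Frank-Wolfe sketch for min ||sum_t alpha_t g_t||^2 over the probability simplex.

    grads: array of shape (T, D), one flattened gradient per task.
    """
    G = np.asarray(grads, dtype=float)
    M = G @ G.T                                     # Gram matrix of task gradients
    alpha = np.full(G.shape[0], 1.0 / G.shape[0])   # start at the simplex centre
    for _ in range(max_iter):
        Ma = M @ alpha
        t_hat = int(np.argmin(Ma))     # vertex minimising the linearised objective
        vv = float(alpha @ Ma)         # ||v||^2 for the current combination v
        uv = float(Ma[t_hat])          # u . v with u = g_{t_hat}
        uu = float(M[t_hat, t_hat])    # ||u||^2
        denom = uu - 2.0 * uv + vv     # ||u - v||^2
        if denom <= 1e-12:             # v (numerically) equals the chosen vertex
            break
        gamma = float(np.clip((vv - uv) / denom, 0.0, 1.0))  # two-task line search
        new_alpha = (1.0 - gamma) * alpha
        new_alpha[t_hat] += gamma
        converged = np.abs(new_alpha - alpha).sum() < tol
        alpha = new_alpha
        if converged:
            break
    return alpha


def mgda_ub_step(X, W, heads, targets, lr=0.1):
    """Hypothetical toy MGDA-UB update: shared encoder Z = X @ W, linear heads,
    squared-error losses. Heads are kept fixed here for brevity."""
    n = X.shape[0]
    Z = X @ W                                                # shared representation
    grads_Z = []
    for w_t, y_t in zip(heads, targets):
        residual = Z @ w_t - y_t                             # shape (n,)
        grads_Z.append((2.0 / n) * np.outer(residual, w_t))  # dL_t/dZ, shape (n, k)
    alpha = min_norm_weights(np.stack([g.ravel() for g in grads_Z]))
    combined = sum(a_t * g for a_t, g in zip(alpha, grads_Z))  # weighted dL/dZ
    grad_W = X.T @ combined                                  # single backward pass into W
    return W - lr * grad_W, alpha


# Two conflicting 2-D gradients: closed form and Frank-Wolfe should agree.
g1, g2 = np.array([1.0, 0.0]), np.array([-0.5, 1.0])
a = two_task_alpha(g1, g2)                        # ~0.538
alpha_fw = min_norm_weights(np.stack([g1, g2]))
d = a * g1 + (1 - a) * g2                         # min-norm combination; update steps along -d
print(a, alpha_fw, d @ g1, d @ g2)                # both dot products are positive

# Toy MGDA-UB step on random data.
rng = np.random.default_rng(0)
X = rng.normal(size=(32, 5))
W = rng.normal(size=(5, 3))
heads = [rng.normal(size=3), rng.normal(size=3)]
targets = [rng.normal(size=32), rng.normal(size=32)]
W_new, alpha = mgda_ub_step(X, W, heads, targets)
print(alpha, alpha.sum())
```

Because d·g1 and d·g2 are both positive, a small step along −d lowers both task losses to first order. In the toy step, the solver only sees vectors shaped like Z (n·k entries), not the shared parameters. The encoder is then backpropagated once with the combined gradient instead of once per task.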

Experimental Results

Research Questions

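RQ1: How can MTL be formulated as a multi-objective optimization problem, and what is the appropriate notion of optimality (Pareto optimality)?
RQ2: Can gradient-based MGDA scale to high-dimensional deep networks and many tasks without excessive overhead?
RQ3: Does optimizing the upper bound (MGDA-UB) preserve Pareto optimality under realistic conditions?
RQ4: How does the proposed method perform across domains (classification, multi-label classification, scene understanding) with 2 to 40 tasks?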

Key Findings

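MGDA-UB yields Pareto-optimal or Pareto-stationary solutions with little computational overhead.
On MultiMNIST (2 tasks), it matches single-task performance and outperforms the other MTL baselines, showing the benefit of sharing capacity across tasks.
On CelebA (40 tasks), it achieves a lower average error than uniform scaling, Kendall et al. (2018), and GradNorm.
On Cityscapes (3 tasks), it achieves the highest mIoU and the lowest pixel-level disparity error among the compared methods.
MGDA-UB substantially speeds up training (a 40% reduction in training time for 3-task scene understanding and a 25× speedup for 40-task CelebA) while matching the accuracy of full MGDA.
Across tasks and datasets, the method consistently outperforms the baselines, supporting scalable MTL with many tasks.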

This review was generated by AI and checked by a human editor.