QUICK REVIEW

[Paper Review] Multi-Task Learning as Multi-Objective Optimization

Ozan Şener, Vladlen Koltun | arXiv (Cornell University) | Oct 10, 2018

Domain Adaptation and Few-Shot Learning | Cited by 340

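One-Sentence Summary

The paper reframes multi-task learning as a multi-objective problem whose goal is Pareto optimality, proposes a scalable gradient-based optimizer for deep networks (MGDA-UB with a Frank-Wolfe solver), and reports better performance than baseline methods on MultiMNIST, CelebA, and Cityscapes.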

ABSTRACT

In multi-task learning, multiple tasks are solved jointly, sharing inductive bias between them. Multi-task learning is inherently a multi-objective problem because different tasks may conflict, necessitating a trade-off. A common compromise is to optimize a proxy objective that minimizes a weighted linear combination of per-task losses. However, this workaround is only valid when the tasks do not compete, which is rarely the case. In this paper, we explicitly cast multi-task learning as multi-objective optimization, with the overall objective of finding a Pareto optimal solution. To this end, we use algorithms developed in the gradient-based multi-objective optimization literature. These algorithms are not directly applicable to large-scale learning problems since they scale poorly with the dimensionality of the gradients and the number of tasks. We therefore propose an upper bound for the multi-objective loss and show that it can be optimized efficiently. We further prove that optimizing this upper bound yields a Pareto optimal solution under realistic assumptions. We apply our method to a variety of multi-task deep learning problems including digit classification, scene understanding (joint semantic segmentation, instance segmentation, and depth estimation), and multi-label classification. Our method produces higher-performing models than recent multi-task learning formulations or per-task training.

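Motivation and Goals

Tasks in multi-task learning (MTL) can conflict and compete, which motivates treating MTL as a multi-objective problem.
Formulate MTL as the search for a Pareto optimal solution rather than the minimization of a single weighted sum.
Develop a scalable optimizer that can handle the high-dimensional gradients and large numbers of tasks found in deep networks.
Prove that optimizing the upper bound yields Pareto optimality under realistic assumptions.
Demonstrate effectiveness across diverse datasets and task sets (2 to 40 tasks).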

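Proposed Method

Collect the per-task losses into a vector-valued objective L(θsh, θ1, ..., θT), where θsh denotes the shared parameters and θt the parameters specific to task t.
Find a descent direction with gradient-based multi-objective optimization, using the multiple-gradient descent algorithm (MGDA) and the KKT conditions.
Solve a minimum-norm problem over the convex hull of the task gradients to obtain the task weights α1, ..., αT (Eq. 3).
Compute α scalably with a Frank-Wolfe-based solver, and derive a closed-form line search from the two-task case (Eq. 4).
Introduce MGDA-UB: replace the norm of the shared-parameter gradient with an upper bound built from gradients with respect to the representation Z, so that a single backward pass suffices (Section 3.3).
State Theorem 1: if ∂Z/∂θsh is full rank, the MGDA-UB solution either certifies that the current parameters are Pareto stationary or gives a direction that decreases all objectives.
Adapt the method to encoder-decoder architectures, where the shared representation is Z = g(·; θsh) and each update needs only one backward pass through the shared encoder.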
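
The min-norm step described above can be sketched in a few lines of NumPy. This is an illustrative sketch under stated assumptions, not the authors' implementation: the function names `two_task_min_norm` and `frank_wolfe_min_norm`, the iteration limit, and the tolerance are all hypothetical choices. It assumes each task gradient has already been flattened into one row of a matrix; under MGDA-UB those rows would be gradients with respect to the representation Z rather than the shared parameters.

```python
import numpy as np


def two_task_min_norm(v1v1, v1v2, v2v2):
    """Return gamma in [0, 1] minimising ||gamma*v1 + (1-gamma)*v2||^2,
    given only the inner products v1.v1, v1.v2 and v2.v2."""
    denom = v1v1 - 2.0 * v1v2 + v2v2  # equals ||v1 - v2||^2
    if denom <= 1e-12:
        return 0.0  # v1 and v2 coincide, so every gamma gives the same norm
    gamma = (v2v2 - v1v2) / denom  # unconstrained minimiser of the quadratic
    return float(np.clip(gamma, 0.0, 1.0))  # project onto [0, 1]


def frank_wolfe_min_norm(grads, max_iter=250, tol=1e-6):
    """Approximate the simplex weights alpha that minimise
    ||sum_t alpha[t] * grads[t]||^2 using Frank-Wolfe iterations.

    grads: array-like of shape (T, d), one flattened gradient per task.
    max_iter and tol are illustrative defaults, not values from the paper.
    """
    G = np.asarray(grads, dtype=float)
    T = G.shape[0]
    M = G @ G.T                   # Gram matrix of pairwise inner products
    alpha = np.full(T, 1.0 / T)   # start from uniform task weights
    for _ in range(max_iter):
        M_alpha = M @ alpha
        t_hat = int(np.argmin(M_alpha))  # simplex vertex chosen by the linear subproblem
        # Exact line search between vertex e_t_hat (v1) and the current point (v2).
        gamma = two_task_min_norm(M[t_hat, t_hat], M_alpha[t_hat], alpha @ M_alpha)
        if gamma < tol:
            break                 # no further progress along this direction
        alpha = (1.0 - gamma) * alpha
        alpha[t_hat] += gamma
    return alpha
```

For two orthogonal unit gradients the sketch returns equal weights of 0.5 each, and the shared update direction is then `alpha @ G`. Working only with the T×T Gram matrix keeps the solver's cost independent of the gradient dimension once the inner products are computed.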

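Experimental Results

Research Questions

RQ1: How can MTL be cast as a multi-objective optimization problem, and what is the appropriate notion of optimality (Pareto optimality)?
RQ2: Can gradient-based MGDA scale to high-dimensional deep networks and many tasks without prohibitive overhead?
RQ3: Under realistic conditions, does optimizing the upper bound (MGDA-UB) preserve Pareto optimality?
RQ4: How does the proposed method perform across domains (classification, multi-label classification, scene understanding) and numbers of objectives (2 to 40)?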
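
For reference, the notions behind RQ1 and RQ3 can be written out as follows. This is a sketch in standard notation rather than a verbatim copy of the paper's equations: here \mathbf{L}(\theta) abbreviates the loss vector L(θsh, θ1, ..., θT), L^t is the loss of task t, and Z is the shared representation. The inequality in the last display follows from the chain rule \nabla_{\theta^{sh}} L^t = (\partial Z / \partial \theta^{sh})^\top \nabla_Z L^t.

```latex
% Pareto dominance and Pareto optimality (all losses are minimised)
\theta \text{ dominates } \bar{\theta}
  \iff L^t(\theta) \le L^t(\bar{\theta}) \;\; \forall t
  \quad \text{and} \quad \mathbf{L}(\theta) \ne \mathbf{L}(\bar{\theta})

\theta^\star \text{ is Pareto optimal}
  \iff \text{no } \theta \text{ dominates } \theta^\star

% Min-norm problem over the convex hull of the task gradients
\min_{\alpha^1, \dots, \alpha^T}
  \Big\| \sum_{t=1}^{T} \alpha^t \nabla_{\theta^{sh}} L^t \Big\|_2^2
  \quad \text{s.t.} \quad \sum_{t=1}^{T} \alpha^t = 1, \;\; \alpha^t \ge 0 \;\; \forall t

% Upper bound used by MGDA-UB
\Big\| \sum_{t=1}^{T} \alpha^t \nabla_{\theta^{sh}} L^t \Big\|_2^2
  \le \Big\| \frac{\partial Z}{\partial \theta^{sh}} \Big\|_2^2 \,
      \Big\| \sum_{t=1}^{T} \alpha^t \nabla_{Z} L^t \Big\|_2^2
```

Because the factor involving ∂Z/∂θsh does not depend on α, minimizing the bound over α only requires the gradients with respect to Z.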

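Main Findings

MGDA-UB yields Pareto optimal or Pareto stationary solutions at negligible overhead.
On MultiMNIST (2 tasks), the method matches single-task performance and outperforms the other MTL baselines, indicating effective sharing of model capacity.
On CelebA (40 tasks), the method achieves a lower average error than uniform scaling, Kendall et al. (2018), and GradNorm.
On Cityscapes (3 tasks), the method achieves the highest mIoU and the lowest pixel error among the compared methods.
MGDA-UB substantially speeds up training relative to full MGDA (a 40% reduction in training time for 3-task scene understanding; a 25× speedup for 40-task CelebA) while matching its accuracy.
Across tasks and datasets, the method consistently outperforms the baselines, supporting scalable MTL with many tasks.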

This review was generated by AI and checked by a human editor.