QUICK REVIEW

[Paper Review] Efficient Continuous Pareto Exploration in Multi-Task Learning

Pingchuan Ma, Tao Du | arXiv (Cornell University) | Jun 29, 2020

Educational Technology and Assessment | Cited by 30

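ONE-SENTENCE SUMMARY

Introduces an efficient method for exploring trade-offs between tasks in deep multi-task learning: a second-order, Hessian-free approach with a Krylov solver reconstructs locally continuous Pareto sets and Pareto fronts.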

ABSTRACT

Tasks in multi-task learning often correlate, conflict, or even compete with each other. As a result, a single solution that is optimal for all tasks rarely exists. Recent papers introduced the concept of Pareto optimality to this field and directly cast multi-task learning as multi-objective optimization problems, but solutions returned by existing methods are typically finite, sparse, and discrete. We present a novel, efficient method that generates locally continuous Pareto sets and Pareto fronts, which opens up the possibility of continuous analysis of Pareto optimal solutions in machine learning problems. We scale up theoretical results in multi-objective optimization to modern machine learning problems by proposing a sample-based sparse linear system, for which standard Hessian-free solvers in machine learning can be applied. We compare our method to the state-of-the-art algorithms and demonstrate its usage of analyzing local Pareto sets on various multi-task classification and regression problems. The experimental results confirm that our algorithm reveals the primary directions in local Pareto sets for trade-off balancing, finds more solutions with different trade-offs efficiently, and scales well to tasks with millions of parameters.

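MOTIVATION AND GOALS

Motivate the need to explore trade-offs in multi-task learning instead of optimizing for a single solution.
Propose a two-stage algorithm that recovers and then expands local Pareto sets in deep multi-task learning.
Enable scalable, dense analysis of Pareto fronts for large neural networks.
Provide a continuous parameterization of local Pareto sets that supports intuitive traversal.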

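PROPOSED METHOD

Formulates Pareto optimality and local Pareto sets through a local tangent-plane expansion that uses first- and second-order information (gradients and Hessians).
Finds Pareto stationary points by solving a small convex problem for the weights alpha of the gradient combination.
Estimates tangent directions by solving a large, sparse, Hessian-based linear system with a matrix-free Krylov solver (MINRES), which avoids forming the full Hessian.
Expands outward from a Pareto stationary point x* along each tangent direction v to produce candidate points x* + s v, normalizing the direction for stability.
Constructs a continuous local Pareto set by taking convex combinations of a Pareto point and its exploration directions, which form a local linear subspace.
Stitches local fronts into a larger continuous Pareto front by detecting collisions between them and merging.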
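
The Pareto stationarity step can be sketched for the two-task case. This is a minimal illustration under stated assumptions, not the paper's implementation: with only two gradients, the small convex problem for alpha has a closed-form solution, and the quadratic toy objectives, the centers `a` and `b`, and the helper names `min_norm_alpha` and `stationarity_residual` are all hypothetical choices made for the example.

```python
import numpy as np

def min_norm_alpha(g1, g2):
    """Weight alpha in [0, 1] that minimizes ||alpha*g1 + (1-alpha)*g2||."""
    diff = g1 - g2
    denom = float(diff @ diff)
    if denom == 0.0:
        # Identical gradients: every alpha gives the same combination.
        return 0.5
    # Unconstrained minimizer of the quadratic in alpha, clipped to [0, 1].
    return float(np.clip(((g2 - g1) @ g2) / denom, 0.0, 1.0))

def stationarity_residual(g1, g2):
    """Return alpha and the norm of the min-norm gradient combination."""
    alpha = min_norm_alpha(g1, g2)
    return alpha, float(np.linalg.norm(alpha * g1 + (1.0 - alpha) * g2))

# Toy tasks (assumption): f_1(x) = 0.5*||x - a||^2 and f_2(x) = 0.5*||x - b||^2.
# Their Pareto set is the line segment between a and b.
a = np.array([0.0, 0.0])
b = np.array([2.0, 0.0])

def grads(x):
    """Gradients of the two toy objectives at x."""
    return x - a, x - b

alpha_on, res_on = stationarity_residual(*grads(np.array([0.5, 0.0])))
alpha_off, res_off = stationarity_residual(*grads(np.array([0.5, 1.0])))
print("on segment: ", alpha_on, res_on)
print("off segment:", alpha_off, res_off)
```

A residual near zero marks a Pareto stationary point. In this toy setup the first query point lies on the segment between `a` and `b`, so its residual vanishes, while the second point sits off the segment and keeps a nonzero residual.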

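EXPERIMENTAL RESULTS

Research Questions

RQ1: Can the proposed tangent-based expansion accurately approximate local Pareto sets in deep multi-task learning?
RQ2: Under a similar computational budget, does the method produce denser and more diverse Pareto fronts than baselines that return discrete solutions?
RQ3: Does the method scale to networks with millions of parameters while remaining efficient?
RQ4: Can a local Pareto set be reparameterized into a low-dimensional space for intuitive traversal?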

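Key Findings

Generates Pareto fronts that are denser than those of prior discrete-solution methods, at moderate overhead relative to full MTL training.
Obtains exploration directions efficiently with MINRES and Hessian-vector products, so the cost grows linearly with network size (O(kn)).
Outperforms baselines at finding solutions with diverse trade-offs across multiple datasets and architectures.
Shows that continuous Pareto sets can be reparameterized in a low-dimensional space for intuitive manipulation and traversal.
Shows that the method scales from medium-sized problems (e.g., MultiMNIST) to large networks with millions of parameters (e.g., on UTKFace).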
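
The matrix-free MINRES step behind the exploration directions can be sketched as follows. This is a hedged illustration, not the authors' code: the diagonal quadratic toy tasks, the finite-difference Hessian-vector product (deep learning code would more likely use automatic differentiation), the fixed weight `alpha = 0.4`, and the right-hand side `grad(1, x) - grad(0, x)` (one particular combination of task gradients) are all assumptions made for the example. Only `scipy.sparse.linalg.minres` and `LinearOperator` are real library calls.

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, minres

rng = np.random.default_rng(0)
n = 50
# Toy tasks (assumption): f_i(x) = 0.5 * (x - c_i)^T diag(w_i) (x - c_i).
w = [rng.uniform(0.5, 2.0, n), rng.uniform(0.5, 2.0, n)]
c = [rng.normal(size=n), rng.normal(size=n)]

def grad(i, x):
    """Gradient of toy task i at x."""
    return w[i] * (x - c[i])

alpha = 0.4
# Closed-form minimizer of alpha*f_0 + (1-alpha)*f_1: a Pareto stationary point.
x_star = (alpha * w[0] * c[0] + (1 - alpha) * w[1] * c[1]) / (
    alpha * w[0] + (1 - alpha) * w[1])

def hvp(v, eps=1e-4):
    """Central-difference product H v for the alpha-weighted Hessian H.

    Only gradient evaluations are used; H itself is never formed.
    """
    def g(x):
        return alpha * grad(0, x) + (1 - alpha) * grad(1, x)
    return (g(x_star + eps * v) - g(x_star - eps * v)) / (2 * eps)

H = LinearOperator((n, n), matvec=hvp, dtype=np.float64)
rhs = grad(1, x_star) - grad(0, x_star)  # assumed combination of task gradients
v, info = minres(H, rhs, maxiter=200)    # each iteration costs one H v product
v = v / np.linalg.norm(v)                # normalize the direction for stability

def residual(x):
    """Norm of the min-norm convex combination of the two task gradients at x."""
    g0, g1 = grad(0, x), grad(1, x)
    d = g0 - g1
    t = float(np.clip(((g1 - g0) @ g1) / (d @ d), 0.0, 1.0))
    return float(np.linalg.norm(t * g0 + (1 - t) * g1))

s = 0.05
u = rng.normal(size=n)
u = u / np.linalg.norm(u)                # random unit direction for comparison
res_tangent = residual(x_star + s * v)
res_random = residual(x_star + s * u)
print(info, res_tangent, res_random)
```

In this sketch a step along the solved direction stays far closer to Pareto stationarity than a step of the same length in a random direction, which is the property the expansion relies on. Each MINRES iteration needs one Hessian-vector product costing a constant number of gradient evaluations, which is consistent with the O(kn) scaling for k iterations on n parameters.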

This review was generated by AI and reviewed by a human editor.