[Paper Review] Gradient Surgery for Multi-Task Learning
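Proposes PCGrad, a gradient-surgery method that mitigates interference in multi-task learning by projecting a conflicting task gradient onto the normal plane of another task's gradient, improving data efficiency and performance across both supervised and reinforcement learning tasks.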
While deep learning and deep reinforcement learning (RL) systems have demonstrated impressive results in domains such as image classification, game playing, and robotic control, data efficiency remains a major challenge. Multi-task learning has emerged as a promising approach for sharing structure across multiple tasks to enable more efficient learning. However, the multi-task setting presents a number of optimization challenges, making it difficult to realize large efficiency gains compared to learning tasks independently. The reasons why multi-task learning is so challenging compared to single-task learning are not fully understood. In this work, we identify a set of three conditions of the multi-task optimization landscape that cause detrimental gradient interference, and develop a simple yet general approach for avoiding such interference between task gradients. We propose a form of gradient surgery that projects a task's gradient onto the normal plane of the gradient of any other task that has a conflicting gradient. On a series of challenging multi-task supervised and multi-task RL problems, this approach leads to substantial gains in efficiency and performance. Further, it is model-agnostic and can be combined with previously-proposed multi-task architectures for enhanced performance.
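For two task gradients g_i and g_j (notation assumed here for illustration), the projection described in the abstract removes the component of g_i along g_j only when the two conflict. A one-line check confirms that the projected gradient lies on the normal plane of g_j:

```latex
\text{if } g_i \cdot g_j < 0: \quad
g_i' = g_i - \frac{g_i \cdot g_j}{\lVert g_j \rVert_2^{2}}\, g_j,
\qquad
g_i' \cdot g_j = g_i \cdot g_j - \frac{g_i \cdot g_j}{\lVert g_j \rVert_2^{2}}\, \lVert g_j \rVert_2^{2} = 0 .
```

When g_i · g_j ≥ 0 the gradient is left unchanged, so non-conflicting tasks are not modified.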
Motivation and Objectives
- Identify optimization challenges in multi-task learning caused by gradient interference.
- Characterize the tragic triad of conflicting gradients, dominating gradients, and high curvature.
- Develop a gradient-surgery technique (PCGrad) to mitigate gradient conflict.
- Demonstrate model-agnostic applicability of PCGrad across supervised and reinforcement learning tasks.
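To make the first two conditions of the triad concrete, here is a minimal pure-Python sketch that measures them for one pair of flattened task gradients. It assumes that conflict is detected by a negative cosine similarity and that dominance is measured by the ratio 2‖g_i‖‖g_j‖ / (‖g_i‖² + ‖g_j‖²), which equals 1 for equal norms and approaches 0 when one gradient dominates. The 0.5 cutoff in `diagnose` is a hypothetical threshold chosen for illustration, and the third condition (high curvature) is not covered because it needs second-order information.

```python
import math
from typing import Dict, Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product of two flattened gradient vectors."""
    return sum(x * y for x, y in zip(a, b))


def norm(a: Sequence[float]) -> float:
    """Euclidean (L2) norm of a flattened gradient vector."""
    return math.sqrt(dot(a, a))


def cosine_similarity(g_i: Sequence[float], g_j: Sequence[float]) -> float:
    """cos(phi_ij); a negative value means the two task gradients conflict.
    Assumes both gradients are non-zero."""
    return dot(g_i, g_j) / (norm(g_i) * norm(g_j))


def magnitude_similarity(g_i: Sequence[float], g_j: Sequence[float]) -> float:
    """2*|g_i|*|g_j| / (|g_i|^2 + |g_j|^2): 1.0 for equal norms,
    close to 0.0 when one gradient is much larger than the other."""
    n_i, n_j = norm(g_i), norm(g_j)
    return 2.0 * n_i * n_j / (n_i ** 2 + n_j ** 2)


def diagnose(g_i: Sequence[float], g_j: Sequence[float],
             dominance_threshold: float = 0.5) -> Dict[str, object]:
    """Flag conflicting and dominating gradients for one task pair.
    dominance_threshold is a hypothetical cutoff, not a value from the paper."""
    cos_sim = cosine_similarity(g_i, g_j)
    mag_sim = magnitude_similarity(g_i, g_j)
    return {
        "cosine": cos_sim,
        "magnitude_similarity": mag_sim,
        "conflicting": cos_sim < 0.0,
        "dominating": mag_sim < dominance_threshold,
    }
```

For example, `diagnose([1.0, 0.0], [-4.0, 1.0])` flags the pair as both conflicting (cosine ≈ -0.97) and dominating (magnitude similarity ≈ 0.46).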
Proposed Method
- Define the three conditions of the multi-task optimization landscape that cause gradient interference.
- Introduce PCGrad, which projects a task's gradient onto the normal plane of another task's gradient when their cosine similarity is negative.
- Provide a simple algorithm (Algorithm 1) to apply PCGrad within any gradient-based optimizer.
- Theoretically analyze PCGrad in the two-task setting: prove convergence for convex, differentiable losses and derive sufficient conditions under which a PCGrad update achieves lower loss than a standard gradient step.
- Demonstrate compatibility and improvements when combining PCGrad with existing multi-task architectures (e.g., MTAN, routing networks).
- Evaluate PCGrad on multi-task supervised learning, multi-task RL, and goal-conditioned RL to assess data efficiency and performance.
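Algorithm 1 can be sketched in a few lines of pure Python. This is a minimal illustration under stated assumptions, not the authors' implementation: gradients are plain lists standing in for flattened parameter gradients, each task's gradient is projected against the other tasks' original gradients visited in random order, and the projected gradients are summed into a single update direction. In a real training loop the same logic would operate on framework tensors before the result is handed to the optimizer.

```python
import random
from typing import List, Optional

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two flattened gradient vectors."""
    return sum(x * y for x, y in zip(a, b))


def pcgrad(task_grads: List[Vector], rng: Optional[random.Random] = None) -> Vector:
    """Combine per-task gradients with PCGrad-style projection (sketch).

    For each task i, start from g_i and visit the other tasks in random
    order; whenever the current g_i conflicts with g_j (negative inner
    product), remove its component along g_j. The projected gradients
    are then summed into one update direction.
    """
    rng = rng or random.Random()
    projected: List[Vector] = []
    for i, g_i in enumerate(task_grads):
        g_pc = list(g_i)  # copy so the original task gradient is untouched
        others = [g for j, g in enumerate(task_grads) if j != i]
        rng.shuffle(others)  # random visiting order over the other tasks
        for g_j in others:
            inner = dot(g_pc, g_j)
            if inner < 0.0:  # conflict: project onto the normal plane of g_j
                scale = inner / dot(g_j, g_j)  # g_j is non-zero if inner < 0
                g_pc = [a - scale * b for a, b in zip(g_pc, g_j)]
        projected.append(g_pc)
    # Sum the projected task gradients component-wise.
    return [sum(components) for components in zip(*projected)]
```

With two conflicting gradients, `pcgrad([[1.0, 0.0], [-1.0, 1.0]])` returns `[0.5, 1.5]`, which has a positive inner product with both task gradients, whereas the plain sum `[0.0, 1.0]` is orthogonal to the first task's gradient. With only two tasks the shuffle has no effect; it matters once three or more gradients are projected in sequence.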
Experimental Results
Research Questions
- RQ1: Does PCGrad reduce gradient interference and improve data efficiency across multi-task learning scenarios?
- RQ2: Can PCGrad be combined with existing multi-task architectures to yield further performance gains?
- RQ3: Is the tragic triad (conflicting gradients, dominating gradients, high curvature) a major source of optimization difficulty in multi-task learning?
- RQ4: How does PCGrad perform in both supervised and reinforcement learning settings compared to baselines?
Key Findings
- PCGrad substantially improves data efficiency and final performance across multi-task supervised learning and multi-task RL problems.
- On CIFAR-100, combining PCGrad with routing networks yields a 2.8 percentage-point absolute improvement in test accuracy.
- On CelebA, PCGrad achieves a lower average multi-task classification error than the prior method of Sener and Koltun (8.69 vs. 8.95).
- MTAN with PCGrad achieves the best scores in 8 of 9 categories on the NYUv2 tasks.
- In Meta-World MT10/MT50 benchmarks, SAC+PCGrad outperforms baselines in success rates and data efficiency, solving more tasks with fewer samples.
This review was generated by AI and checked by a human editor.