QUICK REVIEW

[Paper Review] Recon: Reducing Conflicting Gradients from the Root for Multi-Task Learning

Guangyuan Shi, Qimai Li|arXiv (Cornell University)|Feb 22, 2023

Domain Adaptation and Few-Shot Learning | Cited by 11

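One-line summary

Recon identifies layer-wise gradient conflicts in the shared network layers, converts the most conflicting layers into task-specific layers, and retrains the network from scratch, which greatly reduces conflicts and improves multi-task performance.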

ABSTRACT

A fundamental challenge for multi-task learning is that different tasks may conflict with each other when they are solved jointly, and a cause of this phenomenon is conflicting gradients during optimization. Recent works attempt to mitigate the influence of conflicting gradients by directly altering the gradients based on some criteria. However, our empirical study shows that "gradient surgery" cannot effectively reduce the occurrence of conflicting gradients. In this paper, we take a different approach to reduce conflicting gradients from the root. In essence, we investigate the task gradients w.r.t. each shared network layer, select the layers with high conflict scores, and turn them to task-specific layers. Our experiments show that such a simple approach can greatly reduce the occurrence of conflicting gradients in the remaining shared layers and achieve better performance, with only a slight increase in model parameters in many cases. Our approach can be easily applied to improve various state-of-the-art methods including gradient manipulation methods and branched architecture search methods. Given a network architecture (e.g., ResNet18), it only needs to search for the conflict layers once, and the network can be modified to be used with different methods on the same or even different datasets to gain performance improvement. The source code is available at https://github.com/moukamisama/Recon.

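Motivation and objectives

Address negative transfer caused by conflicting gradients in multi-task learning (MTL).
Introduce a layer-wise conflict measure to identify where parameter sharing causes the most problems.
Propose a simple, scalable method that converts the most conflicting shared layers into task-specific layers and retrains the network.
Show that removing conflicts at the root yields improvements across diverse datasets and architectures.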

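Proposed method

Compute the task gradients with respect to each shared layer during an initial training run.
Define a layer-wise conflict score s(k) as the number of task pairs whose gradient angle satisfies cos(phi) < S, counted over the training iterations.
Identify the top-K layers with the highest conflict scores, convert them into task-specific parameters, and retrain the network from scratch.
Provide a theoretical analysis showing that the loss decreases after a one-step Recon update on the modified parameter set.
Validate Recon experimentally across multiple datasets and backbones, and show that it is compatible with gradient manipulation and branched architecture methods.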
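
The scoring and selection steps above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the function names (`cosine`, `conflict_scores`, `top_k_conflict_layers`), the layout of `grad_history`, and the toy gradient values are all assumptions made for the example, and the threshold value 0.0 simply stands in for the hyperparameter S.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two flat gradient vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


def conflict_scores(grad_history, threshold):
    """Count, per layer, the task pairs with cos(phi) < threshold over all iterations.

    grad_history is a list over iterations; each item maps a layer name to a
    list of per-task flat gradient vectors (a hypothetical layout for this sketch).
    """
    scores = {}
    for step in grad_history:
        for layer, task_grads in step.items():
            count = sum(
                1
                for g_i, g_j in combinations(task_grads, 2)
                if cosine(g_i, g_j) < threshold
            )
            scores[layer] = scores.get(layer, 0) + count
    return scores


def top_k_conflict_layers(scores, k):
    """Return the k layer names with the highest conflict scores."""
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Toy data (made up): two tasks, two shared layers, two iterations.
history = [
    {"layer1": [[1.0, 0.0], [0.9, 0.1]], "layer2": [[1.0, 0.0], [-1.0, 0.2]]},
    {"layer1": [[0.5, 0.5], [0.4, 0.6]], "layer2": [[0.0, 1.0], [0.1, -1.0]]},
]
scores = conflict_scores(history, threshold=0.0)
selected = top_k_conflict_layers(scores, k=1)
print(scores, selected)
```

In this toy run the two tasks agree on `layer1` and disagree on `layer2` in both iterations, so `layer2` is the layer that would be turned task-specific before retraining. In a real setup the per-layer gradients would come from the training framework's autograd rather than hand-written lists.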

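Experimental results

Research questions

RQ1: Can gradient conflicts be mitigated by structurally reducing sharing at the root, rather than by gradient manipulation alone?
RQ2: Which shared layers contribute most to gradient conflicts, and can making them task-specific improve performance without an excessive increase in parameters?
RQ3: Does Recon consistently improve state-of-the-art gradient manipulation and architecture search methods, regardless of dataset and architecture?
RQ4: Is a single search for conflict layers sufficient to yield gains across different tasks and datasets?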

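Key findings

Recon greatly reduces conflicting gradients in the remaining shared layers, often by converting less than 13% of the shared parameters into task-specific ones.
Recon substantially outperforms joint training and most gradient manipulation baselines on several datasets (Multi-Fashion+MNIST, CelebA, CityScapes, PASCAL-Context, NYUv2).
Recon matches or improves on the best baselines while keeping model size modest and remaining compatible with existing methods (MGDA, PCGrad, GradDrop, CAGrad, RotoGrad, and branched architecture search such as BMTAS).
For a given architecture, the conflict layers identified by Recon are largely consistent across training stages, methods, and datasets, so a single conflict-layer search can be reused widely.
Ablation studies show that randomly selecting layers or parameters performs worse than Recon, which underscores the importance of targeting the most severe layer-wise conflicts.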

This review was generated by AI and checked by a human editor.