QUICK REVIEW

[Paper Review] Understanding and Improving Information Transfer in Multi-Task Learning

Sen Wu, Hongyang Zhang|arXiv (Cornell University)|May 2, 2020

Domain Adaptation and Few-Shot Learning|51 references|47 citations

TL;DR

The paper analyzes a multi-task learning (MTL) architecture with a shared feature module and task-specific output heads. It shows that how well the tasks' data are aligned critically affects transfer, and it proposes a covariance-alignment method and an SVD-based task reweighting scheme to improve MTL and transfer learning performance.

ABSTRACT

We investigate multi-task learning approaches that use a shared feature representation for all tasks. To better understand the transfer of task information, we study an architecture with a shared module for all tasks and a separate output module for each task. We study the theory of this setting on linear and ReLU-activated models. Our key observation is that whether or not tasks' data are well-aligned can significantly affect the performance of multi-task learning. We show that misalignment between task data can cause negative transfer (or hurt performance) and provide sufficient conditions for positive transfer. Inspired by the theoretical insights, we show that aligning tasks' embedding layers leads to performance gains for multi-task training and transfer learning on the GLUE benchmark and sentiment analysis tasks; for example, we obtain a 2.35% GLUE score average improvement on 5 GLUE tasks over BERT-LARGE using our alignment method. We also design an SVD-based task reweighting scheme and show that it improves the robustness of multi-task training on a multi-label image dataset.

Motivation & Objective

Understand when multi-task learning with a shared representation helps or harms individual tasks.
Characterize how model capacity, task covariance, and optimization influence transfer between tasks.
Develop practical methods to improve MTL effectiveness and robustness under data alignment considerations.
Provide theoretical conditions for positive transfer and practical algorithms for alignment and reweighting.

Proposed method

Study an architecture with a shared module B for all tasks and a separate output module A_i for each task, trained with the loss sum_i L(g(X_i B) A_i, y_i), where g is the activation function.
Introduce a per-task weight alpha_i in the loss to account for varying data sizes across tasks.
Develop theory for linear and ReLU-activated models, focusing on three components: shared capacity (r), task covariances (X_i^T X_i), and per-task weights (alpha_i).
Define the task covariance and a covariance similarity score to quantify alignment between tasks.
Propose Algorithm 1, covariance alignment, which introduces per-task alignment matrices R_i at the embedding layer to align task covariances during training.
Propose Algorithm 2, an SVD-based task reweighting scheme, to improve robustness, especially under label noise.
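
The weighted objective described in this list can be sketched in NumPy. This is a minimal illustration, not the authors' implementation: it assumes L is the mean squared error and g is ReLU, and the names mtl_loss, tasks, heads, and alphas are hypothetical. The alignment matrices R_i and the SVD-based reweighting are not included.

```python
import numpy as np

def relu(z):
    # ReLU activation g; pass activation=lambda z: z for the linear model.
    return np.maximum(z, 0.0)

def mtl_loss(tasks, B, heads, alphas, activation=relu):
    """Weighted multi-task loss: sum_i alpha_i * L(g(X_i B) A_i, y_i).

    Assumption: L is the mean squared error (the review does not fix L).
    """
    total = 0.0
    for (X, y), A, alpha in zip(tasks, heads, alphas):
        features = activation(X @ B)   # shared module B, shape (m_i, r)
        preds = features @ A           # task-specific head A_i, shape (m_i,)
        total += alpha * np.mean((preds - y) ** 2)
    return total

# Toy data: two tasks with different sample sizes sharing one module B.
rng = np.random.default_rng(0)
d, r = 8, 3                            # input dimension and shared capacity r
B = rng.normal(size=(d, r))
tasks = [
    (rng.normal(size=(20, d)), rng.normal(size=20)),
    (rng.normal(size=(50, d)), rng.normal(size=50)),
]
heads = [rng.normal(size=r) for _ in tasks]
alphas = [0.5, 0.5]

loss = mtl_loss(tasks, B, heads, alphas)
```

Because the loss is linear in the weights, changing alphas shifts how much each task's error drives the update of the shared module B, which is the lever a reweighting scheme adjusts.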

Experimental results

Research questions

RQ1: When does multi-task learning with a shared representation yield positive versus negative transfer between tasks?
RQ2: How do shared capacity, task covariance, and per-task weights influence transfer performance in linear and ReLU settings?
RQ3: Can we design practical methods to align task embeddings and reweight tasks to improve MTL and transfer robustness?
RQ4: Do alignment and reweighting techniques translate to improvements on benchmarks like GLUE and sentiment analysis datasets?
RQ5: How robust are these methods under label noise and in transfer learning scenarios?

Key findings

Aligning the covariances of task embedding layers yields performance gains on GLUE: a 2.35% improvement in average GLUE score on 5 GLUE tasks over BERT-LARGE.
Covariance alignment also improves transfer learning in sentiment analysis tasks by up to 2.5% accuracy.
An SVD-based task reweighting scheme improves multi-task training robustness on the ChestX-ray14 multi-label dataset by 0.4% AUC on average.
The shared module capacity should be smaller than the sum of standalone task capacities to enable transfer; too-large capacity yields no transfer.
The theory provides sufficient conditions for positive transfer depending on task covariances and sample size, and a metric for covariance similarity is proposed.
Empirical ablations show covariance alignment boosts performance across CNN/MLP and LSTM baselines.
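
To make the idea of covariance similarity concrete, here is a small NumPy sketch. It does not reproduce the paper's covariance similarity score, which this review does not define. As a clearly labeled stand-in, it uses the cosine similarity between the two task covariance matrices X_i^T X_i under the Frobenius inner product. The function names and the toy data are hypothetical.

```python
import numpy as np

def task_covariance(X):
    # Uncentered task covariance X^T X, in the form used in this review.
    return X.T @ X

def covariance_cosine(X1, X2):
    """Stand-in similarity (NOT the paper's score): cosine of the angle
    between the two covariance matrices under the Frobenius inner product.
    For covariance matrices the value lies in [0, 1]."""
    C1, C2 = task_covariance(X1), task_covariance(X2)
    return float(np.sum(C1 * C2) / (np.linalg.norm(C1) * np.linalg.norm(C2)))

rng = np.random.default_rng(0)
scale_a = np.array([3.0, 1.0, 1.0, 1.0, 1.0])  # variance concentrated on feature 0
scale_b = np.array([1.0, 1.0, 1.0, 1.0, 3.0])  # variance concentrated on feature 4

X1 = rng.normal(size=(200, 5)) * scale_a
X2_aligned = rng.normal(size=(200, 5)) * scale_a
X2_misaligned = rng.normal(size=(200, 5)) * scale_b

sim_aligned = covariance_cosine(X1, X2_aligned)
sim_misaligned = covariance_cosine(X1, X2_misaligned)
```

In this toy setup the task whose dominant direction matches that of X1 scores higher than the one whose dominant direction differs, which is the kind of distinction a covariance similarity metric is meant to capture.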

This review was created by AI and reviewed by human editors.