QUICK REVIEW

[Paper Review] Distribution Matching for Heterogeneous Multi-Task Learning: a Large-scale Face Study

Dimitrios Kollias, Viktoriia Sharmanska | arXiv (Cornell University) | May 8, 2021

Emotion and Mood Recognition | 61 references | 99 citations

TL;DR

The paper introduces FaceBehaviorNet, a holistic, heterogeneous multi-task learning framework for large-scale face analysis. It uses distribution matching and co-annotation to couple related tasks, jointly learning basic emotions, action units (AUs) and valence-arousal in one case study and face identity and attributes in another, which reduces negative transfer. The framework is evaluated on 10 in-the-wild databases.

ABSTRACT

Multi-Task Learning (MTL) has emerged as a methodology in which multiple tasks are jointly learned by a shared learning algorithm, such as a deep neural network (DNN). MTL is based on the assumption that the tasks under consideration are related; it therefore exploits shared knowledge to improve performance on each individual task. Tasks are generally considered to be homogeneous, i.e., to refer to the same type of problem. Moreover, MTL is usually based on ground truth annotations with full or partial overlap across tasks. In this work, we deal with heterogeneous MTL, simultaneously addressing detection, classification and regression problems. We explore task relatedness as a means for co-training, in a weakly-supervised way, tasks that have few or even non-overlapping annotations. Task relatedness is introduced in MTL either explicitly through prior expert knowledge or through data-driven studies. We propose a novel distribution matching approach, in which knowledge exchange is enabled between tasks via matching of their predictions' distributions. Based on this approach, we build FaceBehaviorNet, the first framework for large-scale face analysis, by jointly learning all facial behavior tasks. We develop case studies for: i) continuous affect estimation, action unit detection and basic emotion recognition; ii) attribute detection and face identification. We illustrate that co-training via task relatedness alleviates negative transfer. Since FaceBehaviorNet learns features that encapsulate all aspects of facial behavior, we conduct zero-/few-shot learning to perform tasks beyond the ones it has been trained for, such as compound emotion recognition. By conducting a very large experimental study, utilizing 10 databases, we illustrate that our approach outperforms, by large margins, the state-of-the-art in all tasks and in all databases, even in those which have not been used in its training.
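The distribution matching idea can be sketched in a few lines of Python. This is a minimal, hypothetical sketch, not the authors' code: it assumes a softmax emotion head, one sigmoid output per AU, and a relatedness table holding p(AU_k = 1 | emotion j). The function names and all numbers are invented for illustration. The soft target for each AU is the mixture q(y_au|x) = sum_j p(y_emo = j | x) p(y_au | y_emo = j), and the loss is the binary cross-entropy between q and the AU predictions, in the spirit of distillation.

```python
import math
from typing import List, Sequence


def mixture_target(emotion_probs: Sequence[float],
                   relatedness: Sequence[Sequence[float]]) -> List[float]:
    """q(au_k | x) = sum_j p(emo = j | x) * p(au_k = 1 | emo = j)."""
    n_aus = len(relatedness[0])
    return [
        sum(p_emo * row[k] for p_emo, row in zip(emotion_probs, relatedness))
        for k in range(n_aus)
    ]


def distribution_matching_loss(emotion_probs: Sequence[float],
                               au_probs: Sequence[float],
                               relatedness: Sequence[Sequence[float]],
                               eps: float = 1e-7) -> float:
    """Mean binary cross-entropy between the soft targets q and the AU outputs."""
    q = mixture_target(emotion_probs, relatedness)
    total = 0.0
    for q_k, p_k in zip(q, au_probs):
        p_k = min(max(p_k, eps), 1.0 - eps)  # clamp to avoid log(0)
        total -= q_k * math.log(p_k) + (1.0 - q_k) * math.log(1.0 - p_k)
    return total / len(q)


# Toy example: 2 emotions x 3 AUs; the relatedness values are made up.
relatedness = [
    [0.9, 0.1, 0.2],  # hypothetical p(AU_k = 1 | "happy")
    [0.1, 0.8, 0.7],  # hypothetical p(AU_k = 1 | "sad")
]
emotion_probs = [0.75, 0.25]  # softmax output of the emotion head
q = mixture_target(emotion_probs, relatedness)
print(q)
print(distribution_matching_loss(emotion_probs, [0.3, 0.7, 0.6], relatedness))
```

Because q is built from the network's own emotion predictions, this term needs no AU labels, so it can couple the two tasks on images annotated for emotions only. The same functions would cover the identity/attribute case study by swapping in p(identity | x) and an empirical p(attribute | identity) table. The sketch treats q as a fixed target and does not settle whether gradients should also flow through it.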

Motivation & Objective

Address heterogeneous multi-task learning for facial behavior analysis, where detection, classification and regression tasks are learned jointly.
Develop a distribution matching-based coupling mechanism to enable knowledge exchange between tasks with incomplete or non-overlapping annotations.
Propose co-annotation and distribution matching losses to alleviate negative transfer.
Create FaceBehaviorNet as the first holistic framework for large-scale face analysis.
Demonstrate strong cross-database performance and zero-shot/few-shot generalization.

Proposed method

Formulate heterogeneous multi-task learning over tasks T_i with data distributions D_i, with the objective of minimizing the average expected loss across tasks.
Introduce task relatedness, obtained either from domain knowledge or empirically from dataset annotations, to couple tasks during training.
Propose co-annotation, which uses task relatedness to constrain the labels of related tasks based on whichever annotations are available.
Propose a distribution matching (distillation-like) loss, L_DM, that aligns the AU predictions with a mixture distribution q(y_au|x) built from the emotion predictions and the emotion-AU relatedness.
Define a soft co-annotation variant with soft targets (loss L_SCA) to strengthen coupling when annotations are incomplete.
Extend the approach to a second case study that couples face identities and 40 facial attributes via distribution matching.
Demonstrate zero- and few-shot compound expression recognition by leveraging learned face behavior features.
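One plausible way to reuse the learned AU predictions for zero-shot compound expression recognition can be sketched in Python. It is an assumption, not the paper's documented procedure: each unseen compound class is described by a hypothetical AU profile p(AU_k = 1 | class), a face is scored by the expected Bernoulli log-likelihood of its predicted AU activations under each profile, and the highest-scoring class wins. The profiles, class names and function names are illustrative only.

```python
import math
from typing import Dict, Sequence


def zero_shot_scores(au_probs: Sequence[float],
                     profiles: Dict[str, Sequence[float]],
                     eps: float = 1e-7) -> Dict[str, float]:
    """Expected Bernoulli log-likelihood of the predicted AU activations
    under each (hypothetical) class profile p(AU_k = 1 | class)."""
    scores = {}
    for name, profile in profiles.items():
        s = 0.0
        for a_k, r_k in zip(au_probs, profile):
            r_k = min(max(r_k, eps), 1.0 - eps)  # clamp to avoid log(0)
            s += a_k * math.log(r_k) + (1.0 - a_k) * math.log(1.0 - r_k)
        scores[name] = s
    return scores


def zero_shot_predict(au_probs: Sequence[float],
                      profiles: Dict[str, Sequence[float]]) -> str:
    """Return the unseen class whose AU profile best explains the prediction."""
    scores = zero_shot_scores(au_probs, profiles)
    return max(scores, key=scores.get)


# Hypothetical AU profiles for two compound classes over 3 AUs (made-up numbers).
profiles = {
    "happily surprised": [0.9, 0.1, 0.8],
    "sadly angry": [0.1, 0.9, 0.3],
}
au_probs = [0.85, 0.2, 0.7]  # AU outputs of the trained network for one face
print(zero_shot_predict(au_probs, profiles))
```

No compound-expression labels are used here; a common alternative for the few-shot setting is to fit a small classifier on the frozen features from a handful of labelled examples.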

Experimental results

Research questions

RQ1: How can heterogeneous tasks (classification, detection, regression) be jointly learned to improve performance across domains of facial analysis?
RQ2: Can task relatedness be effectively encoded via domain knowledge or empirical dataset annotations to enable knowledge transfer?
RQ3: Does distribution matching-based coupling mitigate negative transfer in multi-task learning for face analysis?
RQ4: How well does a single holistic model perform across affective computing and face recognition tasks on large in-the-wild datasets?
RQ5: Can the learned features support zero-shot and few-shot recognition of compound expressions?

Key findings

FaceBehaviorNet outperforms single-task networks across all tasks and across all 10 databases studied.
Distribution matching-based knowledge distillation across heterogeneous tasks successfully reduces negative transfer.
The framework supports zero-shot and few-shot learning for compound emotion recognition using the learned holistic representations.
Task coupling via co-annotation and/or distribution matching improves performance even on databases not seen during training.
The method yields state-of-the-art results across affective computing tasks (emotions, AUs, valence-arousal) and face recognition tasks (identities, attributes).


This review was created by AI and reviewed by human editors.