QUICK REVIEW

[Paper Review] AVEC 2016 - Depression, Mood, and Emotion Recognition Workshop and Challenge

Michel Valstar, Jonathan Gratch | arXiv (Cornell University) | May 5, 2016

Emotion and Mood Recognition | 57 references | 109 citations

TL;DR

AVEC 2016 sets out challenge guidelines, shared datasets, and openly available multimodal baselines for depression severity estimation and affect recognition, organized into two sub-challenges: depression classification (DCC) and multimodal affect recognition (MASC).

ABSTRACT

The Audio/Visual Emotion Challenge and Workshop (AVEC 2016) "Depression, Mood and Emotion" will be the sixth competition event aimed at comparison of multimedia processing and machine learning methods for automatic audio, visual and physiological depression and emotion analysis, with all participants competing under strictly the same conditions. The goal of the Challenge is to provide a common benchmark test set for multi-modal information processing and to bring together the depression and emotion recognition communities, as well as the audio, video and physiological processing communities, to compare the relative merits of the various approaches to depression and emotion recognition under well-defined and strictly comparable conditions and establish to what extent fusion of the approaches is possible and beneficial. This paper presents the challenge guidelines, the common data used, and the performance of the baseline system on the two tasks.

Motivation & Objective

Provide a common benchmark for multimodal depression and emotion analysis under controlled, reproducible conditions.
Compare audio, visual, and physiological modalities for depression severity estimation and affect recognition.
Promote multi-modal fusion to assess potential gains from combining modalities.
Release shared datasets (DAIC-WOZ, RECOLA) and baseline feature sets to foster reproducibility and comparability.

Proposed method

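Define the Depression Classification Sub-Challenge (DCC) and the Multimodal Affect Recognition Sub-Challenge (MASC), each with its own ground-truth labels and evaluation metrics: F1, precision, and recall (plus RMSE and MAE on PHQ-8 scores) for DCC, and the concordance correlation coefficient (CCC) on arousal and valence for MASC.
Present the Distress Analysis Interview Corpus - Wizard of Oz (DAIC-WOZ), whose depression severity labels are PHQ-8 scores.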
Present the RECOLA corpus for continuous arousal and valence annotations.
Provide baseline feature pipelines for video (OpenFace, FACET), audio (GeMAPS/eGeMAPS via openSMILE), and physiological signals (ECG, EDA, etc.).
Describe the baseline models: linear SVMs trained with stochastic gradient descent (SGD) for classification and regression, random forest baselines, and late-fusion schemes that combine unimodal predictions into a multimodal prediction.
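The baselines above pair linear SVMs trained with SGD with late fusion of per-modality predictions. The sketch below illustrates both ideas in plain Python, assuming binary labels in {-1, +1}, plain SGD on the hinge loss with an L2 penalty, and fusion by a weighted average of per-modality decision scores. It is not the AVEC 2016 baseline code: the function names, the hyperparameters (`lam`, `lr`, `epochs`), the equal default fusion weights, and the toy data are all hypothetical, and only the classification case is shown.

```python
import random
from typing import Dict, List, Optional, Sequence, Tuple

Model = Tuple[List[float], float]  # (weights, bias)


def dot(w: Sequence[float], x: Sequence[float]) -> float:
    return sum(wi * xi for wi, xi in zip(w, x))


def train_linear_svm_sgd(X: Sequence[Sequence[float]], y: Sequence[int],
                         lam: float = 0.01, lr: float = 0.1,
                         epochs: int = 100, seed: int = 0) -> Model:
    """Linear SVM (hinge loss + L2 penalty) trained with plain SGD.

    Labels must be -1 or +1. The bias term is not regularized.
    Hyperparameter defaults are illustrative, not the AVEC 2016 settings.
    """
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    b = 0.0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)  # visit samples in a new random order each epoch
        for i in order:
            if y[i] * (dot(w, X[i]) + b) < 1:
                # Margin violated: hinge-loss subgradient step plus L2 shrinkage.
                w = [wj - lr * (lam * wj - y[i] * xj) for wj, xj in zip(w, X[i])]
                b += lr * y[i]
            else:
                # Margin satisfied: only the L2 penalty contributes.
                w = [wj * (1 - lr * lam) for wj in w]
    return w, b


def decision(model: Model, x: Sequence[float]) -> float:
    w, b = model
    return dot(w, x) + b


def late_fusion_predict(models: Dict[str, Model],
                        features: Dict[str, Sequence[Sequence[float]]],
                        weights: Optional[Dict[str, float]] = None) -> List[int]:
    """Weighted average of per-modality decision scores, thresholded at zero."""
    names = list(models)
    if weights is None:
        weights = {m: 1.0 for m in names}  # hypothetical default: equal weights
    total = sum(weights[m] for m in names)
    n = len(features[names[0]])
    fused = []
    for t in range(n):
        score = sum(weights[m] * decision(models[m], features[m][t]) for m in names) / total
        fused.append(1 if score >= 0 else -1)
    return fused


if __name__ == "__main__":
    # Toy data: two hypothetical "modalities" observing the same four recordings.
    y = [1, 1, -1, -1]
    audio = [[2.0], [3.0], [-2.0], [-3.0]]
    video = [[1.5, 0.5], [2.5, 1.0], [-1.5, -0.5], [-2.5, -1.0]]
    models = {"audio": train_linear_svm_sgd(audio, y),
              "video": train_linear_svm_sgd(video, y)}
    print(late_fusion_predict(models, {"audio": audio, "video": video}))
```

Averaging decision scores rather than hard labels lets a confident modality outweigh an uncertain one; in practice the fusion weights would be tuned on the development partition. A regression variant would replace the hinge loss with an epsilon-insensitive loss.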

Experimental results

Research questions

RQ1: What is the performance of baseline audio, video, and physiological features for depression severity estimation (PHQ-8) and mood/affect prediction (arousal, valence)?
RQ2: How do unimodal baselines compare to multimodal fusion in depression and emotion recognition under AVEC 2016 rules?
RQ3: To what extent do different modalities contribute to arousal and valence predictions in fusion models?
RQ4: Can the provided baselines support reproducibility and fair comparison with top-performing approaches from previous AVEC challenges?

Key findings

The AVEC 2016 baseline improves on AVEC 2015 for most affect-recognition modalities; audio performs best for arousal and video performs best for valence.
HRHRV (heart rate and heart-rate variability) features outperform the ECG feature set for arousal prediction in the fusion setup.
Late fusion of the audio, ECG, EDA, and video modalities yields higher CCC for both arousal and valence than the unimodal baselines.
Video appearance and geometric features contribute differently to arousal and valence, indicating that they carry complementary information for multimodal fusion.
The DCC baselines for depression classification and severity estimation report F1, precision, and recall (classification) and RMSE and MAE (PHQ-8 severity) on the development and test partitions, enabling direct comparison under the AVEC 2016 protocol.
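The metrics named in these findings can be computed as in the plain-Python sketch below. It assumes the conventional PHQ-8 cut-off of 10 for the binary depressed/not-depressed label (an assumption here, not stated in this review) and implements Lin's concordance correlation coefficient with population (divide-by-N) statistics. The function names and example scores are illustrative, not part of the official AVEC 2016 scripts.

```python
from math import sqrt
from statistics import fmean
from typing import List, Sequence, Tuple

PHQ8_CUTOFF = 10  # assumption: PHQ-8 total >= 10 is labeled "depressed"


def binarize_phq8(scores: Sequence[float], cutoff: float = PHQ8_CUTOFF) -> List[int]:
    """Map PHQ-8 totals (0-24) to binary labels (1 = depressed, 0 = not)."""
    return [1 if s >= cutoff else 0 for s in scores]


def precision_recall_f1(y_true: Sequence[int], y_pred: Sequence[int],
                        positive: int = 1) -> Tuple[float, float, float]:
    """Precision, recall and F1 for one class (pass positive=0 for the other)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def rmse_mae(y_true: Sequence[float], y_pred: Sequence[float]) -> Tuple[float, float]:
    """Root-mean-square and mean absolute error of predicted severity scores."""
    errors = [p - t for t, p in zip(y_true, y_pred)]
    return sqrt(fmean(e * e for e in errors)), fmean(abs(e) for e in errors)


def ccc(gold: Sequence[float], pred: Sequence[float]) -> float:
    """Lin's concordance correlation coefficient (population statistics).

    Undefined (division by zero) if both sequences are constant and equal.
    """
    mg, mp = fmean(gold), fmean(pred)
    var_g = fmean((g - mg) ** 2 for g in gold)
    var_p = fmean((p - mp) ** 2 for p in pred)
    cov = fmean((g - mg) * (p - mp) for g, p in zip(gold, pred))
    return 2 * cov / (var_g + var_p + (mg - mp) ** 2)


if __name__ == "__main__":
    true_scores, pred_scores = [3, 12, 15, 7], [5, 9, 14, 10]  # hypothetical
    print(precision_recall_f1(binarize_phq8(true_scores), binarize_phq8(pred_scores)))
    print(rmse_mae(true_scores, pred_scores))
    trace = [0.1, 0.3, 0.2, 0.5]
    print(ccc(trace, trace), ccc(trace, [v + 0.2 for v in trace]))
```

The last line shows why CCC suits continuous arousal and valence traces: a prediction shifted by a constant keeps a Pearson correlation of 1 with the gold standard, but its CCC drops because the mean offset enters the denominator.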

This review was created by AI and reviewed by human editors.