QUICK REVIEW

[Paper Review] Deriving information from missing data: implications for mood prediction

Yue Wu, Terry Lyons|arXiv (Cornell University)|Jun 26, 2020

Mental Health Research Topics|18 references|28 citations

TL;DR

This paper proposes a signature-based machine learning method that incorporates missing responses into longitudinal mood data analysis for improved diagnosis and mood prediction in bipolar disorder (BD), borderline personality disorder (BPD), and healthy controls (HC). By treating missing responses as informative events within a rough path framework, the method achieves 66% diagnostic accuracy and significantly outperforms naive models that exclude missing data, particularly for BPD classification and mood state prediction.

ABSTRACT

The availability of mobile technologies has enabled the efficient collection of prospective longitudinal, ecologically valid self-reported mood data from psychiatric patients. These data streams have potential for improving the efficiency and accuracy of psychiatric diagnosis as well as predicting future mood states, enabling earlier intervention. However, missing responses are common in such datasets and there is little consensus as to how this should be dealt with in practice. A signature-based method was used to capture different elements of self-reported mood alongside missing data to both classify diagnostic group and predict future mood in patients with bipolar disorder, borderline personality disorder and healthy controls. The missing-response-incorporated signature-based method achieves roughly 66% correct diagnosis, with F1 scores for the three clinical groups of 59% (bipolar disorder), 75% (healthy control) and 61% (borderline personality disorder) respectively. This was significantly more efficient than the naive model which excluded missing data. Accuracies of predicting subsequent mood states and scores were also improved by inclusion of missing responses. The signature method provided an effective approach to the analysis of prospectively collected mood data where missing data was common and should be considered as an approach in other similar datasets.

Motivation & Objective

To address the challenge of non-random missing data in prospective self-reported mood datasets from psychiatric patients.
To evaluate whether incorporating missing responses as informative features enhances diagnostic classification and future mood prediction.
To develop and test a signature-based method that captures temporal dynamics and interactions between responses and missing data.
To compare the performance of the missing-response-incorporated model against standard approaches that exclude missing data.
To assess the method's utility in distinguishing between BD, BPD, and HC using ASRM and QIDS self-report data.

Proposed method

The signature method from rough path theory is applied to 2D concatenated mood data (ASRM and QIDS scores) with missing responses encoded as -1.
Missing responses are treated as events in a counting process, preserving temporal order and enabling the signature to capture patterns involving both responses and absences.
A level-2 signature feature extraction is used to encode the temporal dynamics of mood and missingness over time.
Random forest classifiers and regressors are used as base models for classification, state prediction, and score prediction tasks.
The method is compared against a naive baseline that excludes all missing data points from analysis.
The approach is validated on 126 participants from the AMoSS study with weekly mood assessments over at least 20 weeks.
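The feature-extraction step above can be illustrated with a minimal sketch. It is not the authors' implementation: the channel layout (ASRM, QIDS, and a running count of missed weeks), the piecewise-linear interpolation between weekly points, and the helper names `build_path`, `signature_level2`, and `signature_features` are all assumptions made for illustration. Only the -1 encoding and the truncation at level 2 come from the description above.

```python
from itertools import accumulate

MISSING = -1  # encoding of a missing response, as described in the method


def build_path(asrm, qids):
    """Build a 3-channel path: ASRM, QIDS, cumulative count of missed weeks.

    The channel layout is a hypothetical choice for this sketch.
    """
    missed = [int(a == MISSING or q == MISSING) for a, q in zip(asrm, qids)]
    counts = list(accumulate(missed))  # counting process of missing events
    return [[float(a), float(q), float(c)] for a, q, c in zip(asrm, qids, counts)]


def signature_level2(path):
    """Return the level-1 and level-2 signature terms of a piecewise-linear path."""
    d = len(path[0])
    start = path[0]
    # Level 1 is the total increment of each channel.
    level1 = [path[-1][i] - start[i] for i in range(d)]
    # Level 2 accumulates the iterated integrals segment by segment.
    level2 = [[0.0] * d for _ in range(d)]
    for prev, curr in zip(path, path[1:]):
        step = [curr[i] - prev[i] for i in range(d)]
        offset = [prev[i] - start[i] for i in range(d)]
        for i in range(d):
            for j in range(d):
                level2[i][j] += offset[i] * step[j] + 0.5 * step[i] * step[j]
    return level1, level2


def signature_features(asrm, qids):
    """Flatten the level-1 and level-2 terms into one feature vector."""
    level1, level2 = signature_level2(build_path(asrm, qids))
    return level1 + [value for row in level2 for value in row]


# Toy input, not taken from the AMoSS data.
features = signature_features([3, MISSING, 5, 4], [10, MISSING, 8, MISSING])
print(len(features))  # 3 level-1 terms plus 9 level-2 terms
```

The off-diagonal level-2 terms are what make the ordering of events visible: a mood change that precedes a missed week contributes differently from one that follows it. A vector like `features` would then be passed to the random forest models mentioned above.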

Experimental results

Research questions

RQ1: Can missing responses in longitudinal mood data be leveraged as informative features rather than discarded?
RQ2: Does incorporating missing data into signature-based features improve diagnostic classification accuracy for BD, BPD, and HC?
RQ3: How does the performance of mood state and score prediction compare between models that include versus exclude missing data?
RQ4: Can the signature method effectively capture differences in mood instability patterns between BPD and BD patients?
RQ5: Is the signature-based model more robust than standard imputation or exclusion-based approaches in the presence of non-random missingness?

Key findings

The missing-response-incorporated signature-based model achieved 66% overall diagnostic accuracy, significantly outperforming the naive model that excluded missing data.
F1 scores were 59% for BD, 75% for HC, and 61% for BPD; the BPD score rose from below 50% in the naive model to above 60% with the new method.
The model reduced misclassification of BPD patients as BD patients from ~40% to less than one-third, indicating better capture of BPD-specific mood instability.
Mood state prediction accuracy improved for all groups when missing responses were included, especially for QIDS and ASRM state prediction.
Score prediction for future ASRM and QIDS scores was also enhanced by the inclusion of missing data in the signature features.
The method demonstrated robustness in handling non-random missingness, suggesting that missing data can carry meaningful information about underlying mood dynamics.
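For readers unfamiliar with the metrics quoted above, the following sketch shows how overall accuracy, per-class F1, and a cross-class misclassification rate are derived from a three-class confusion table. The labels and predictions are toy values invented for illustration, not results from the paper, and the function names are hypothetical.

```python
def confusion_counts(y_true, y_pred, labels):
    """Count predictions per (true label, predicted label) pair."""
    counts = {t: {p: 0 for p in labels} for t in labels}
    for t, p in zip(y_true, y_pred):
        counts[t][p] += 1
    return counts


def accuracy(counts):
    """Fraction of all cases whose predicted label matches the true label."""
    total = sum(sum(row.values()) for row in counts.values())
    correct = sum(counts[label][label] for label in counts)
    return correct / total


def f1_per_class(counts):
    """F1 = 2*TP / (actual + predicted), equivalent to 2*TP / (2*TP + FP + FN)."""
    scores = {}
    for label in counts:
        tp = counts[label][label]
        actual = sum(counts[label].values())
        predicted = sum(counts[t][label] for t in counts)
        denom = actual + predicted
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores


# Toy labels only; these are not the study's predictions.
labels = ["BD", "HC", "BPD"]
y_true = ["BD", "BD", "HC", "HC", "BPD", "BPD"]
y_pred = ["BD", "BPD", "HC", "HC", "BD", "BPD"]

counts = confusion_counts(y_true, y_pred, labels)
overall = accuracy(counts)
f1 = f1_per_class(counts)
# Share of true BPD cases that were labelled BD.
bpd_as_bd = counts["BPD"]["BD"] / sum(counts["BPD"].values())
```

The last quantity is the kind of rate the BPD-as-BD finding refers to: it reads along one row of the confusion table rather than summarising the whole table.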

This review was created by AI and reviewed by human editors.