[Paper Review] Flexibly Fair Representation Learning by Disentanglement
The paper introduces FFVAE, a variational autoencoder that learns disentangled, predictive latent factors for multiple sensitive attributes and their conjunctions, enabling test-time, attribute-agnostic fairness adaptations without needing sensitive attributes at inference. It demonstrates improved fair classification performance across synthetic and real datasets by flexibly removing or noising sensitive latent dimensions.
We consider the problem of learning representations that achieve group and subgroup fairness with respect to multiple sensitive attributes. Taking inspiration from the disentangled representation learning literature, we propose an algorithm for learning compact representations of datasets that are useful for reconstruction and prediction, but are also "flexibly fair", meaning they can be easily modified at test time to achieve subgroup demographic parity with respect to multiple sensitive attributes and their conjunctions. We show empirically that the resulting encoder, which does not require the sensitive attributes for inference, enables the adaptation of a single representation to a variety of fair classification tasks with new target labels and subgroup definitions.
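Subgroup demographic parity over conjunctions can be made concrete with a metric. The following minimal NumPy sketch shows one common choice: the absolute gap in positive-prediction rates between a subgroup and its complement, where the subgroup may be the conjunction (logical AND) of several binary sensitive attributes. The helper names `demographic_parity_gap` and `conjunction` are hypothetical and exist only for illustration. The paper's exact evaluation metric may differ.

```python
import numpy as np

def demographic_parity_gap(y_hat, group):
    """Absolute difference in positive-prediction rate between a subgroup and everyone else."""
    y_hat = np.asarray(y_hat, dtype=float)
    group = np.asarray(group, dtype=bool)
    if group.all() or (~group).all():
        raise ValueError("both the subgroup and its complement must be non-empty")
    return abs(y_hat[group].mean() - y_hat[~group].mean())

def conjunction(*attrs):
    """Boolean indicator of the conjunction (logical AND) of binary sensitive attributes."""
    return np.logical_and.reduce([np.asarray(a, dtype=bool) for a in attrs])

# Usage: binary predictions and two binary sensitive attributes (toy data).
y_hat = np.array([1, 0, 1, 1, 1, 0])
a1 = np.array([1, 1, 0, 0, 1, 0])
a2 = np.array([1, 0, 1, 0, 1, 0])
g = conjunction(a1, a2)               # subgroup: a1 = 1 AND a2 = 1
print(demographic_parity_gap(y_hat, g))  # 0.5
```

A flexibly fair representation targets a small gap for whichever subgroup definition (single attribute or conjunction) is chosen at test time, without retraining the encoder.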
Motivation & Objective
- Motivate fair representation learning capable of handling multiple sensitive attributes and their intersections.
- Learn compact latent representations whose sensitive dimensions are predictive of the sensitive attributes yet disentangled from the non-sensitive factors.
- Enable easy, compositional test-time modifications to achieve subgroup demographic parity across tasks and labels.
- Provide a VAE-based method that uses sensitive attributes to structure the latent space and allows test-time fairness adjustments.
Proposed method
- Extend the VAE with a disentangled latent space partitioned into non-sensitive dimensions z and sensitive dimensions b (one per sensitive attribute).
- Use a factorized decoder: p(x|z,b) reconstructs the input from both latent blocks, and p(a|b) = ∏_j p(a_j|b_j) predicts each sensitive attribute a_j from its own latent dimension b_j.
- Impose disentanglement via a total correlation penalty and tie b to a via a predictiveness term, controlled by hyperparameters gamma and alpha, respectively.
- Train with encoders q(z|x) and q(b|x), treating b as non-stochastic for stability.
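- At test time, remove the b dimensions of the targeted sensitive attributes (all constituent attributes, for a conjunction) or replace them with noise, yielding a representation [z, b′] that carries little information about those sensitive groups, to the extent that the disentanglement succeeded.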
- The training objective L_FFVAE combines reconstruction, predictiveness, disentanglement, and prior-matching terms; gamma weights the total correlation adversary and alpha weights predictiveness.
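The four terms of the training objective can be written as a single quantity to be maximized. The display below reflects our reading of the paper's formulation. It assumes a prior p(z, b) and a per-attribute factorization of p(a | b); consult the original paper for its exact notation.

```latex
\mathcal{L}_{\mathrm{FFVAE}}(p, q) =
  \mathbb{E}_{q(z, b \mid x)}\big[\log p(x \mid z, b) + \alpha \log p(a \mid b)\big]
  - \gamma\, D_{\mathrm{KL}}\Big(q(z, b) \,\Big\|\, q(z) \textstyle\prod_j q(b_j)\Big)
  - D_{\mathrm{KL}}\big(q(z, b \mid x) \,\|\, p(z, b)\big),
\qquad
p(a \mid b) = \prod_j p(a_j \mid b_j)
```

Reading left to right gives reconstruction, alpha-weighted predictiveness, gamma-weighted disentanglement, and prior matching.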
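The gamma-weighted disentanglement term compares the aggregate posterior q(z, b) with the product q(z) ∏_j q(b_j), and that comparison has no closed form. FactorVAE-style models estimate such a term by training a discriminator to tell the two distributions apart and applying the density-ratio trick. Samples from the product of marginals come from shuffling encoder outputs within a minibatch. We assume FFVAE follows this recipe. The NumPy sketch below covers only the shuffling and the final estimate; discriminator training is omitted, and the function names are hypothetical.

```python
import numpy as np

def sample_product_of_marginals(z, b, rng):
    """Approximate draws from q(z) * prod_j q(b_j) by shuffling encoder outputs within a batch.

    z (n, d_z) is permuted as a whole block, so dependencies inside z are kept;
    each column b_j of b (n, k) gets its own permutation, which breaks its
    dependence on z and on the other b columns.
    """
    n = z.shape[0]
    z_perm = z[rng.permutation(n)]
    b_perm = np.stack([b[rng.permutation(n), j] for j in range(b.shape[1])], axis=1)
    return np.concatenate([z_perm, b_perm], axis=1)

def tc_estimate(disc_logits):
    """Density-ratio estimate of KL(q(z,b) || q(z) prod_j q(b_j)).

    disc_logits are discriminator logits on samples from q(z, b), where
    D = sigmoid(logit) is the predicted probability that a sample came from q(z, b).
    Since log(D / (1 - D)) equals the logit, the estimate is the mean logit.
    """
    return float(np.mean(disc_logits))

# Usage with stand-in encoder outputs (a real model would take them from q(z|x), q(b|x)).
rng = np.random.default_rng(0)
z = rng.standard_normal((8, 3))
b = rng.standard_normal((8, 2))
joint = np.concatenate([z, b], axis=1)            # discriminator "joint" samples
product = sample_product_of_marginals(z, b, rng)  # discriminator "product" samples
print(joint.shape, product.shape)  # (8, 5) (8, 5)
```

Keeping z as one block means the penalty does not push the non-sensitive dimensions apart from each other. Only each b_j is pushed toward independence from z and from the other sensitive dimensions, which makes the per-attribute removal at test time meaningful.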
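The test-time modification needs only the encoder output, not the sensitive attributes. The minimal NumPy sketch below makes two illustrative assumptions: column j of b is the latent aligned with attribute a_j, and the noise option resamples from a standard-normal prior. `flexibly_fair_representation` is a hypothetical name.

```python
import numpy as np

def flexibly_fair_representation(z, b, sensitive_idx, mode="noise", rng=None):
    """Return [z, b'] with the b columns in sensitive_idx resampled as noise or dropped.

    z: (n, d_z) non-sensitive latents; b: (n, k) sensitive latents, where column j
    is assumed to be aligned with sensitive attribute a_j.
    """
    z = np.asarray(z, dtype=float)
    b = np.asarray(b, dtype=float).copy()  # never modify the caller's array
    idx = list(sensitive_idx)
    if mode == "noise":
        rng = np.random.default_rng() if rng is None else rng
        # Assumed N(0, 1) prior: replace the selected dimensions with prior samples.
        b[:, idx] = rng.standard_normal((b.shape[0], len(idx)))
    elif mode == "remove":
        b = np.delete(b, idx, axis=1)
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    return np.concatenate([z, b], axis=1)

# Usage: to target the conjunction a_1 AND a_2, modify both b_1 and b_2 (columns 0 and 1).
rng = np.random.default_rng(0)
z = rng.standard_normal((4, 3))
b = rng.standard_normal((4, 2))
fair_noised = flexibly_fair_representation(z, b, [0, 1], mode="noise", rng=rng)
fair_removed = flexibly_fair_representation(z, b, [0, 1], mode="remove")
print(fair_noised.shape, fair_removed.shape)  # (4, 5) (4, 3)
```

Noising keeps the input width fixed, so one downstream classifier architecture can serve every subgroup definition. Removal is simpler but changes the width. Either way, the result is only as fair as the disentanglement: if information about a_j has leaked into z, editing b_j alone will not remove it.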
Experimental results
Research questions
- RQ1: Can a single learned representation be flexibly fair with respect to multiple sensitive attributes and their conjunctions at test time?
- RQ2: Does disentangling sensitive information in the latent space enable easy, compositional adjustments to achieve demographic parity across various subgroup definitions?
- RQ3: How does FFVAE perform on synthetic and real datasets in terms of accuracy and fairness metrics compared to existing disentanglement baselines?
- RQ4: Does the model maintain predictive utility for downstream tasks while removing sensitive information at test time?
Key findings
- FFVAE enables test-time fair adaptation without requiring sensitive attributes at inference.
- Increasing the predictiveness weight alpha improves both disentanglement and the ability to align latent factors with corresponding sensitive attributes.
- FFVAE achieves better fairness-accuracy tradeoffs than baselines on the DSpritesUnfair synthetic dataset, especially for conjunctions of attributes.
- On Communities & Crime and Celeb-A, FFVAE attains competitive fairness-accuracy performance and demonstrates robustness across multiple subgroup definitions.
- The method remains effective even when sensitive attributes are correlated with labels, a challenging real-data scenario.
This review was created by AI and reviewed by human editors.