[Paper Review] Training Well-Generalizing Classifiers for Fairness Metrics and Other Data-Dependent Constraints
The paper proposes a two-dataset, proxy-Lagrangian optimization approach to train classifiers with data-dependent constraints (e.g., fairness metrics) that generalize better to unseen data. It provides theoretical generalization bounds and practical algorithms, validated by experiments.
Classifiers can be trained with data-dependent constraints to satisfy fairness goals, reduce churn, achieve a targeted false positive rate, or other policy goals. We study the generalization performance for such constrained optimization problems, in terms of how well the constraints are satisfied at evaluation time, given that they are satisfied at training time. To improve generalization performance, we frame the problem as a two-player game where one player optimizes the model parameters on a training dataset, and the other player enforces the constraints on an independent validation dataset. We build on recent work in two-player constrained optimization to show that if one uses this two-dataset approach, then constraint generalization can be significantly improved. As we illustrate experimentally, this approach works not only in theory, but also in practice.
Motivation & Objective
- Motivate training classifiers with data-dependent constraints such as fairness or policy-driven goals.
- Develop a two-dataset framework to improve constraint generalization independent of model complexity.
- Provide theoretical bounds on optimality, feasibility, and constraint generalization under the two-dataset setup.
- Introduce algorithms (oracle-based and gradient-based) with convergence and generalization guarantees.
- Demonstrate empirically that separate training and validation datasets improve constraint generalization in practice.
Proposed method
- Model the problem as a two-player game where one player optimizes model parameters on training data and the other enforces constraints on a validation set.
- Use a proxy-Lagrangian formulation with surrogate constraint losses for the theta-player and original constraint losses for the lambda-player.
- Employ a two-dataset approach: S(train) for theta optimization and S(val) for lambda optimization to separate evidence used for learning and constraint enforcement.
- Provide an oracle-based algorithm with discretization and a Bayesian optimization oracle to find near-equilibrium solutions.
- Provide a gradient-based algorithm under strong convexity assumptions that avoids discretization and remains practical.
- Derive generalization bounds showing constraint generalization depends on the validation set rather than model complexity.
Experimental results
Research questions
- RQ1How well do data-dependent constraints generalize from training to evaluation time when learned within a constrained optimization framework?
- RQ2Can separating the training and constraint enforcement evidence via two independent datasets improve constraint generalization independent of model complexity?
- RQ3What are the theoretical optimality, feasibility, and generalization guarantees for two-dataset proxy-Lagrangian approaches?
- RQ4How do oracle-based and gradient-based algorithms perform in achieving near-optimality, near-feasibility, and good constraint generalization?
- RQ5Do empirical results corroborate that two-dataset approaches improve constraint generalization for fairness and other data-dependent constraints?
Key findings
- A two-dataset proxy-Lagrangian framework can significantly improve constraint generalization compared to one-dataset approaches.
- For discretized oracle-based methods, there are provable bounds on optimality and feasibility tied to training and validation generalization errors.
- Under a gradient-based algorithm with strong convexity, similar near-optimality and near-feasibility guarantees hold while remaining practically implementable.
- Constraint generalization on the validation set can be bounded independently of model complexity, contrasting with traditional bounds tied to the function class complexity.
- Experimental results indicate the two-dataset approach improves constraint generalization in practice, beyond what theory alone guarantees.
Better researchstarts right now
From paper design to paper writing, dramatically reduce your research time.
No credit card · Free plan available
This review was created by AI and reviewed by human editors.