QUICK REVIEW

[Paper Review] Capacity Constraints Make Admissions Processes Less Predictable

Evan Dong, Nikhil Garg|arXiv (Cornell University)|Jan 16, 2026

Online Learning and Analytics | Cited by 0

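One-Sentence Summary

The paper argues that capacity constraints make admissions decisions cohort-dependent, which standard machine learning models struggle to capture: as the applicant pool shifts, decisions become unstable and variable, and predictive performance degrades.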

ABSTRACT

Machine learning models are often used to make predictions about admissions process outcomes, such as for colleges or jobs. However, such decision processes differ substantially from the conventional machine learning paradigm. Because admissions decisions are capacity-constrained, whether a student is admitted depends on the other applicants who apply. We show how this dependence affects predictive performance even in otherwise ideal settings. Theoretically, we introduce two concepts that characterize the relationship between admission function properties, machine learning representation, and generalization to applicant pool distribution shifts: instability, which measures how many existing decisions can change when a single new applicant is introduced; and variability, which measures the number of unique students whose decisions can change. Empirically, we illustrate our theory on individual-level admissions data from the New York City high school matching system, showing that machine learning performance degrades as the applicant pool increasingly differs from the training data. Furthermore, there are larger performance drops for schools using decision rules that are more unstable and variable. Our work raises questions about the reliability of predicting individual admissions probabilities.
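The cohort dependence described in the abstract can be illustrated with a toy example. The sketch below is not the paper's experiment: the scores, the capacity `q`, the uniform score shift, and the helper names (`top_q_labels`, `fit_threshold`, `accuracy`) are all assumptions made for illustration. It fits a fixed score threshold on one cohort, which is one simple form of a model that predicts each applicant independently, and then scores it on cohorts whose applicants are uniformly stronger while the capacity stays the same.

```python
from typing import List

def top_q_labels(scores: List[float], q: int) -> List[bool]:
    """True outcomes: the q highest scores in this cohort are admitted."""
    order = sorted(range(len(scores)), key=lambda i: (-scores[i], i))
    admitted = set(order[:q])
    return [i in admitted for i in range(len(scores))]

def fit_threshold(scores: List[float], labels: List[bool]) -> float:
    """Independent model: the lowest admitted score in the training cohort."""
    return min(s for s, y in zip(scores, labels) if y)

def accuracy(scores: List[float], labels: List[bool], threshold: float) -> float:
    """Share of applicants whose thresholded prediction matches the outcome."""
    preds = [s >= threshold for s in scores]
    return sum(p == y for p, y in zip(preds, labels)) / len(scores)

train = [95, 90, 85, 80, 75, 70, 65, 60]  # hypothetical training cohort
q = 3                                     # hypothetical capacity
threshold = fit_threshold(train, top_q_labels(train, q))

results = {}
for shift in (0, 5, 10, 20):
    cohort = [s + shift for s in train]   # same capacity, stronger pool
    results[shift] = accuracy(cohort, top_q_labels(cohort, q), threshold)
    print(shift, results[shift])
# prints: 0 1.0, then 5 0.875, then 10 0.75, then 20 0.5
```

In this toy setting the threshold model is perfect on the training cohort and admits too many applicants once the pool strengthens, because the true cutoff moves with the cohort while the learned threshold does not.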

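Motivation and Objectives

Explain how capacity constraints produce cohort-dependent decisions, which challenges standard ML assumptions.
Define and formalize the admission function properties (instability and variability) that affect generalization under applicant pool shift.
Bridge choice function theory and machine learning representations to assess when ML can faithfully represent an admissions process.
Validate the theory empirically on New York City public high school matching data and synthetic simulations, and demonstrate the prediction challenges that arise when the pool changes.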

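Proposed Method

Model admissions as a capacity-constrained (q-acceptant) choice function.
Define instability as the maximum number of existing decisions that can change when a new applicant is added.
Define variability as the maximum number of distinct admitted students who could be displaced by a single new applicant.
Relate ML representations to choice function properties, showing that models which predict each applicant independently cannot represent unstable or variable functions.
Characterize when sequential compositions of queues (each with a total order) admit an acceptable predictive representation.
Apply the framework to NYC high school matching data, using a simulator to generate counterfactual admissions under different pools and choice functions.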
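The first two definitions above can be made concrete with a small sketch. It assumes the simplest capacity-constrained rule, admitting the `q` highest-scoring applicants under a single total order, and counts how many existing decisions flip when one newcomer joins. The applicant names, scores, and helper functions (`top_q`, `changed_decisions`) are hypothetical, and the sketch checks only one pool rather than the maximum over all pools that the definitions call for.

```python
from typing import Dict, Set

def top_q(scores: Dict[str, float], q: int) -> Set[str]:
    """Admit the q highest-scoring applicants (ties broken by name)."""
    ranked = sorted(scores, key=lambda a: (-scores[a], a))
    return set(ranked[:q])

def changed_decisions(scores: Dict[str, float], newcomer: str,
                      newcomer_score: float, q: int) -> Set[str]:
    """Existing applicants whose admit/reject outcome flips when the newcomer joins."""
    before = top_q(scores, q)
    after = top_q({**scores, newcomer: newcomer_score}, q)
    # Compare only applicants who were already in the original pool.
    return {a for a in scores if (a in before) != (a in after)}

pool = {"ana": 90.0, "ben": 85.0, "cy": 70.0, "dee": 60.0}  # hypothetical pool

strong = changed_decisions(pool, "eve", 88.0, q=2)
weak = changed_decisions(pool, "eve", 50.0, q=2)
print(strong)  # {'ben'}
print(weak)    # set()
```

A strong newcomer displaces exactly one existing admit (the marginal one) and a weak newcomer changes nothing, so on this pool the top-q rule changes at most one existing decision per newcomer.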

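Experimental Results

Research Questions

RQ1: How do capacity constraints induce cohort dependence in admissions decisions?
RQ2: What are the instability and variability properties of admissions choice functions, and how do they relate to ML expressiveness?
RQ3: How does ML predictive performance on admissions outcomes degrade under shifts in the pool distribution?
RQ4: How do NYC program types (Ed. Opt, Screened/Open, with or without DIA) differ in instability and variability, and how does this affect predictability?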

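Key Findings

Capacity-constrained admissions produce cohort-dependent decisions that standard ML models predicting each applicant independently cannot faithfully represent.
Instability and variability quantify, respectively, how many decisions change as the pool changes and how many distinct admitted students a new applicant could displace.
On the NYC data, the authors simulate admissions under different pool compositions and find that ML prediction accuracy falls as the applicant pool changes over time.
Program types differ in instability and variability; multi-queue (sequential) structures have higher variability than single-queue ones.
All NYC functions analyzed are 1-unstable, with variability equal to the number of queues (1 for Screened/Open; up to 6 for Ed. Opt with DIA).
ML representations that predict each application's outcome independently cannot capture the step from 0-unstable to 1-unstable functions, and they also fail at higher instability or variability.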
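The finding that variability tracks the number of queues can be illustrated with a two-queue toy example. This is a sketch under assumptions, not the authors' simulator: the sequential rule (`sequential_admit`), the pool, the per-queue scores, the capacities, and the grid of newcomer scores are all hypothetical. Each queue ranks the applicants not yet admitted by its own score. The code then enumerates newcomers and records which existing applicants have their decision changed.

```python
from itertools import product
from typing import Dict, List, Set, Tuple

Scores = Dict[str, Tuple[float, ...]]  # applicant -> one score per queue

def sequential_admit(scores: Scores, capacities: List[int]) -> Set[str]:
    """Fill queues in order; queue k ranks the not-yet-admitted by score k."""
    admitted: Set[str] = set()
    for k, cap in enumerate(capacities):
        remaining = [a for a in scores if a not in admitted]
        remaining.sort(key=lambda a: (-scores[a][k], a))
        admitted.update(remaining[:cap])
    return admitted

def changes(scores: Scores, capacities: List[int],
            newcomer_scores: Tuple[float, ...]) -> Set[str]:
    """Existing applicants whose outcome flips when one newcomer joins."""
    before = sequential_admit(scores, capacities)
    after = sequential_admit({**scores, "new": newcomer_scores}, capacities)
    return {a for a in scores if (a in before) != (a in after)}

pool: Scores = {"ana": (90, 10), "ben": (80, 20), "cy": (30, 70), "dee": (20, 60)}
caps = [1, 1]              # two queues with one seat each
grid = [0, 25, 50, 75, 100]

per_newcomer = [changes(pool, caps, s) for s in product(grid, repeat=2)]
max_changes = max(len(c) for c in per_newcomer)
affected = sorted(set().union(*per_newcomer))
print(max_changes)  # 1
print(affected)     # ['ana', 'cy']
```

On this pool every newcomer changes at most one existing decision, yet two different students can be the one displaced, depending on which queue the newcomer is strong in. That matches the pattern reported above (1-unstable, variability equal to the number of queues) for this single hand-built case only.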


This review was generated by AI and reviewed by human editors.