QUICK REVIEW

[Paper Review] Maximum Likelihood with Bias-Corrected Calibration is Hard-To-Beat at Label Shift Adaptation

Amr M. Alexandari, Anshul Kundaje|arXiv (Cornell University)|Jan 21, 2019

Machine Learning and Data Classification | References: 19 | Cited by: 32

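One-Sentence Summary

This paper proposes a hybrid approach that combines bias-corrected calibration with maximum likelihood estimation for label shift adaptation, and shows that it outperforms state-of-the-art methods such as BBSL and RLLS on a range of datasets. The method corrects systematic miscalibration in a neural network's predictions before running the EM algorithm for label shift adaptation, which yields higher accuracy; because the likelihood function is concave, it also comes with a theoretical convergence guarantee.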

ABSTRACT

Label shift refers to the phenomenon where the prior class probability p(y) changes between the training and test distributions, while the conditional probability p(x|y) stays fixed. Label shift arises in settings like medical diagnosis, where a classifier trained to predict disease given symptoms must be adapted to scenarios where the baseline prevalence of the disease is different. Given estimates of p(y|x) from a predictive model, Saerens et al. proposed an efficient maximum likelihood algorithm to correct for label shift that does not require model retraining, but a limiting assumption of this algorithm is that p(y|x) is calibrated, which is not true of modern neural networks. Recently, Black Box Shift Learning (BBSL) and Regularized Learning under Label Shifts (RLLS) have emerged as state-of-the-art techniques to cope with label shift when a classifier does not output calibrated probabilities, but both methods require model retraining with importance weights and neither has been benchmarked against maximum likelihood. Here we (1) show that combining maximum likelihood with a type of calibration we call bias-corrected calibration outperforms both BBSL and RLLS across diverse datasets and distribution shifts, (2) prove that the maximum likelihood objective is concave, and (3) introduce a principled strategy for estimating source-domain priors that improves robustness to poor calibration. This work demonstrates that the maximum likelihood with appropriate calibration is a formidable and efficient baseline for label shift adaptation; notebooks reproducing experiments available at https://github.com/kundajelab/labelshiftexperiments

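Motivation and Objectives

Address label shift in machine learning, where the class prior distribution changes between the training and test sets, especially when the model's output probabilities are poorly calibrated.
Evaluate whether combining maximum likelihood estimation with an improved calibration method can outperform existing state-of-the-art methods such as BBSL and RLLS.
Develop a principled method for estimating source-domain priors that improves robustness to systematic bias in calibrated predictions.
Prove that the maximum likelihood objective is concave under the proposed calibration framework, so that any maximum the optimizer reaches is the global one.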

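Proposed Method

Apply a variant of temperature scaling that adds class-specific bias parameters to correct systematic miscalibration in the model's predictions.
Feed the calibrated probabilities $ p(y|\bm{x}) $ output by the calibrated model into the maximum likelihood framework for label shift adaptation.
Run an expectation-maximization (EM) algorithm to estimate the target-domain class priors $ q(y) $ under the assumption $ p(\bm{x}|y) = q(\bm{x}|y) $.
Propose a principled strategy that uses a held-out validation set to estimate the source-domain priors, improving robustness under severe miscalibration.
Prove that the likelihood function is concave and bounded, so that standard convex optimization techniques converge to the global maximum.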
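The steps above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `bcts_apply`, `estimate_source_prior`, and `em_label_shift` are hypothetical, the temperature and bias are assumed to have been fitted already on held-out data, and the source prior is taken to be the mean calibrated prediction on the validation set, which is one plausible reading of the validation-set strategy described above.

```python
import numpy as np

def bcts_apply(logits, temperature, bias):
    """Apply temperature scaling with a class-specific bias: softmax(z / T + b).
    T and b are assumed to be already fitted on a held-out set (fitting not shown)."""
    z = logits / temperature + bias
    z = z - z.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def estimate_source_prior(calibrated_val_probs):
    """Assumption: source prior = mean calibrated prediction on validation data."""
    return np.asarray(calibrated_val_probs, dtype=float).mean(axis=0)

def em_label_shift(probs, source_prior, n_iter=200, tol=1e-8):
    """EM for the target prior q(y), given calibrated source posteriors p(y|x)
    on unlabeled target examples. Assumes strictly positive source priors."""
    probs = np.asarray(probs, dtype=float)
    source_prior = np.asarray(source_prior, dtype=float)
    target_prior = source_prior.copy()
    for _ in range(n_iter):
        # E-step: reweight posteriors by q(y) / p(y), then renormalize per example.
        weighted = probs * (target_prior / source_prior)
        posteriors = weighted / weighted.sum(axis=1, keepdims=True)
        # M-step: the new prior is the average adapted posterior.
        new_prior = posteriors.mean(axis=0)
        converged = np.abs(new_prior - target_prior).max() < tol
        target_prior = new_prior
        if converged:
            break
    # Adapted posteriors under the final prior estimate.
    weighted = probs * (target_prior / source_prior)
    adapted = weighted / weighted.sum(axis=1, keepdims=True)
    return target_prior, adapted
```

Note that nothing here retrains the classifier: the calibration map and the EM loop both operate on the model's outputs only, which is what makes the approach cheap relative to importance-weighted retraining.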

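Experimental Results

Research Questions

RQ1: Does combining maximum likelihood estimation with bias-corrected calibration outperform existing state-of-the-art methods (such as BBSL and RLLS) at label shift adaptation?
RQ2: Does class-specific bias correction during calibration significantly improve adaptation performance compared with standard temperature scaling?
RQ3: Is the maximum likelihood objective concave under the proposed calibration framework, so that convergence to a global maximum is guaranteed?
RQ4: Does a principled method for estimating source-domain priors improve robustness to systematic bias in the calibrated probabilities?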

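Key Findings

The proposed combination of maximum likelihood and bias-corrected calibration consistently outperforms BBSL and RLLS across a variety of distribution shifts on MNIST, CIFAR10/CIFAR100, and a diabetic retinopathy detection dataset.
Standard temperature scaling does not give the best label shift adaptation results, because systematic bias persists in the calibrated probabilities.
Adding class-specific bias correction during calibration significantly improves adaptation performance over standard calibration techniques.
Under the proposed framework, the maximum likelihood objective is proven to be concave and bounded, which guarantees convergence to the global maximum.
The method reaches state-of-the-art performance without model fine-tuning or hyperparameter tuning, whereas BBSL and RLLS require additional training or tuning.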
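The concavity finding can be made concrete with a short derivation. The notation below is an assumption chosen to match the symbols used earlier in this review ($ p $ for the source domain, $ q $ for the target domain, $ N $ unlabeled target examples); it is a sketch of the standard argument, not a reproduction of the paper's proof.

```latex
\begin{align*}
q(\bm{x}_i) &= \sum_{k} q(y=k)\, q(\bm{x}_i \mid y=k)
             = \sum_{k} q(y=k)\, p(\bm{x}_i \mid y=k) \\
            &= p(\bm{x}_i) \sum_{k} q(y=k)\, \frac{p(y=k \mid \bm{x}_i)}{p(y=k)}, \\
\ell(q)     &= \sum_{i=1}^{N} \log \sum_{k} q(y=k)\,
               \frac{p(y=k \mid \bm{x}_i)}{p(y=k)} + \text{const}.
\end{align*}
```

The second equality uses the label shift assumption $ p(\bm{x}|y) = q(\bm{x}|y) $, and the third applies Bayes' rule in the source domain. Each inner sum is linear in the target prior $ q(y) $ with nonnegative coefficients, the logarithm of a positive linear function is concave, and a sum of concave functions is concave. Since the feasible set (the probability simplex) is convex, any local maximum of $ \ell $ is global. Each inner sum is also at most $ \max_k p(y=k|\bm{x}_i)/p(y=k) $ because $ q(y) $ sums to one, which bounds $ \ell $ from above.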

This review was generated by AI and reviewed by human editors.