QUICK REVIEW

[Paper Review] Approximate full conformal prediction in an RKHS

Davidson Lova Razafindrakoto, Alain Célisse|arXiv (Cornell University)|Jan 19, 2026
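Statistical Methods and Inference | Cited by 0
One-sentence summary

The paper proposes a generic, computable approximation to the full conformal prediction region in an RKHS setting, with non-asymptotic guarantees and a new thickness measure for quantifying the approximation error.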

ABSTRACT

Full conformal prediction is a framework that implicitly formulates distribution-free confidence prediction regions for a wide range of estimators. However, a classical limitation of the full conformal framework is the computation of the confidence prediction regions, which is usually impossible since it requires training infinitely many estimators (for real-valued prediction for instance). The main purpose of the present work is to describe a generic strategy for designing a tight approximation to the full conformal prediction region that can be efficiently computed. Along with this approximate confidence region, a theoretical quantification of the tightness of this approximation is developed, depending on the smoothness assumptions on the loss and score functions. The new notion of thickness is introduced for quantifying the discrepancy between the approximate confidence region and the full conformal one.

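Motivation and objectives

  • Motivate and address the computational intractability of full conformal prediction.
  • Propose a generic scheme for approximating the full conformal prediction region without data splitting.
  • Develop a theory that links approximation quality to the smoothness of the loss and score functions.
  • Instantiate the scheme in an RKHS with ridge-regression-like predictors and robust losses.
  • Provide non-asymptotic guarantees and a thickness measure of approximation quality.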
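
The intractability named in the first objective can be made concrete with a brute-force sketch. The code below is an illustrative assumption, not the paper's method: it uses kernel ridge regression with the squared loss (so each refit has a closed form), a Gaussian kernel, absolute residuals as scores, and a finite grid of candidate responses standing in for the infinitely many candidates. The names `gaussian_kernel`, `full_conformal_region`, `lam`, and `bandwidth` are hypothetical.

```python
import numpy as np


def gaussian_kernel(A, B, bandwidth=1.0):
    """Gaussian kernel matrix between the rows of A and the rows of B."""
    sq_dists = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-sq_dists / (2.0 * bandwidth ** 2))


def full_conformal_region(X, y, x_new, y_grid, alpha=0.1, lam=0.01):
    """Grid version of full conformal prediction for kernel ridge regression.

    Assumption: squared loss, so the fit on the augmented sample is a
    linear smoother H applied to the augmented response vector.
    """
    n = len(y)
    X_aug = np.vstack([X, x_new[None, :]])
    K = gaussian_kernel(X_aug, X_aug)
    # Smoother matrix H = (K + lam * (n + 1) * I)^{-1} K; fitted values are H @ y_aug.
    H = np.linalg.solve(K + lam * (n + 1) * np.eye(n + 1), K)
    kept = []
    for y_cand in y_grid:
        # One "refit" per candidate response for the new point.
        y_aug = np.append(y, y_cand)
        scores = np.abs(y_aug - H @ y_aug)
        # Conformal p-value: fraction of scores at least as large as the new one.
        p_value = np.mean(scores >= scores[-1])
        if p_value > alpha:
            kept.append(y_cand)
    return np.array(kept)


# Small demo on synthetic data (illustrative only).
rng = np.random.default_rng(0)
X = rng.uniform(-3.0, 3.0, size=(30, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=30)
y_grid = np.linspace(-3.0, 3.0, 121)
region = full_conformal_region(X, y, np.array([0.5]), y_grid, alpha=0.1)
print(f"{len(region)} of {len(y_grid)} candidate values kept")
```

Each candidate costs only a matrix-vector product here because the squared loss yields a linear smoother. With a robust loss, every candidate would need its own optimization run, and a real-valued response admits infinitely many candidates, which is the cost an efficiently computable approximation is meant to avoid.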

Proposed method

  • Define a ridge-type predictor in an RKHS with a flexible loss …
Figure 1: Evolution of the upper bound in Equation (15) (dashed red line) and the quantity $\Delta^{(0)}$ (solid blue line) as a function of the sample size $n$ in log-log scale (to appreciate the rate). The data is sampled from the sklearn synthetic data set make_friedman1(sample_size=n

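Experimental results

Research questions

  • RQ1: Can we design a computable approximation of the full conformal prediction region that preserves coverage?
  • RQ2: How can the tightness of the approximation relative to the exact full conformal region be quantified?
  • RQ3: What role do the smoothness properties of the loss and score functions play in the approximation error?
  • RQ4: How does the proposed approximation behave under non-smooth versus smooth losses?
  • RQ5: Can an approximation inspired by influence functions tighten the region and yield explicit bounds?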

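Main findings

  • Introduces a generic approximation framework that yields computable conformal p-values and confidence regions.
  • Proves that the approximate region contains the full region and maintains coverage at level 1−α.
  • Defines thickness to quantify the volume of the symmetric difference between the approximate region and the full region.
  • Gives non-asymptotic approximation error bounds that depend on the smoothness of the loss and score functions and on kernel properties.
  • Derives explicit bounds for non-smooth losses, recovering earlier stability-based conformal results.
  • Under favorable conditions with a bounded kernel, relates the convergence rate of the thickness to O(1/n).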
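
The containment and thickness findings above can be written compactly. The notation is an assumption for illustration, since this summary does not give the paper's symbols: $\widehat{C}_\alpha(x)$ denotes the full conformal region, $\widetilde{C}_\alpha(x)$ the approximate one, and $\mathrm{Leb}$ is Lebesgue measure, taken here as the notion of volume. The first line uses the standard coverage guarantee of full conformal prediction under exchangeable data.

```latex
\begin{align*}
\widehat{C}_\alpha(x) \subseteq \widetilde{C}_\alpha(x)
  \quad &\Longrightarrow \quad
  \mathbb{P}\bigl(Y_{n+1} \in \widetilde{C}_\alpha(X_{n+1})\bigr)
  \;\ge\; \mathbb{P}\bigl(Y_{n+1} \in \widehat{C}_\alpha(X_{n+1})\bigr)
  \;\ge\; 1 - \alpha, \\
\widehat{C}_\alpha(x) \subseteq \widetilde{C}_\alpha(x)
  \quad &\Longrightarrow \quad
  \mathrm{Leb}\bigl(\widetilde{C}_\alpha(x) \,\triangle\, \widehat{C}_\alpha(x)\bigr)
  = \mathrm{Leb}\bigl(\widetilde{C}_\alpha(x) \setminus \widehat{C}_\alpha(x)\bigr).
\end{align*}
```

Read this way, the containment means the symmetric difference reduces to the excess volume of the approximate region over the full one, so a small thickness means little is lost by using the computable region.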
Figure 3: Evolution of the upper bound in Equation (27) (dashed red line) and the quantity $\Delta^{(2)}$ (solid blue line) as a function of the sample size $n$ in log-log scale (to appreciate the rate). The data is sampled from the sklearn synthetic data set make_friedman1(sample_size=n


This review was generated by AI and reviewed by a human editor.