QUICK REVIEW

[Paper Review] Adjusted Plus-Minus for NHL Players using Ridge Regression

Brian Macdonald|arXiv (Cornell University)|Jan 1, 2012
Advanced Statistical Methods and Models|References 8|Cited by 3
One-Sentence Summary

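This paper proposes a ridge-regression-based adjusted plus-minus model that estimates each NHL player's individual offensive and defensive contributions at even strength, on the power play, and short handed, independent of teammates, opponents, and zone starts. By adding shot-based metrics such as Fenwick and Corsi, the model draws on far more data points than goal-only approaches, which lowers estimation error and improves precision relative to traditional OLS methods.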

ABSTRACT

Regression-based adjusted plus-minus statistics were developed in basketball and have recently come to hockey. The upside to these methods is that they provide an estimate of each player’s contribution to his team, independent of the strength of his teammates, the strength of his opponents, and other variables that are out of his control. One of the main downsides of the ordinary least squares regression models is that the estimates have large error bounds. Since certain pairs of teammates play together frequently, collinearity is present in the data and is one reason for the large errors. In hockey, the relative lack of scoring compared to basketball is another reason. To deal with these issues, we use ridge regression, a method that is commonly used when collinearity is present in the data, in lieu of ordinary least squares regression. We also create models that use not only goals, but also shots, Fenwick rating (shots plus missed shots), and Corsi rating (shots, missed shots, and blocked shots). One benefit of using these statistics is that there are roughly ten times as many shots as goals, so there is much more data when using these statistics and the resulting estimates have smaller error bounds. The results of our ridge regression models are estimates of the offensive and defensive contributions of forwards and defensemen during even strength, power play, and short handed situations, in terms of goals per 60 minutes. The estimates are independent of strength of teammates, strength of opponents, and the zone in which a player’s shift begins.
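
The shot-based statistics named in the abstract are simple sums of on-ice event counts. A minimal sketch of those definitions follows, together with a per-60-minutes rate conversion. The `ShotCounts` container, the helper names, and every number in the example are hypothetical and chosen only for illustration; none of them come from the paper's data.

```python
from dataclasses import dataclass


@dataclass
class ShotCounts:
    """Hypothetical container for one player's on-ice event counts."""
    shots: int           # shots on goal
    missed_shots: int    # attempts that missed the net
    blocked_shots: int   # attempts blocked before reaching the net
    minutes: float       # time on ice, in minutes


def fenwick(c: ShotCounts) -> int:
    # Fenwick = shots + missed shots, as defined in the abstract.
    return c.shots + c.missed_shots


def corsi(c: ShotCounts) -> int:
    # Corsi = shots + missed shots + blocked shots, as defined in the abstract.
    return c.shots + c.missed_shots + c.blocked_shots


def per_60(count: float, minutes: float) -> float:
    # Convert a raw count into a rate per 60 minutes of ice time.
    return 60.0 * count / minutes


# Made-up counts, for illustration only.
example = ShotCounts(shots=50, missed_shots=20, blocked_shots=15, minutes=120.0)
fenwick_total = fenwick(example)
corsi_total = corsi(example)
corsi_rate = per_60(corsi_total, example.minutes)
```

With these made-up counts, the Fenwick total is 70, the Corsi total is 85, and the Corsi rate is 42.5 events per 60 minutes.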

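Motivation and Objectives

  • Address the high variance of ordinary least squares (OLS) adjusted plus-minus estimates for NHL players, which stems from collinearity and the low frequency of scoring.
  • Improve precision by using ridge regression to handle the multicollinearity created by frequent teammate pairings.
  • Extend player evaluation beyond goals to shot-based metrics such as Fenwick and Corsi, which supply more data points and reduce standard errors.
  • Build separate models for even-strength, power-play, and short-handed situations to capture situation-dependent differences in contribution.
  • Provide estimates of player contribution, in goals per 60 minutes, that are independent of teammates and opponents across game situations.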

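Proposed Method

  • Use ridge regression instead of ordinary least squares (OLS) to mitigate the multicollinearity caused by players who frequently appear on the ice together.
  • Use shot-based statistics, Fenwick (shots + missed shots) and Corsi (shots + missed shots + blocked shots), to increase the amount of data and reduce the variance of the estimates.
  • Fit separate regression models for even-strength, power-play, and short-handed situations to reflect their different tactical and strategic contexts.
  • Express each player's contribution as expected goals per 60 minutes, adjusting for strength of teammates, strength of opponents, and zone starts.
  • Apply regularization to shrink extreme estimates and improve stability, particularly for players with limited or sparse ice time.
  • Use a penalized-likelihood formulation in which the ridge penalty controls overfitting and lowers the standard errors of the coefficient estimates.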
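
To make the ridge step concrete, here is a minimal sketch that solves the penalized normal equations (X^T X + lam * I) beta = X^T y on a toy data set. It is not the author's model: the three-player design matrix, the response values, and the penalty values are all made up, and the real model's separate offensive and defensive terms, zone-start variables, and any weighting of observations are left out. Players A and B are deliberately placed on the ice for exactly the same shifts, so OLS (lam = 0) would face a singular system, while ridge with lam > 0 still returns a unique solution.

```python
def ridge_fit(X, y, lam):
    """Solve (X^T X + lam * I) beta = X^T y by Gaussian elimination.

    Assumes lam > 0, which keeps the system non-singular even when
    columns of X are perfectly collinear.
    """
    n_rows, n_cols = len(X), len(X[0])
    # Normal-equation matrix with the ridge penalty added on the diagonal.
    A = [[sum(X[r][i] * X[r][j] for r in range(n_rows)) + (lam if i == j else 0.0)
          for j in range(n_cols)] for i in range(n_cols)]
    b = [sum(X[r][i] * y[r] for r in range(n_rows)) for i in range(n_cols)]
    # Forward elimination with partial pivoting.
    for col in range(n_cols):
        pivot = max(range(col, n_cols), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, n_cols):
            factor = A[r][col] / A[col][col]
            for c in range(col, n_cols):
                A[r][c] -= factor * A[col][c]
            b[r] -= factor * b[col]
    # Back substitution.
    beta = [0.0] * n_cols
    for i in reversed(range(n_cols)):
        s = sum(A[i][j] * beta[j] for j in range(i + 1, n_cols))
        beta[i] = (b[i] - s) / A[i][i]
    return beta


# Toy data (hypothetical): rows are shifts, columns are players A, B, C.
# A and B always play together, so their columns are identical.
X = [[1, 1, 0],
     [1, 1, 1],
     [0, 0, 1],
     [1, 1, 0]]
y = [3.0, 4.0, 1.0, 2.5]  # made-up scoring rates observed on each shift

beta_small = ridge_fit(X, y, 1.0)    # light penalty
beta_large = ridge_fit(X, y, 10.0)   # heavy penalty, stronger shrinkage
```

Because A and B cannot be told apart in this toy data, ridge splits the credit between them equally, and increasing the penalty pulls all three coefficients closer to zero. In practice a numerical library would replace the hand-written solver; it is spelled out here only to keep the sketch dependency-free.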

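Experimental Results

Research Questions

  • RQ1: Given low scoring frequency and collinearity among teammates, does ridge regression reduce the estimation error of adjusted plus-minus for NHL players compared with OLS?
  • RQ2: How much do shot-based metrics such as Fenwick and Corsi improve the precision of player contribution estimates compared with goal-based models?
  • RQ3: How stable and reliable are the player estimates after controlling for teammates, opponents, and zone starts?
  • RQ4: Under the new models, how do offensive and defensive contributions differ across even-strength, power-play, and short-handed situations?
  • RQ5: Does adding shot-based statistics make player rankings more consistent and reliable than traditional plus-minus or OLS-based models?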

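Key Findings

  • Ridge regression substantially reduces the standard errors of adjusted plus-minus estimates by handling the multicollinearity created by frequent teammate pairings.
  • With Fenwick and Corsi, the number of data points is roughly ten times that of goal-based models, which yields more precise estimates.
  • Across even-strength, power-play, and short-handed situations, the model produces stable estimates of player contribution, in goals per 60 minutes, that are independent of teammates and opponents.
  • Because of the regularizing effect of ridge regression, players with limited or sparse ice time receive more reliable, less extreme estimates.
  • Shot-based metrics make player rankings more consistent and separate skill levels more accurately than traditional plus-minus or OLS-based models.
  • By controlling for external factors such as teammate strength, opponent quality, and zone starts, the model isolates individual contributions and supports a fairer performance evaluation.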
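
A back-of-envelope calculation suggests why roughly ten times as many events helps. The sketch below assumes event counts behave like Poisson counts, so the relative standard error of a count n is 1 / sqrt(n). That assumption and the counts used are illustrative only; this is not the error calculation used in the paper.

```python
import math


def relative_standard_error(event_count: float) -> float:
    # For a Poisson count with mean n, the standard deviation is sqrt(n),
    # so the relative standard error is sqrt(n) / n = 1 / sqrt(n).
    return 1.0 / math.sqrt(event_count)


goals = 25          # hypothetical number of on-ice goals
shot_events = 250   # hypothetical shot-based events, ten times as many

rse_goals = relative_standard_error(goals)
rse_shots = relative_standard_error(shot_events)
improvement = rse_goals / rse_shots   # equals sqrt(250 / 25) = sqrt(10)
```

Under this assumption, ten times the data shrinks the relative error by a factor of sqrt(10), a bit more than 3, rather than by a factor of 10.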

This review was generated by AI and reviewed by human editors.