QUICK REVIEW

[Paper Review] Evaluating Point Forecasts

Tilmann Gneiting | arXiv (Cornell University) | Dec 4, 2009

Forecasting Techniques and Applications | Cited by 4

One-Sentence Summary

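This paper shows that common ways of evaluating point forecasts can yield misleading conclusions unless the scoring function is properly matched to the forecasting task. It introduces the notions of consistency and elicitability: if the scoring function is specified ex ante, the optimal point forecast is the Bayes rule; if the forecaster is instead given a statistical functional (such as the mean or a quantile) as a directive, the scoring function must be consistent for that functional.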

ABSTRACT

Typically, point forecasting methods are compared and assessed by means of an error measure or scoring function, such as the absolute error or the squared error. The individual scores are then averaged over forecast cases, to result in a summary measure of the predictive performance, such as the mean absolute error or the (root) mean squared error. I demonstrate that this common practice can lead to grossly misguided inferences, unless the scoring function and the forecasting task are carefully matched. Effective point forecasting requires that the scoring function be specified ex ante, or that the forecaster receives a directive in the form of a statistical functional, such as the mean or a quantile of the predictive distribution. If the scoring function is specified ex ante, the forecaster can issue the optimal point forecast, namely, the Bayes rule. If the forecaster receives a directive in the form of a functional, it is critical that the scoring function be consistent for it, in the sense that the expected score is minimized when following the directive. A functional is elicitable if there exists a scoring function that is strictly consistent for it. Expectations, ratios of expectations and quantiles are elicitable. For example, a scoring function is consistent for the mean functional if and only if it is a Bregman function. It is consistent for a quantile if and only if it is generalized piecewise linear. Similar characterizations apply to ratios of expectations and to expectiles. Weighted scoring functions are consistent for functionals that adapt to the weighting in peculiar ways. Not all functionals are elicitable; for instance, conditional value-at-risk is not, despite its popularity in quantitative finance.
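
The two scenarios in the abstract can be illustrated with a small numerical sketch. It assumes a hypothetical skewed predictive distribution (values 0, 1 and 10 with probabilities 0.4, 0.4 and 0.2, chosen for illustration and not taken from the paper) and finds the Bayes rule by minimizing the expected score over a grid of candidate forecasts: under squared error it is the mean, under absolute error the median. Scoring both forecasts with both functions then shows how the ranking of two forecasters can flip with the choice of scoring function. All names are illustrative helpers, not an API from the paper.

```python
# Hypothetical skewed predictive distribution for a single forecast case
# (illustrative numbers, not taken from the paper).
VALUES = [0.0, 1.0, 10.0]
PROBS = [0.4, 0.4, 0.2]


def squared_error(x, y):
    return (x - y) ** 2


def absolute_error(x, y):
    return abs(x - y)


def expected_score(score, x):
    """Expected score E_F[S(x, Y)] under the discrete predictive distribution F."""
    return sum(p * score(x, y) for y, p in zip(VALUES, PROBS))


def bayes_rule(score, candidates):
    """Candidate point forecast with the smallest expected score."""
    return min(candidates, key=lambda x: expected_score(score, x))


grid = [i / 100 for i in range(1001)]  # candidate forecasts 0.00, 0.01, ..., 10.00
mean_forecast = bayes_rule(squared_error, grid)     # Bayes rule under squared error
median_forecast = bayes_rule(absolute_error, grid)  # Bayes rule under absolute error

for name, score in [("squared error", squared_error), ("absolute error", absolute_error)]:
    print(f"{name}: mean forecast {expected_score(score, mean_forecast):.2f}, "
          f"median forecast {expected_score(score, median_forecast):.2f}")
```

The mean forecast (2.4) has the lower expected squared error (14.64 vs 16.60), while the median forecast (1.0) has the lower expected absolute error (2.20 vs 3.04), so each forecaster "wins" only under the scoring function whose directive it follows.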

Research Motivation and Objectives

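Identify the pitfalls of standard point forecast evaluation that relies on arbitrarily chosen scoring functions such as the mean squared error.
Establish that the assessment of forecast accuracy depends critically on matching the scoring function to the forecasting task or to the statistical functional being forecast.
Formalize the conditions under which a scoring function is consistent for a given statistical functional, such as the mean or a quantile.
Clarify which functionals are elicitable (i.e., admit a strictly consistent scoring function) and which are not (such as conditional value-at-risk), and characterize their consistent scoring functions.
Provide a theoretical basis for choosing appropriate scoring functions in practice, so that forecasts are optimal and their evaluation is reliable.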

Proposed Approach

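Defines a scoring function to be consistent for a statistical functional if the expected score is minimized when the point forecast equals the value of that functional, and strictly consistent if it is minimized only there.
Characterizes the consistent scoring functions: Bregman functions for the mean and generalized piecewise linear functions for quantiles.
Introduces the notion of an elicitable functional, namely one for which a strictly consistent scoring function exists.
Analyzes weighted scoring functions, showing that they are consistent for functionals that adapt to the weighting.
Shows through mathematical characterization that not all functionals are elicitable; for example, conditional value-at-risk is not.
Establishes that when the scoring function is specified ex ante, the optimal point forecast is the Bayes rule, i.e., the forecast that minimizes the expected score under the predictive distribution.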
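
The characterizations in the list above can be checked numerically. The sketch below builds scores from the two families named there, a Bregman function S(x, y) = phi(y) - phi(x) - phi'(x)(y - x) with convex phi and a generalized piecewise linear function S(x, y) = (1{y <= x} - alpha)(g(x) - g(y)) with nondecreasing g, and confirms that their Bayes rules recover the mean and the alpha-quantile of a hypothetical discrete distribution. The particular choices of phi and g, the distribution and the grid are assumptions for illustration, and a grid search only approximates the exact minimizer.

```python
import math

# Hypothetical discrete predictive distribution (illustrative, not from the paper).
VALUES = [0.0, 1.0, 10.0]
PROBS = [0.4, 0.4, 0.2]
GRID = [i / 100 for i in range(1201)]  # candidate forecasts 0.00 ... 12.00


def bregman(phi, dphi):
    """Bregman score S(x, y) = phi(y) - phi(x) - phi'(x) * (y - x), phi convex."""
    return lambda x, y: phi(y) - phi(x) - dphi(x) * (y - x)


def gpl(alpha, g):
    """Generalized piecewise linear score (1{y <= x} - alpha) * (g(x) - g(y)), g nondecreasing."""
    return lambda x, y: ((1.0 if y <= x else 0.0) - alpha) * (g(x) - g(y))


def bayes_rule(score):
    """Grid point minimizing the expected score under the discrete distribution."""
    return min(GRID, key=lambda x: sum(p * score(x, y) for y, p in zip(VALUES, PROBS)))


def mean():
    return sum(p * y for y, p in zip(VALUES, PROBS))


def quantile(alpha):
    """Smallest support point y with P(Y <= y) >= alpha."""
    cum = 0.0
    for y, p in zip(VALUES, PROBS):
        cum += p
        if cum >= alpha:
            return y
    return VALUES[-1]


# phi(t) = t^2 recovers the squared error; phi(t) = exp(t) is another convex choice.
squared = bregman(lambda t: t * t, lambda t: 2 * t)
exp_bregman = bregman(math.exp, math.exp)
print("mean:", mean(), bayes_rule(squared), bayes_rule(exp_bregman))

# g(t) = t gives the usual pinball (quantile) loss; g(t) = t**3 is also nondecreasing.
for alpha in (0.5, 0.9):
    print(alpha, quantile(alpha),
          bayes_rule(gpl(alpha, lambda t: t)), bayes_rule(gpl(alpha, lambda t: t ** 3)))
```

Both Bregman scores point to the mean 2.4, and both generalized piecewise linear scores point to the quantiles 1.0 (alpha = 0.5) and 10.0 (alpha = 0.9), even though the individual scores differ.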

Results

Research Questions

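RQ1: Why can standard forecast evaluation with common scoring functions, such as the squared error, lead to misleading inferences?
RQ2: What conditions must a scoring function satisfy to be consistent for a given statistical functional, such as the mean or a quantile?
RQ3: Which statistical functionals are elicitable, and how can their consistent scoring functions be characterized?
RQ4: Why is conditional value-at-risk not elicitable, despite its wide use in finance?
RQ5: How do weighted scoring functions interact with functionals, and what does this imply for optimizing forecasts?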
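
RQ4 can be explored through a necessary condition for elicitability usually credited to Osband, convexity of level sets: if a strictly consistent score S exists and a functional takes the value t at two distributions F0 and F1, then t uniquely minimizes the expected score under F0 and under F1, hence under every mixture (1 - lam)F0 + lam*F1, because expectations are linear in the distribution. The sketch below assumes one common tail-average definition of conditional value-at-risk (the average of the quantile function over (alpha, 1)) and two hypothetical discrete distributions chosen so that both have CVaR equal to 1 at alpha = 0.5 while their 50/50 mixture does not. It illustrates the failure of the condition; it is not a reproduction of the paper's proof.

```python
def cvar(dist, alpha):
    """Upper-tail CVaR: (1 / (1 - alpha)) * integral of the quantile function over (alpha, 1).

    `dist` maps support points to probabilities (a discrete distribution).
    """
    total, cum = 0.0, 0.0
    for y in sorted(dist):
        lo, hi = cum, cum + dist[y]
        # The quantile function equals y on (lo, hi]; keep only the part above alpha.
        total += y * max(0.0, hi - max(lo, alpha))
        cum = hi
    return total / (1.0 - alpha)


def mix(f0, f1, lam):
    """Mixture (1 - lam) * F0 + lam * F1 of two discrete distributions."""
    out = {}
    for dist, w in ((f0, 1.0 - lam), (f1, lam)):
        for y, p in dist.items():
            out[y] = out.get(y, 0.0) + w * p
    return out


ALPHA = 0.5
F0 = {1.0: 1.0}               # hypothetical point mass at 1
F1 = {0.0: 0.75, 2.0: 0.25}   # hypothetical two-point distribution
F_MIX = mix(F0, F1, 0.5)

print(cvar(F0, ALPHA), cvar(F1, ALPHA), cvar(F_MIX, ALPHA))
```

Both components have CVaR 1.0 but the mixture has 1.25, so the level set of this CVaR definition is not convex, and no scoring function can be strictly consistent for it on any class that contains these three distributions.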

Key Findings

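To ensure reliable evaluation and optimal point forecasts, the scoring function must be consistent for the functional being forecast.
The mean functional is elicitable; its consistent scoring functions are characterized as the Bregman functions.
Quantiles are elicitable; their consistent scoring functions are the generalized piecewise linear functions.
Ratios of expectations and expectiles are also elicitable, with analogous characterizations of their consistent scoring functions.
Conditional value-at-risk is not elicitable: no scoring function is strictly consistent for it, which undermines its use as a target in point forecast evaluation.
When the scoring function is specified ex ante, the optimal point forecast is the Bayes rule, which minimizes the expected loss.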
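
The findings on ratios of expectations and expectiles, together with the abstract's remark on weighted scoring functions, can be made concrete with a grid search. Weighting the squared error by the outcome, y(x - y)^2, moves its Bayes rule from E[Y] to the ratio of expectations E[Y^2]/E[Y] (for nonnegative outcomes with E[Y] > 0, setting the derivative 2E[Y(x - Y)] to zero gives this); the asymmetric piecewise quadratic score |1{y <= x} - tau|(x - y)^2 yields the tau-expectile, which equals the mean at tau = 0.5. The distribution, the grid and the level tau = 0.9 are illustrative assumptions, and the helper names are hypothetical.

```python
# Hypothetical discrete predictive distribution (illustrative, not from the paper).
VALUES = [0.0, 1.0, 10.0]
PROBS = [0.4, 0.4, 0.2]
GRID = [i / 100 for i in range(1201)]  # candidate forecasts 0.00 ... 12.00


def expect(f):
    """E[f(Y)] under the discrete distribution."""
    return sum(p * f(y) for y, p in zip(VALUES, PROBS))


def bayes_rule(score):
    """Grid point minimizing the expected score E[S(x, Y)]."""
    return min(GRID, key=lambda x: expect(lambda y: score(x, y)))


def weighted_squared(x, y):
    """Squared error weighted by the outcome y (a weighted scoring function)."""
    return y * (x - y) ** 2


def expectile_score(tau):
    """Asymmetric piecewise quadratic score |1{y <= x} - tau| * (x - y)^2."""
    return lambda x, y: abs((1.0 if y <= x else 0.0) - tau) * (x - y) ** 2


ratio = expect(lambda y: y * y) / expect(lambda y: y)  # E[Y^2] / E[Y]
print("ratio of expectations:", ratio, bayes_rule(weighted_squared))
print("0.5-expectile:", bayes_rule(expectile_score(0.5)), "mean:", expect(lambda y: y))
print("0.9-expectile:", bayes_rule(expectile_score(0.9)))
```

Here E[Y^2]/E[Y] = 20.4/2.4 = 8.5, far above the mean of 2.4: the weighting changes which functional the score rewards, not just how much individual errors count. The 0.9-expectile (about 7.08 on this grid) likewise sits well above the mean.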

This review was generated by AI and reviewed by human editors.