QUICK REVIEW

[Paper Review] Signature-Kernel Based Evaluation Metrics for Robust Probabilistic and Tail-Event Forecasting

Benjamin Redhead, Thomas L. Lee|arXiv (Cornell University)|Feb 10, 2026

Forecasting Techniques and Applications | Cited by 0

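One-Sentence Summary

The paper introduces Sig-MMD and CSig-MMD, two kernel-based metrics that use the signature kernel to evaluate multivariate, multi-step probabilistic forecasts, with an emphasis on tail events, while preserving properness.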

ABSTRACT

Probabilistic forecasting is increasingly critical across high-stakes domains, from finance and epidemiology to climate science. However, current evaluation frameworks lack a consensus metric and suffer from two critical flaws: they often assume independence across time steps or variables, and they demonstrably lack sensitivity to tail events, the very occurrences that are most pivotal in real-world decision-making. To address these limitations, we propose two kernel-based metrics: the signature maximum mean discrepancy (Sig-MMD) and our novel censored Sig-MMD (CSig-MMD). By leveraging the signature kernel, these metrics capture complex inter-variate and inter-temporal dependencies and remain robust to missing data. Furthermore, CSig-MMD introduces a censoring scheme that prioritizes a forecaster's capability to predict tail events while strictly maintaining properness, a vital property for a good scoring rule. These metrics enable a more reliable evaluation of direct multi-step forecasting, facilitating the development of more robust probabilistic algorithms.

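Motivation and Objectives

Address the lack of a consensus evaluation metric for probabilistic time series forecasting.
Capture inter-variate and inter-temporal dependencies without assuming independence.
Increase sensitivity to tail events while preserving the properness of the scoring rule.
Provide a censored metric that focuses on tail performance without sacrificing properness.
Validate effectiveness on synthetic and real-world time series datasets.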

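Proposed Method

Uses the signature kernel to map time series into a high-dimensional feature space and applies maximum mean discrepancy (MMD) to compare the forecast distribution with the true distribution (Sig-MMD).
Augments each series with time, basepoint, and endpoint information before applying the signature kernel, to preserve the temporal geometry.
Introduces CSig-MMD, a censored version of Sig-MMD that uses a Mahalanobis-distance-based censoring mechanism with soft logistic weights to collapse the probability mass outside the tail region onto a pivot point.
Proves that, with a characteristic signature kernel and appropriate censoring, CSig-MMD remains strictly proper.
Runs experiments on synthetic Gaussian processes and real-world datasets, comparing Sig-MMD and CSig-MMD against standard metrics (QL, CRPS, ES, VS).
Evaluates multiple forecasting models (including foundation models), showing how tail-focused metrics yield rankings that differ from those of body-focused metrics.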
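The Sig-MMD idea above can be sketched in a few lines. This is only an illustration under simplifying assumptions, not the authors' implementation: the signature is truncated at level 2 and compared with a plain inner product, whereas the paper relies on the full signature kernel. A truncated kernel is not characteristic, so this sketch does not inherit the strict properness result. The function names (augment, sig_level2, sig_mmd2) are hypothetical, and only the time and basepoint augmentations are shown (the endpoint augmentation is omitted).

```python
import numpy as np

def augment(path):
    """Add a time channel and a zero basepoint to a (T, d) path."""
    T = path.shape[0]
    t = np.linspace(0.0, 1.0, T)[:, None]           # time channel in [0, 1]
    with_time = np.hstack([t, path])                 # shape (T, d + 1)
    basepoint = np.zeros((1, with_time.shape[1]))    # makes the features sensitive to translation
    return np.vstack([basepoint, with_time])         # shape (T + 1, d + 1)

def sig_level2(path):
    """Signature of a piecewise-linear path, truncated at level 2 and flattened."""
    dx = np.diff(path, axis=0)                       # segment increments
    start = np.cumsum(dx, axis=0) - dx               # displacement from the origin at each segment start
    level1 = dx.sum(axis=0)                          # total increment
    level2 = start.T @ dx + 0.5 * dx.T @ dx          # iterated integrals via Chen's identity
    return np.concatenate([[1.0], level1, level2.ravel()])

def sig_mmd2(forecast_paths, observed_paths):
    """Biased (V-statistic) estimate of MMD^2 under the truncated signature kernel."""
    fx = np.stack([sig_level2(augment(p)) for p in forecast_paths])
    fy = np.stack([sig_level2(augment(p)) for p in observed_paths])
    kxx, kyy, kxy = fx @ fx.T, fy @ fy.T, fx @ fy.T  # Gram matrices
    return float(kxx.mean() + kyy.mean() - 2.0 * kxy.mean())
```

Both arguments are arrays of shape (n_samples, T, d), so the score compares whole joint trajectories rather than per-step marginals. Lower values indicate that the forecast sample paths are closer in distribution to the observed paths.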

Figure 1 : Comparison of forecast samples on Tail (Top, ERA5) and Body (Bottom, EWELD) scenarios. Top: Chronos-2, which receives the lowest score from Sig-MMD, QL, and VS, fails to predict the initial extreme spike, predicting phantom spikes later instead. In contrast, Moirai, which scores lowest on

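Experimental Results

Research Questions

RQ1 Can the signature-kernel metric (Sig-MMD) capture multivariate and temporal dependencies without assuming independence?
RQ2 Does the censored variant (CSig-MMD) provide a strictly proper scoring rule for tail-event forecasting while retaining the properties needed for model evaluation?
RQ3 How do Sig-MMD and CSig-MMD compare with standard metrics in identifying dependence structure and tail performance on synthetic and real-world time series forecasts?
RQ4 Do these metrics reveal differences between traditional forecasters and foundation models, particularly in the evaluation of tail events?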

Key Findings

Model	QL	CRPS	ES	VS	Sig	CSig
DLinear	2, 0, 7	1, 0, 8	2, 1, 6	2, 0, 7	2, 0, 7	3, 0, 6
NLinear	1, 1, 7	2, 0, 7	3, 1, 5	0, 2, 7	1, 2, 6	2, 0, 7
PatchTST	2, 1, 6	2, 1, 6	2, 1, 6	4, 0, 5	3, 1, 5	2, 0, 7
iTransformer	1, 0, 8	0, 0, 9	0, 1, 8	0, 1, 8	0, 0, 9	1, 1, 7
TimesNet	0, 1, 8	1, 0, 8	1, 0, 8	1, 0, 8	1, 1, 7	0, 1, 8
N-HiTS	3, 4, 2	3, 0, 6	1, 2, 6	1, 0, 8	2, 1, 6	1, 2, 6
NSTransformer	0, 1, 8	0, 0, 9	0, 0, 9	1, 0, 8	0, 0, 9	0, 0, 9
Naive Seasonal	0, 0, 9	0, 0, 9	0, 0, 9	0, 0, 9	0, 0, 9	0, 0, 9

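Sig-MMD and CSig-MMD evaluate the joint distribution across the forecast horizon without assuming independence.
CSig-MMD remains strictly proper under tail censoring, ensuring a fair evaluation of tail events.
Sig-MMD detects temporal and inter-variate dependencies that CRPS, ES, and QL fail to capture.
CSig-MMD focuses on tail performance and distinguishes forecasters by their tail-prediction ability in both synthetic and real-world experiments.
Experiments across datasets (ETT, Weather, Exchange, Illness, EWELD, ERA5) and foundation models show that CSig-MMD and Sig-MMD produce rankings that differ from those of standard metrics, highlighting tail-sensitive insights.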

Figure 2 : The censoring process redistributes the probability mass from the body of the distribution (grey) to the Signature of the zero-path, while preserving the probability mass inside the target region (blue) which represents the tails of the distribution.
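The censoring step described in this caption can be illustrated as follows. This is a rough sketch under stated assumptions rather than the paper's exact scheme: the Mahalanobis distance is computed on a flat vector per sample against a reference mean and covariance supplied by the caller, the logistic threshold and temperature are free parameters chosen for illustration, and the feature vectors are assumed to be precomputed (for signature features, the pivot, i.e. the signature of the zero path, would be the vector (1, 0, ..., 0)). All function names are hypothetical.

```python
import numpy as np

def soft_tail_weights(samples, mean, cov, threshold, temperature=0.5):
    """Logistic weight close to 1 for samples far from `mean` in Mahalanobis distance."""
    diff = samples - mean
    solved = np.linalg.solve(cov, diff.T).T               # cov^{-1} (x - mean) for each sample
    dist = np.sqrt(np.einsum("ij,ij->i", diff, solved))   # Mahalanobis distance per sample
    return 1.0 / (1.0 + np.exp(-(dist - threshold) / temperature))

def censored_mean_embedding(features, weights, pivot):
    """Mean feature vector after moving the body mass (1 - w) of each sample onto `pivot`."""
    mixed = weights[:, None] * features + (1.0 - weights)[:, None] * pivot
    return mixed.mean(axis=0)

def censored_mmd2(feat_f, w_f, feat_o, w_o, pivot):
    """Squared distance between the censored mean embeddings of forecasts and observations."""
    gap = (censored_mean_embedding(feat_f, w_f, pivot)
           - censored_mean_embedding(feat_o, w_o, pivot))
    return float(gap @ gap)
```

With weights near 0, a sample contributes only the pivot, so differences between forecasts inside the body of the distribution barely affect the score. With weights near 1, the sample's own features are kept, which is what concentrates the comparison on the tails.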

This review was generated by AI and reviewed by human editors.