QUICK REVIEW

[Paper Review] TCBench: A Benchmark for Tropical Cyclone Track and Intensity Forecasting at the Global Scale

Milton S. Gomez, Marie C. McGraw|arXiv (Cornell University)|Jan 30, 2026

Tropical and Extratropical Cyclones Research | Cited by 0

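One-Sentence Summary

TCBench is a benchmark for global 1-5 day tropical cyclone track and intensity forecasts; it compares neural weather models and physics-based ensemble models against IBTrACS reference data using standardized evaluation metrics.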

ABSTRACT

TCBench is a benchmark for evaluating global, short to medium-range (1-5 days) forecasts of tropical cyclone (TC) track and intensity. To allow a fair and model-agnostic comparison, TCBench builds on the IBTrACS observational dataset and formulates TC forecasting as predicting the time evolution of an existing tropical system conditioned on its initial position and intensity. TCBench includes state-of-the-art dynamical (TIGGE) and neural weather models (AIFS, Pangu-Weather, FourCastNet v2, GenCast). If not readily available, baseline tracks are consistently derived from model outputs using the TempestExtremes library. For evaluation, TCBench provides deterministic and probabilistic storm-following metrics. On 2023 test cases, neural weather models skillfully forecast TC tracks, while skillful intensity forecasts require additional steps such as post-processing. Designed for accessibility, TCBench helps AI practitioners tackle domain-relevant TC challenges and equips tropical meteorologists with data-driven tools and workflows to improve prediction and TC process understanding. By lowering barriers to reproducible, process-aware evaluation of extreme events, TCBench aims to democratize data-driven TC forecasting.

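Motivation and Objectives

Address the need for accurate global tropical cyclone forecasts to support risk mitigation and resilience.
Define a fair, model-agnostic evaluation framework for track and intensity forecasts.
Provide an open, extensible dataset and toolbox for reproducible evaluation of neural and physics-based models.
Provide post-processing baselines that improve intensity and rapid intensification (RI) forecasts.
Lower the barrier to data-driven TC forecasting by making benchmarks and workflows widely accessible.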

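Proposed Method

Formulate TC forecasting as predicting the time evolution of an existing tropical system, given its initial state.
Integrate heterogeneous data sources (observations, reanalysis, physics-based and data-driven models) into a unified evaluation framework.
Provide data preprocessing, model baselines (physics-based and neural), and evaluation protocols for deterministic and probabilistic metrics.
Use TempestExtremes and HuracanPy to align neural model outputs with IBTrACS tracks for fair comparison.
Post-process AI forecasts to produce intensity forecasts and RI flags, enabling RI-specific evaluation.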
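
The RI-flagging step can be illustrated with a minimal sketch. It assumes the widely used operational definition of rapid intensification (an increase in maximum sustained wind of at least 30 kt within 24 hours) and 6-hourly intensity values; the summary does not state which threshold or time step TCBench uses, and the function name `rapid_intensification_flags` is hypothetical rather than part of the TCBench toolbox.

```python
from typing import List, Sequence


def rapid_intensification_flags(
    vmax_kt: Sequence[float],
    step_hours: int = 6,          # assumed spacing of the intensity series
    window_hours: int = 24,       # assumed RI window
    threshold_kt: float = 30.0,   # assumed RI threshold (common operational value)
) -> List[bool]:
    """Flag each time step whose Vmax rises by >= threshold_kt over the next window."""
    if window_hours % step_hours != 0:
        raise ValueError("window_hours must be a multiple of step_hours")
    lag = window_hours // step_hours
    flags = []
    for i in range(len(vmax_kt)):
        j = i + lag
        # Steps without a full window ahead of them cannot be evaluated; mark False.
        flags.append(j < len(vmax_kt) and vmax_kt[j] - vmax_kt[i] >= threshold_kt)
    return flags


# Illustrative (made-up) 6-hourly Vmax series in knots.
vmax = [35, 40, 50, 60, 70, 75, 75]
flags = rapid_intensification_flags(vmax)
print(flags)  # [True, True, False, False, False, False, False]
```

Applying the same function to both the observed and the forecast intensity series yields paired binary labels, which is one plausible basis for RI-specific scores such as hit rate or false alarm ratio.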

Figure 1: TCBench defines TC forecasting as predicting time-series of track and intensity knowing the system’s initial state. It integrates heterogeneous data sources (observations, reanalysis, physics/data-driven models) into a unified evaluation framework to standardize model assessment.

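Experimental Results

Research Questions

RQ1: Can neural weather models match physics-based ensembles in track forecast skill for global 1-5 day forecasts?
RQ2: To what extent do post-processed AI forecasts improve intensity forecasts and rapid intensification detection relative to baselines?
RQ3: How do deterministic and probabilistic track and intensity metrics compare at lead times of up to 5 days?
RQ4: How does combining observational data with post-processing affect the reliability of TC intensity forecasts?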
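
To make the deterministic track comparison behind RQ1 and RQ3 concrete, the sketch below computes the great-circle distance between forecast and observed storm centers at each lead time. This is a common way to define track error, but it is shown here as an assumption: the summary does not specify TCBench's exact formula, and the names `great_circle_km` and `track_errors_by_lead` as well as the example positions are hypothetical.

```python
import math
from typing import Dict, Tuple

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (assumed spherical Earth)

LatLon = Tuple[float, float]  # (latitude, longitude) in degrees


def great_circle_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Haversine distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    # Clamp to 1.0 to guard against floating-point overshoot before asin.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def track_errors_by_lead(
    forecast: Dict[int, LatLon], observed: Dict[int, LatLon]
) -> Dict[int, float]:
    """Track error (km) at each lead time (hours) present in both tracks."""
    return {
        lead: great_circle_km(*forecast[lead], *observed[lead])
        for lead in sorted(forecast)
        if lead in observed
    }


# Made-up positions keyed by lead time in hours.
forecast_track = {24: (15.0, -60.0), 48: (17.0, -63.0), 72: (19.5, -66.0)}
observed_track = {24: (15.2, -60.3), 48: (17.8, -63.5)}
errors = track_errors_by_lead(forecast_track, observed_track)
print(errors)
```

Lead times missing from the observed track (72 h in the example) are skipped, so only verifiable forecast positions contribute to the error.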

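Key Findings

Neural weather models forecast TC tracks skillfully out to 5 days; on deterministic metrics, their track skill approaches or even rivals that of some physics-based ensemble models.
Physics-based ensembles (GEFS) generally perform better on probabilistic track forecasts as measured by CRPS, indicating strengths complementary to those of neural models.
Post-processing AI forecasts substantially improves intensity forecasts (maximum wind speed Vmax and minimum pressure pmin), reaching performance comparable to GEFS at some lead times.
Post-processed AI models capture rapid intensification events to some extent, but success is limited across models and lead times, indicating that RI remains a challenging target.
Deterministic track skill varies by model and lead time, while probabilistic track forecasts benefit from ensemble methods; intensity forecasts improve when observational data are combined with post-processing tools.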

Figure 2: (a) 2023 tropical cyclones in the TCBench test year, from IBTrACS. The lines represent the position of each tropical cyclone over time, with the line color representing the storm’s intensity at that position. (b) IBTrACS estimate of tropical cyclone numbers from 2017-2022 (corresponding to


This review was generated by AI and reviewed by a human editor.