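[Paper Summary] Scaling transformer neural networks for skillful and reliable medium-range weather forecasting
A simple transformer-based model with weather-specific embedding, randomized dynamics forecasting, and a pressure-weighted loss achieves competitive short to medium-range forecasts on WeatherBench 2 and outperforms current methods beyond 7 days, while using far less training data and compute.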
Weather forecasting is a fundamental problem for anticipating and mitigating the impacts of climate change. Recently, data-driven approaches for weather forecasting based on deep learning have shown great promise, achieving accuracies that are competitive with operational systems. However, those methods often employ complex, customized architectures without sufficient ablation analysis, making it difficult to understand what truly contributes to their success. Here we introduce Stormer, a simple transformer model that achieves state-of-the-art performance on weather forecasting with minimal changes to the standard transformer backbone. We identify the key components of Stormer through careful empirical analyses, including weather-specific embedding, randomized dynamics forecast, and pressure-weighted loss. At the core of Stormer is a randomized forecasting objective that trains the model to forecast the weather dynamics over varying time intervals. During inference, this allows us to produce multiple forecasts for a target lead time and combine them to obtain better forecast accuracy. On WeatherBench 2, Stormer performs competitively at short to medium-range forecasts and outperforms current methods beyond 7 days, while requiring orders-of-magnitude less training data and compute. Additionally, we demonstrate Stormer's favorable scaling properties, showing consistent improvements in forecast accuracy with increases in model size and training tokens. Code and checkpoints are available at https://github.com/tung-nd/stormer.
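Motivation and Goals
- Motivate a simpler, scalable data-driven approach to medium-range weather forecasting.
- Identify the key architectural and training components that drive performance.
- Show that a standard transformer, given the right training recipe, can match or surpass more complex models.
- Demonstrate favorable scaling with model size and data, and compare against state-of-the-art baselines.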
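Proposed Method
- Use a standard transformer backbone with a weather-specific embedding that tokenizes each variable and aggregates the variable tokens via cross-attention.
- Train with a randomized dynamics forecasting objective: for a randomly sampled interval δt, the model predicts the weather dynamics Δ_δt, i.e. the change in the atmospheric state over that interval.
- Weight the loss by atmospheric pressure so that near-surface variables are emphasized.
- Apply multi-step fine-tuning to improve long-horizon forecasts.
- At inference, combine rollouts over multiple intervals (using the best-m-in-n or homogeneous strategy) to produce the final forecast.
- Evaluate on WeatherBench 2 ERA5 data at lead times of 1–14 days, comparing against Pangu-Weather, GraphCast, and climatology.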
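The training target and loss described above can be illustrated with a minimal sketch. It is not the authors' implementation: the candidate intervals (6, 12, 24 hours), the choice of weights proportional to pressure level and normalized to mean 1, and all function names are assumptions made for illustration. For simplicity the state holds one value per pressure level, and latitude weighting is omitted.

```python
import random

# Assumed candidate intervals in hours; the summary only says "random intervals".
INTERVALS_HOURS = (6, 12, 24)


def sample_interval(rng):
    """Pick one forecast interval at random for a training example."""
    return rng.choice(INTERVALS_HOURS)


def dynamics_target(x_now, x_future):
    """Return the change in state over the interval, element by element."""
    return [future - now for now, future in zip(x_now, x_future)]


def pressure_weights(levels_hpa):
    """Weights proportional to pressure level, normalized to mean 1 (assumed form)."""
    total = sum(levels_hpa)
    n = len(levels_hpa)
    return [n * p / total for p in levels_hpa]


def pressure_weighted_mse(pred, target, levels_hpa):
    """Mean squared error where each level's error is scaled by its pressure weight."""
    weights = pressure_weights(levels_hpa)
    terms = [w * (p - t) ** 2 for w, p, t in zip(weights, pred, target)]
    return sum(terms) / len(terms)
```

With levels of 500 hPa and 1000 hPa, the same squared error at 1000 hPa contributes twice as much to the loss as at 500 hPa, which is the sense in which near-surface levels are emphasized.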
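Experimental Results
Research Questions
- RQ1: Can a simple transformer with specialized embedding and a tailored training recipe achieve competitive short to medium-range forecasts and superior long-range forecasts on WeatherBench 2?
- RQ2: Do randomized interval forecasting and the pressure-weighted loss significantly improve forecast accuracy across lead times?
- RQ3: How do model size, patch size, and the number of training tokens affect performance and scalability?
- RQ4: Is multi-step fine-tuning essential for reducing rollout error at longer lead times?
- RQ5: How does the proposed method compare with state-of-the-art deep learning baselines in data and compute efficiency?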
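Key Findings
- The model is competitive in accuracy for 1–7 day forecasts and outperforms the baselines beyond 7 days.
- Training requires orders of magnitude less data and compute than Pangu-Weather and GraphCast.
- Randomized dynamics forecasting improves accuracy by enabling multi-interval rollout inference from a single trained model, without extra training compute.
- Models trained with the pressure-weighted loss and the dynamics forecasting objective outperform models without these components.
- Performance improves with larger models and more training tokens, and smaller patch sizes also help.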
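The multi-interval rollout idea in the findings above can be sketched as follows. This is an illustrative toy of a homogeneous strategy only, under stated assumptions: the candidate intervals (6, 12, 24 hours), the use of a plain average to combine forecasts, and the names `homogeneous_schedules`, `rollout`, `combined_forecast`, and `predict_delta` are hypothetical and do not come from the released code. The state is a single float to keep the example self-contained.

```python
def homogeneous_schedules(lead_hours, intervals=(6, 12, 24)):
    """List every way to reach the lead time by repeating one interval."""
    return [
        [dt] * (lead_hours // dt)
        for dt in intervals
        if lead_hours % dt == 0
    ]


def rollout(x0, schedule, predict_delta):
    """Step the state forward, adding the predicted change for each interval."""
    x = x0
    for dt in schedule:
        x = x + predict_delta(x, dt)
    return x


def combined_forecast(x0, lead_hours, predict_delta, intervals=(6, 12, 24)):
    """Average the forecasts from all homogeneous schedules for one lead time."""
    schedules = homogeneous_schedules(lead_hours, intervals)
    if not schedules:
        raise ValueError("no interval divides the requested lead time")
    forecasts = [rollout(x0, s, predict_delta) for s in schedules]
    return sum(forecasts) / len(forecasts)
```

For a 24-hour lead time this yields three rollouts (four 6-hour steps, two 12-hour steps, one 24-hour step), all produced by the same model conditioned on a different interval. In practice `predict_delta` would be the trained network rather than a toy function.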
This summary was generated by AI and reviewed by a human editor.