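[Paper Explained] SparseTSF: Modeling Long-term Time Series Forecasting with 1k Parameters
SparseTSF is an ultra-lightweight long-term time series forecasting model (<1k parameters) that decouples periodicity from trend through Cross-Period Sparse Forecasting to reach competitive accuracy.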
This paper introduces SparseTSF, a novel, extremely lightweight model for Long-term Time Series Forecasting (LTSF), designed to address the challenges of modeling complex temporal dependencies over extended horizons with minimal computational resources. At the heart of SparseTSF lies the Cross-Period Sparse Forecasting technique, which simplifies the forecasting task by decoupling the periodicity and trend in time series data. This technique involves downsampling the original sequences to focus on cross-period trend prediction, effectively extracting periodic features while minimizing the model's complexity and parameter count. Based on this technique, the SparseTSF model uses fewer than *1k* parameters to achieve competitive or superior performance compared to state-of-the-art models. Furthermore, SparseTSF showcases remarkable generalization capabilities, making it well-suited for scenarios with limited computational resources, small samples, or low-quality data. The code is publicly available at this repository: https://github.com/lss-1138/SparseTSF.
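Motivation and Objectives
- Tackle the challenge of accurate long-term forecasting with minimal computational resources.
- Exploit the inherent periodicity of the data to decouple periodicity from trend.
- Develop a lightweight model that keeps competitive or even superior performance with very few parameters.
- Demonstrate generalization ability and efficiency advantages in low-resource scenarios.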
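Proposed Method
- Introduces Cross-Period Sparse Forecasting: the time series is downsampled by the period w into w subsequences, and a single linear predictor with shared parameters is applied to each subsequence.
- Applies sliding aggregation (a 1D convolution) before the sparse forecasting step to mitigate information loss and sensitivity to outliers.
- Normalizes the input by subtracting its mean and adds the mean back to the output to mitigate distribution shift.
- Trains with a plain mean squared error (MSE) loss.
- Provides a theoretical analysis of the parameter efficiency and effectiveness of the Sparse technique.
- Evaluates against state-of-the-art LTSF models on standard datasets using the Channel Independent (CI) strategy.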
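The steps listed above can be sketched for a single channel in NumPy. This is a minimal illustration, not the authors' implementation: the function name `sparse_forecast`, the residual connection around the convolution, and the "same" padding are assumptions made here for the sake of a runnable example, and the weights are fixed by hand rather than learned.

```python
import numpy as np

def sparse_forecast(x, weight, w, kernel=None):
    """Sketch of cross-period sparse forecasting for one channel.

    x      : (L,) look-back window, with L divisible by the period w
    weight : (L // w, H // w) linear map shared by all w subsequences
    w      : assumed known period of the series
    kernel : optional 1D aggregation kernel (hypothetical; learned in practice)
    """
    n_in, n_out = weight.shape
    assert x.shape[0] == n_in * w, "look-back length must equal n_in * w"

    # 1) Normalize: subtract the window mean (added back at the end).
    mean = x.mean()
    z = x - mean

    # 2) Sliding aggregation: 1D convolution over the full sequence.
    #    The residual add is an assumption of this sketch. np.convolve flips
    #    the kernel, which is irrelevant for a symmetric or learned kernel.
    if kernel is not None:
        z = z + np.convolve(z, kernel, mode="same")

    # 3) Downsample: row i of `subs` holds every w-th point starting at phase i.
    subs = z.reshape(n_in, w).T            # shape (w, n_in)

    # 4) Shared linear predictor applied to each subsequence.
    preds = subs @ weight                  # shape (w, n_out)

    # 5) Upsample: interleave the w predicted subsequences back into one series.
    y = preds.T.reshape(-1)                # shape (n_out * w,)
    return y + mean

# Toy usage: a perfectly periodic series with period w = 4.
w, n_in, n_out = 4, 3, 2
pattern = np.array([1.0, 3.0, 2.0, 6.0])
x = np.tile(pattern, n_in)
weight = np.full((n_in, n_out), 1.0 / n_in)   # averages each subsequence
forecast = sparse_forecast(x, weight, w)      # shape (8,)
```

With an averaging weight matrix, each phase of the period is forecast from its own history, so a purely periodic input is continued unchanged. This shows why only a (L/w) x (H/w) map is needed instead of a dense L x H one.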
Experimental Results
Research Questions
- RQ1: Can Cross-Period Sparse Forecasting decouple periodicity from trend to enable accurate long-horizon forecasts with extremely few parameters?
- RQ2: How does SparseTSF perform relative to state-of-the-art LTSF models on mainstream benchmarks while using sub-1k parameters?
- RQ3: What are the efficiency gains (parameters, MACs, memory, runtime) and generalization capabilities of SparseTSF?
- RQ4: How sensitive is performance to the chosen period w, and how well does SparseTSF generalize across domains with the same periodicity?
Key Findings
| Model | Parameters | MACs | Max Memory (MB) | Training Time (s) |
|---|---|---|---|---|
| Informer (2021) | 12.53 M | 3.97 G | 969.7 | 70.1 |
| Autoformer (2021) | 12.22 M | 4.41 G | 2631.2 | 107.7 |
| FEDformer (2022b) | 17.98 M | 4.41 G | 1102.5 | 238.7 |
| FiLM (2022a) | 12.22 M | 4.41 G | 1773.9 | 78.3 |
| PatchTST (2023) | 6.31 M | 11.21 G | 10882.3 | 290.3 |
| DLinear (2023) | 485.3 K | 156.0 M | 123.8 | 25.4 |
| FITS (2024) | 10.5 K | 79.9 M | 496.7 | 35.0 |
| SparseTSF (Ours) | 0.92 K | 12.71 M | 125.2 | 31.3 |
- SparseTSF achieves competitive or superior MSE performance compared to strong baselines on multiple LTSF datasets with under 1k parameters.
- The Sparse technique enables order-of-magnitude parameter reductions (vs. mainstream models) while maintaining robustness (low standard deviation across runs).
- Efficiency metrics show SparseTSF uses ~0.92k parameters and ~12.7M MACs, with markedly lower memory and training time than many baselines.
- Ablation studies confirm the Sparse technique substantially improves Linear, Transformer, and GRU baselines, indicating broad applicability of the approach.
- Cross-domain generalization experiments show SparseTSF outperforms several baselines when transferring between datasets with the same daily periodicity.
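To see where a sub-1k parameter count can come from, the back-of-the-envelope sketch below counts the weights of a shared (L/w) x (H/w) linear map plus one aggregation kernel. The setting L = 720, H = 720, w = 24, the kernel size 2 * (w // 2) + 1, and the absence of bias terms are assumptions of this sketch and are not stated in this summary; the helper names are hypothetical.

```python
def sparse_tsf_params(L, H, w):
    """Parameter count under the assumptions stated above."""
    linear = (L // w) * (H // w)   # shared cross-period linear map, no bias (assumed)
    conv = 2 * (w // 2) + 1        # one 1D aggregation kernel, no bias (assumed)
    return linear + conv

def dense_linear_params(L, H):
    """A single dense L -> H linear layer with bias, for comparison."""
    return L * H + H

sparse = sparse_tsf_params(720, 720, 24)   # 30 * 30 + 25
dense = dense_linear_params(720, 720)      # 720 * 720 + 720
ratio = dense / sparse
```

Under these assumptions the sparse count is 925, in line with the roughly 0.92k reported in the table, while a dense map of the same input and output length needs several hundred thousand weights.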
This explainer was generated by AI and reviewed by human editors.