[Paper Review] Scaleformer: Iterative Multi-scale Refining Transformers for Time Series Forecasting
Scaleformer introduces an iterative multi-scale refinement framework with cross-scale normalization that can be plugged into existing transformer-based time series models, yielding substantial MSE/MAE improvements with minimal overhead.
The performance of time series forecasting has recently been greatly improved by the introduction of transformers. In this paper, we propose a general multi-scale framework that can be applied to the state-of-the-art transformer-based time series forecasting models (FEDformer, Autoformer, etc.). By iteratively refining a forecasted time series at multiple scales with shared weights, introducing architecture adaptations, and a specially-designed normalization scheme, we are able to achieve significant performance improvements, from 5.5% to 38.5% across datasets and transformer architectures, with minimal additional computational overhead. Via detailed ablation studies, we demonstrate the effectiveness of each of our contributions across the architecture and methodology. Furthermore, our experiments on various public datasets demonstrate that the proposed improvements outperform their corresponding baseline counterparts. Our code is publicly available in https://github.com/BorealisAI/scaleformer.
Motivation & Objective
- Motivate scale-aware processing for time series forecasting to capture inter-scale dependencies.
- Propose a general, architecture-agnostic multi-scale refinement framework that can be applied to transformer backbones (e.g., FEDformer, Autoformer).
- Introduce cross-scale normalization to mitigate distribution shifts across scales and windows during iterative refinement.
- Demonstrate empirical gains across multiple datasets and backbones via ablations and comparisons.
Proposed method
- Define a set of temporal scales by downsampling the series by a factor s and its powers, then refine the forecast iteratively from the coarsest scale up to the original resolution.
- Apply the same Transformer module at each scale: the encoder receives the downsampled look-back window, and the decoder is initialized with the upsampled forecast from the previous (coarser) scale.
- Introduce cross-scale normalization that centers encoder/decoder inputs using a moving-average statistic to reduce distribution shifts across scales.
- Embed inputs with value, temporal, and scale-aware fixed-position embeddings.
- Train with the adaptive robust loss f(x, alpha, c) of Barron (2019), whose shape (alpha) and scale (c) parameters are learned end-to-end, in place of standard MSE, which is sensitive to outliers.
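The coarse-to-fine loop described in the list above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: `toy_forecaster` is a hypothetical stand-in for the shared transformer, average pooling and linear interpolation are assumed for down- and upsampling, the coarsest decoder input is assumed to start at zero, and a single mean over encoder and decoder inputs stands in for the normalization statistic. All function names are illustrative.

```python
import numpy as np

def avg_pool(x, k):
    """Downsample a 1-D series by averaging non-overlapping windows of size k."""
    n = (len(x) // k) * k              # drop the oldest points if len(x) % k != 0
    return x[len(x) - n:].reshape(-1, k).mean(axis=1)

def upsample_linear(x, n_out):
    """Linearly interpolate a 1-D series to n_out points."""
    if len(x) == 1:
        return np.full(n_out, x[0])
    src = np.linspace(0.0, 1.0, len(x))
    dst = np.linspace(0.0, 1.0, n_out)
    return np.interp(dst, src, x)

def toy_forecaster(enc_in, dec_in):
    """Hypothetical stand-in for the shared transformer: moves the decoder
    initialization halfway toward the last encoder value."""
    return 0.5 * dec_in + 0.5 * enc_in[-1]

def multiscale_forecast(lookback, horizon, s=2, m=2, model=toy_forecaster):
    """Refine a forecast over scales s**m, ..., s, 1 using one shared `model`."""
    prev = None
    for i in range(m, -1, -1):         # coarsest scale first
        k = s ** i
        enc = avg_pool(lookback, k)
        h = horizon // k
        # Decoder input: zeros at the coarsest scale (assumption),
        # otherwise the upsampled forecast from the previous scale.
        dec = np.zeros(h) if prev is None else upsample_linear(prev, h)
        # Cross-scale normalization (assumed form): subtract one mean computed
        # over both inputs, run the model, then add the mean back.
        mu = np.concatenate([enc, dec]).mean()
        prev = model(enc - mu, dec - mu) + mu
    return prev

forecast = multiscale_forecast(np.full(16, 3.0), horizon=8, s=2, m=2)
print(forecast.shape)
```

Because the toy model is linear, the normalization step does not change its output here; it is included only to show where the centering sits in the loop. With a real transformer backbone, `model` would be the same network called at every scale.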
Experimental results
Research questions
- RQ1: Can iterative multi-scale refinement with shared weights improve forecast accuracy across different transformer backbones?
- RQ2: Does cross-scale normalization effectively stabilize training and prevent error propagation across scales?
- RQ3: How much performance gain do multi-scale refinements confer on baselines like FEDformer, Autoformer, Informer, Reformer, and Performer across diverse datasets?
Key findings
- Mean-squared-error reductions range from 5.5% to 38.5% when applying Scaleformer across backbone models.
- Average improvements over baselines are 5.6% (FEDformer), 13.5% (Autoformer), and 38.5% (Informer) in MSE, with corresponding MAE gains.
- Cross-scale normalization is essential; without it, multi-scale variants underperform in many cases, while normalization with a single scale also helps some models.
- Ablation studies show that combining multi-scale refinement with the adaptive loss yields the best performance across datasets.
- The framework maintains similar parameter counts to baselines, with modest computational overhead, and scales across multiple datasets (Electricity, Weather, Exchange-rate, Traffic, ILI).
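The adaptive loss named in the ablation finding above is the general robust loss of Barron (2019). The sketch below evaluates that loss for fixed values of the shape parameter `alpha` and scale `c`; it does not show the end-to-end learning of these parameters, and the function name `robust_loss` is illustrative rather than taken from the paper's code.

```python
import numpy as np

def robust_loss(x, alpha, c):
    """General robust loss of Barron (2019) for a residual x.

    alpha controls the shape (2 gives an L2-like loss, 1 a smoothed L1,
    0 a Cauchy/log loss); c > 0 sets the scale of the quadratic region.
    """
    z = (np.asarray(x, dtype=float) / c) ** 2
    if alpha == 2.0:                   # limit of the general form as alpha -> 2
        return 0.5 * z
    if alpha == 0.0:                   # limit of the general form as alpha -> 0
        return np.log1p(0.5 * z)
    b = abs(alpha - 2.0)
    return (b / alpha) * ((z / b + 1.0) ** (alpha / 2.0) - 1.0)

# Large residuals are penalized far less for alpha = 1 than for alpha = 2,
# which is what makes the loss less sensitive to outliers.
residuals = np.array([0.1, 1.0, 10.0])
print(robust_loss(residuals, alpha=2.0, c=1.0))
print(robust_loss(residuals, alpha=1.0, c=1.0))
```

In a training setup, `alpha` and `c` would typically be trainable parameters optimized jointly with the network weights rather than constants.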
This review was created by AI and reviewed by human editors.