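[Paper Summary] Multi-resolution Time-Series Transformer for Long-term Forecasting
MTST introduces a multi-branch, multi-resolution, patch-based transformer with relative positional encoding to model the diverse patterns in long-term multivariate time series, and reports state-of-the-art results on benchmark datasets.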
The performance of transformers for time-series forecasting has improved significantly. Recent architectures learn complex temporal patterns by segmenting a time-series into patches and using the patches as tokens. The patch size controls the ability of transformers to learn the temporal patterns at different frequencies: shorter patches are effective for learning localized, high-frequency patterns, whereas mining long-term seasonalities and trends requires longer patches. Inspired by this observation, we propose a novel framework, Multi-resolution Time-Series Transformer (MTST), which consists of a multi-branch architecture for simultaneous modeling of diverse temporal patterns at different resolutions. In contrast to many existing time-series transformers, we employ relative positional encoding, which is better suited for extracting periodic components at different scales. Extensive experiments on several real-world datasets demonstrate the effectiveness of MTST in comparison to state-of-the-art forecasting techniques.
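Motivation and Goals
- Demonstrate the need to model multi-scale temporal patterns in long-term forecasting.
- Propose a multi-branch MTST that uses different patch sizes to capture diverse frequencies.
- Adopt relative positional encoding to better capture periodic components.
- Show improved forecasting performance on several real-world datasets, with ablation studies that justify the design choices.
Proposed Method
- Build MTST with N layers, where layer n contains B_n branches, each tokenizing the input with a different patch size.
- Within each branch, process the patch-level tokens with self-attention that uses relative positional encoding.
- Fuse the branch representations within each MTST layer to form the shared embedding passed to the next layer.
- Process each time-series channel independently (channel independence); the design can be extended to model cross-channel correlations.
- Train to minimize MSE using Adam; apply instance normalization to the inputs and de-normalize the outputs.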
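The tokenization and normalization steps in the list above can be sketched for a single channel as follows. This is a minimal illustration, not the authors' implementation: the function names, the `eps` constant, and the choice of non-overlapping patches (stride equal to patch length) are all assumptions made for the example.

```python
from statistics import fmean, pstdev


def instance_normalize(window, eps=1e-5):
    """Scale one channel's look-back window to zero mean and (near) unit variance."""
    mean = fmean(window)
    std = pstdev(window) + eps  # eps (assumed value) guards against constant windows
    return [(x - mean) / std for x in window], mean, std


def denormalize(values, mean, std):
    """Undo instance_normalize so forecasts return to the original scale."""
    return [v * std + mean for v in values]


def patchify(series, patch_len, stride):
    """Cut a series into patches of patch_len; a trailing remainder is dropped."""
    last_start = len(series) - patch_len
    return [series[s:s + patch_len] for s in range(0, last_start + 1, stride)]


def multi_resolution_tokens(window, patch_sizes):
    """Return {patch_len: patches} for one channel, plus the normalization stats."""
    normed, mean, std = instance_normalize(window)
    # Assumption: non-overlapping patches (stride == patch_len) for simplicity.
    tokens = {p: patchify(normed, p, stride=p) for p in patch_sizes}
    return tokens, mean, std


window = [float(t) for t in range(16)]
tokens, mean, std = multi_resolution_tokens(window, patch_sizes=[2, 4, 8])
for patch_len, patches in tokens.items():
    print(f"patch_len={patch_len}: {len(patches)} tokens")
```

Shorter patches produce more, finer-grained tokens from the same window, while longer patches produce fewer tokens that each span a wider stretch of time. In a full model each branch would embed its tokens and pass them through attention before the branches are fused.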
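Experimental Results
Research Questions
- RQ1: Does a multi-resolution, multi-branch transformer improve long-term forecasting over single-resolution patch-based models?
- RQ2: What is the effect of using relative rather than absolute positional encoding in MTST?
- RQ3: How do ablations that include or exclude the high-resolution or low-resolution branches affect performance?
- RQ4: How does MTST compare with state-of-the-art baselines across diverse real-world datasets and forecasting horizons?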
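To make the distinction behind RQ2 concrete, here is a generic sketch of relative positional encoding in self-attention. The summary does not spell out MTST's exact formulation, so this uses a common form as an assumption: a scalar bias, indexed by the offset i - j between query and key positions, is added to each attention score. The function names and the `rel_bias` layout are hypothetical.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of scores."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention_with_relative_bias(queries, keys, values, rel_bias):
    """Single-head self-attention where score(i, j) gets a bias indexed by i - j.

    rel_bias holds 2 * n - 1 entries covering offsets -(n - 1) .. (n - 1).
    In a trained model these entries would be learned parameters.
    """
    n, d = len(queries), len(queries[0])
    assert len(rel_bias) == 2 * n - 1
    outputs, weights = [], []
    for i in range(n):
        scores = [
            sum(q * k for q, k in zip(queries[i], keys[j])) / math.sqrt(d)
            + rel_bias[i - j + n - 1]  # depends only on the offset, not on i or j alone
            for j in range(n)
        ]
        row = softmax(scores)
        weights.append(row)
        outputs.append(
            [sum(row[j] * values[j][c] for j in range(n)) for c in range(len(values[0]))]
        )
    return outputs, weights


n = 4
tokens = [[0.0, 0.0] for _ in range(n)]          # zero content isolates the positional term
values = [[float(i)] for i in range(n)]
bias = [-abs(offset) for offset in range(-(n - 1), n)]  # toy bias favoring nearby patches
outputs, weights = attention_with_relative_bias(tokens, tokens, values, bias)
```

Because the bias depends only on the offset, the same pattern (for example, "attend to the patch one period back") applies at every position in the sequence, whereas an absolute encoding ties what is learned to specific token indices.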
Key Findings
| Dataset | Horizon T | MTST_MSE | MTST_MAE | PatchTST_MSE | PatchTST_MAE | DLinear_MSE | DLinear_MAE | MICN_MSE | MICN_MAE | TimesNet_MSE | TimesNet_MAE | Fedformer_MSE | Fedformer_MAE | Autoformer_MSE | Autoformer_MAE | Pyraformer_MSE | Pyraformer_MAE |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Traffic | 96 | 0.356 | 0.244 | 0.367 | 0.251 | 0.410 | 0.282 | 0.473 | 0.293 | 0.595 | 0.318 | 0.576 | 0.359 | 0.597 | 0.371 | 2.085 | 0.468 |
| Traffic | 192 | 0.375 | 0.251 | 0.385 | 0.259 | 0.423 | 0.287 | 0.483 | 0.298 | 0.615 | 0.326 | 0.610 | 0.380 | 0.607 | 0.382 | 0.867 | 0.467 |
| Traffic | 336 | 0.386 | 0.256 | 0.398 | 0.265 | 0.436 | 0.296 | 0.491 | 0.303 | 0.616 | 0.326 | 0.608 | 0.375 | 0.623 | 0.387 | 0.869 | 0.469 |
| Traffic | 720 | 0.425 | 0.279 | 0.434 | 0.287 | 0.466 | 0.315 | 0.559 | 0.327 | 0.655 | 0.353 | 0.621 | 0.375 | 0.639 | 0.395 | 0.881 | 0.473 |
| Electricity | 96 | 0.127 | 0.222 | 0.130 | 0.222 | 0.140 | 0.237 | 0.157 | 0.266 | 0.178 | 0.284 | 0.186 | 0.302 | 0.196 | 0.313 | 0.386 | 0.449 |
| Electricity | 192 | 0.144 | 0.238 | 0.148 | 0.240 | 0.153 | 0.249 | 0.175 | 0.287 | 0.187 | 0.289 | 0.197 | 0.311 | 0.211 | 0.324 | 0.386 | 0.443 |
| Electricity | 336 | 0.162 | 0.256 | 0.167 | 0.261 | 0.169 | 0.267 | 0.200 | 0.308 | 0.208 | 0.307 | 0.213 | 0.328 | 0.214 | 0.327 | 0.378 | 0.443 |
| Electricity | 720 | 0.199 | 0.289 | 0.202 | 0.291 | 0.203 | 0.301 | 0.228 | 0.338 | 0.245 | 0.321 | 0.233 | 0.344 | 0.236 | 0.342 | 0.376 | 0.445 |
| Weather | 96 | 0.150 | 0.199 | 0.152 | 0.199 | 0.176 | 0.237 | 0.178 | 0.249 | 0.163 | 0.219 | 0.238 | 0.314 | 0.249 | 0.329 | 0.896 | 0.556 |
| Weather | 192 | 0.194 | 0.240 | 0.197 | 0.243 | 0.211 | 0.269 | 0.243 | 0.269 | 0.211 | 0.259 | 0.275 | 0.329 | 0.325 | 0.370 | 0.622 | 0.624 |
| Weather | 336 | 0.246 | 0.281 | 0.249 | 0.283 | 0.265 | 0.319 | 0.278 | 0.338 | 0.286 | 0.311 | 0.339 | 0.377 | 0.351 | 0.391 | 0.739 | 0.753 |
| Weather | 720 | 0.319 | 0.333 | 0.320 | 0.335 | 0.323 | 0.362 | 0.320 | 0.360 | 0.359 | 0.363 | 0.389 | 0.409 | 0.415 | 0.426 | 1.004 | 0.934 |
| ETTh1 | 96 | 0.358 | 0.390 | 0.375 | 0.399 | 0.375 | 0.399 | 0.413 | 0.442 | 0.421 | 0.440 | 0.376 | 0.415 | 0.435 | 0.446 | 0.664 | 0.612 |
| ETTh1 | 192 | 0.396 | 0.414 | 0.414 | 0.421 | 0.405 | 0.416 | 0.451 | 0.462 | 0.511 | 0.498 | 0.423 | 0.446 | 0.456 | 0.457 | 0.790 | 0.681 |
| ETTh1 | 336 | 0.391 | 0.420 | 0.431 | 0.436 | 0.439 | 0.443 | 0.556 | 0.528 | 0.484 | 0.478 | 0.444 | 0.462 | 0.486 | 0.487 | 0.891 | 0.738 |
| ETTh1 | 720 | 0.430 | 0.457 | 0.449 | 0.466 | 0.472 | 0.490 | 0.658 | 0.607 | 0.554 | 0.527 | 0.469 | 0.492 | 0.515 | 0.517 | 0.963 | 0.782 |
| ETTh2 | 96 | 0.257 | 0.326 | 0.274 | 0.336 | 0.289 | 0.353 | 0.303 | 0.364 | 0.366 | 0.417 | 0.332 | 0.374 | 0.332 | 0.368 | 0.645 | 0.597 |
| ETTh2 | 192 | 0.309 | 0.361 | 0.339 | 0.379 | 0.383 | 0.418 | 0.403 | 0.446 | 0.426 | 0.447 | 0.407 | 0.446 | 0.426 | 0.434 | 0.788 | 0.683 |
| ETTh2 | 336 | 0.302 | 0.366 | 0.331 | 0.380 | 0.448 | 0.465 | 0.603 | 0.550 | 0.406 | 0.435 | 0.400 | 0.447 | 0.477 | 0.479 | 0.907 | 0.747 |
| ETTh2 | 720 | 0.372 | 0.416 | 0.379 | 0.422 | 0.605 | 0.551 | 1.106 | 0.852 | 0.427 | 0.457 | 0.412 | 0.469 | 0.453 | 0.490 | 0.963 | 0.783 |
| ETTm1 | 96 | 0.286 | 0.338 | 0.290 | 0.342 | 0.299 | 0.343 | 0.308 | 0.360 | 0.356 | 0.385 | 0.326 | 0.390 | 0.510 | 0.492 | 0.543 | 0.510 |
| ETTm1 | 192 | 0.327 | 0.366 | 0.332 | 0.369 | 0.335 | 0.365 | 0.343 | 0.384 | 0.452 | 0.428 | 0.365 | 0.415 | 0.514 | 0.495 | 0.557 | 0.537 |
| ETTm1 | 336 | 0.362 | 0.389 | 0.366 | 0.392 | 0.369 | 0.386 | 0.395 | 0.411 | 0.419 | 0.425 | 0.392 | 0.425 | 0.510 | 0.492 | 0.754 | 0.655 |
| ETTm1 | 720 | 0.414 | 0.421 | 0.420 | 0.424 | 0.425 | 0.421 | 0.427 | 0.434 | 0.452 | 0.451 | 0.446 | 0.458 | 0.527 | 0.493 | 0.908 | 0.724 |
| ETTm2 | 96 | 0.162 | 0.251 | 0.165 | 0.255 | 0.167 | 0.260 | 0.169 | 0.268 | 0.188 | 0.276 | 0.180 | 0.271 | 0.205 | 0.293 | 0.435 | 0.507 |
| ETTm2 | 192 | 0.220 | 0.291 | 0.220 | 0.292 | 0.224 | 0.303 | 0.247 | 0.333 | 0.242 | 0.310 | 0.252 | 0.318 | 0.278 | 0.336 | 0.730 | 0.673 |
| ETTm2 | 336 | 0.272 | 0.326 | 0.278 | 0.329 | 0.281 | 0.342 | 0.290 | 0.351 | 0.300 | 0.346 | 0.324 | 0.364 | 0.343 | 0.379 | 1.201 | 0.845 |
| ETTm2 | 720 | 0.358 | 0.379 | 0.367 | 0.385 | 0.397 | 0.421 | 0.417 | 0.434 | 0.391 | 0.403 | 0.410 | 0.420 | 0.414 | 0.419 | 3.625 | 1.451 |
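- MTST achieves the best or tied-best result in 54 of the 56 settings in the table above (7 datasets × 4 horizons × 2 metrics); the two exceptions are MAE on ETTm1 at T = 192 and T = 336, where DLinear is slightly lower.
- MTST outperforms PatchTST in 27 of 28 MSE comparisons (the two tie on ETTm2 at T = 192), and the difference is reported as statistically significant.
- Ablations show that removing either the low-resolution or the high-resolution branches degrades performance, supporting the value of multi-scale modeling.
- Relative positional encoding consistently improves forecasting accuracy over absolute positional encoding.
- Look-back window analysis and qualitative visualizations support MTST's advantage in capturing multi-scale temporal structure.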
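The 27-of-28 figure can be checked directly against the MSE columns of the table. The snippet below is a small sanity check written for this summary, not something taken from the paper; the values are copied from the MTST_MSE and PatchTST_MSE columns, and the variable names are illustrative.

```python
# (dataset, horizon, MTST MSE, PatchTST MSE), copied from the results table
rows = [
    ("Traffic", 96, 0.356, 0.367), ("Traffic", 192, 0.375, 0.385),
    ("Traffic", 336, 0.386, 0.398), ("Traffic", 720, 0.425, 0.434),
    ("Electricity", 96, 0.127, 0.130), ("Electricity", 192, 0.144, 0.148),
    ("Electricity", 336, 0.162, 0.167), ("Electricity", 720, 0.199, 0.202),
    ("Weather", 96, 0.150, 0.152), ("Weather", 192, 0.194, 0.197),
    ("Weather", 336, 0.246, 0.249), ("Weather", 720, 0.319, 0.320),
    ("ETTh1", 96, 0.358, 0.375), ("ETTh1", 192, 0.396, 0.414),
    ("ETTh1", 336, 0.391, 0.431), ("ETTh1", 720, 0.430, 0.449),
    ("ETTh2", 96, 0.257, 0.274), ("ETTh2", 192, 0.309, 0.339),
    ("ETTh2", 336, 0.302, 0.331), ("ETTh2", 720, 0.372, 0.379),
    ("ETTm1", 96, 0.286, 0.290), ("ETTm1", 192, 0.327, 0.332),
    ("ETTm1", 336, 0.362, 0.366), ("ETTm1", 720, 0.414, 0.420),
    ("ETTm2", 96, 0.162, 0.165), ("ETTm2", 192, 0.220, 0.220),
    ("ETTm2", 336, 0.272, 0.278), ("ETTm2", 720, 0.358, 0.367),
]

# Count strict wins and list exact ties at the table's three-decimal precision.
wins = sum(mtst < patch for _, _, mtst, patch in rows)
ties = [(name, horizon) for name, horizon, mtst, patch in rows if mtst == patch]

# Setting with the largest relative MSE reduction over PatchTST.
largest_gain = max(rows, key=lambda r: (r[3] - r[2]) / r[3])

print(f"MTST wins {wins} of {len(rows)} MSE comparisons; ties: {ties}")
print(f"Largest relative reduction: {largest_gain[0]} at T={largest_gain[1]}")
```

Note that this only compares the rounded values shown in the table; it says nothing about the statistical significance reported alongside the count.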
This summary was generated by AI and reviewed by a human editor.