QUICK REVIEW

[Paper Review] Monash University, UEA, UCR Time Series Regression Archive

Chang Wei Tan, Christoph Bergmeir | arXiv (Cornell University) | Jun 19, 2020

Time Series Analysis and Forecasting | References: 17 | Cited by: 11

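One-Sentence Summary

This paper introduces the first benchmark archive for time series regression (TSR), comprising 19 datasets from different domains with varying numbers of dimensions, unequal-length series, and missing values. By providing standardized data and an initial model benchmark, it lays a foundation for general TSR research and fills a gap in time series research beyond classification and forecasting.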

ABSTRACT

Time series research has gathered lots of interests in the last decade, especially for Time Series Classification (TSC) and Time Series Forecasting (TSF). Research in TSC has greatly benefited from the University of California Riverside and University of East Anglia (UCR/UEA) Time Series Archives. On the other hand, the advancement in Time Series Forecasting relies on time series forecasting competitions such as the Makridakis competitions, NN3 and NN5 Neural Network competitions, and a few Kaggle competitions. Each year, thousands of papers proposing new algorithms for TSC and TSF have utilized these benchmarking archives. These algorithms are designed for these specific problems, but may not be useful for tasks such as predicting the heart rate of a person using photoplethysmogram (PPG) and accelerometer data. We refer to this problem as Time Series Regression (TSR), where we are interested in a more general methodology of predicting a single continuous value, from univariate or multivariate time series. This prediction can be from the same time series or not directly related to the predictor time series and does not necessarily need to be a future value or depend heavily on recent values. To the best of our knowledge, research into TSR has received much less attention in the time series research community and there are no models developed for general time series regression problems. Most models are developed for a specific problem. Therefore, we aim to motivate and support the research into TSR by introducing the first TSR benchmarking archive. This archive contains 19 datasets from different domains, with varying number of dimensions, unequal length dimensions, and missing values. In this paper, we introduce the datasets in this archive and did an initial benchmark on existing models.

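Research Motivation and Objectives

To address the lack of standardized benchmark resources for time series regression (TSR), a task distinct from time series classification (TSC) and time series forecasting (TSF).
To support the development of general TSR methodology by curating real-world datasets with varied characteristics, such as unequal-length series and missing values.
To stimulate broader research interest in TSR by demonstrating that dedicated benchmark infrastructure is both feasible and necessary.
To establish baseline performance by providing an initial benchmark of existing models on the new TSR archive.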

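Proposed Method

The authors collected 19 time series datasets from multiple domains, including healthcare, sensor data, and environmental monitoring, to ensure broad applicability.
The datasets cover univariate and multivariate time series and differ in series length, missing values, and non-uniform sampling.
Each dataset was preprocessed for compatibility with standard machine learning pipelines while preserving the characteristics of the original data.
The authors evaluated several existing models (such as feedforward, convolutional, and recurrent networks) on all datasets to establish baseline performance.
Evaluation uses standard regression metrics, including mean absolute error (MAE) and mean squared error (MSE), on standard train-test splits.
The benchmarking process includes cross-dataset analysis to identify trends in model performance and dataset-specific challenges.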
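
To make the evaluation step concrete, the following is a minimal sketch of computing MAE and MSE on a fixed train-test split. It is not the authors' benchmark code: the function names (`mae`, `mse`, `mean_baseline`) are hypothetical, the target values are synthetic stand-ins for something like heart rate, and the mean predictor is only a trivial reference point that any real model should beat.

```python
import numpy as np

def mae(y_true, y_pred):
    """Mean absolute error: average of |y_true - y_pred|."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(y_true - y_pred)))

def mse(y_true, y_pred):
    """Mean squared error: average of (y_true - y_pred)^2."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_true - y_pred) ** 2))

def mean_baseline(y_train, n_test):
    """Hypothetical trivial baseline: predict the training mean for every test case."""
    return np.full(n_test, float(np.mean(y_train)))

# Synthetic targets (assumed values, not taken from the archive).
y_train = np.array([60.0, 70.0, 80.0])
y_test = np.array([65.0, 75.0])

y_pred = mean_baseline(y_train, len(y_test))
print("MAE:", mae(y_test, y_pred))
print("MSE:", mse(y_test, y_pred))
```

Because MSE squares each error, it penalizes large misses more heavily than MAE, so the two metrics can rank models differently on datasets with outlying targets.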

Experimental Results

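Research Questions

RQ1: What key characteristics make real-world time series datasets suitable for benchmarking general time series regression (TSR)?
RQ2: How do existing machine learning models perform on diverse TSR datasets that differ in data quality and structure?
RQ3: To what extent do model performance trends observed in TSC and TSF generalize to the TSR setting?
RQ4: Which common data challenges, such as missing values and unequal-length series, most severely affect TSR model performance?
RQ5: What role does a standardized benchmark archive play in improving the reproducibility of TSR research and advancing the field?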

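Key Findings

The proposed TSR benchmark archive contains 19 datasets from diverse domains, including PPG and accelerometer data for heart rate prediction, demonstrating broad real-world relevance.
The datasets are markedly heterogeneous in length, dimensionality, and data quality, including missing values and non-uniform sampling.
Baseline models, including feedforward and recurrent neural networks, perform inconsistently across datasets, and no single model consistently outperforms the others.
Model performance degrades on datasets with a high proportion of missing values or highly variable series lengths.
The initial benchmark indicates that existing models designed for TSC or TSF cannot be applied directly to TSR without adaptation, underscoring the need for dedicated TSR methods.
The archive supports reproducible evaluation and lays the groundwork for future general-purpose TSR algorithms.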
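
Since missing values and unequal series lengths are named as the main obstacles, here is a rough sketch of one common way to turn such data into a fixed-size array that standard models accept. This is an assumption for illustration, not the preprocessing used in the paper: the helpers `impute_linear` and `pad_to_length` are hypothetical, missing values are assumed to be encoded as NaN, and zero padding at the end is just one of several possible choices.

```python
import numpy as np

def impute_linear(series):
    """Fill NaNs by linear interpolation; NaNs at the edges take the nearest observed value."""
    x = np.asarray(series, dtype=float)
    observed = ~np.isnan(x)
    if not observed.any():
        # Nothing to interpolate from; fall back to zeros (an arbitrary choice).
        return np.zeros_like(x)
    idx = np.arange(len(x))
    return np.interp(idx, idx[observed], x[observed])

def pad_to_length(series, length, fill=0.0):
    """Right-pad with `fill` (or truncate) so the series has exactly `length` values."""
    x = np.asarray(series, dtype=float)
    out = np.full(length, fill)
    out[:len(x)] = x[:length]
    return out

# Synthetic univariate series with gaps and different lengths (assumed data).
raw = [
    [1.0, np.nan, 3.0],
    [np.nan, 2.0],
    [4.0, 5.0, np.nan, 7.0],
]

max_len = max(len(s) for s in raw)
batch = np.stack([pad_to_length(impute_linear(s), max_len) for s in raw])
print(batch.shape)
print(batch)
```

Imputing before padding matters here: padding first would let the interpolation treat the padded zeros as real observations.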


This review was generated by AI and reviewed by human editors.