QUICK REVIEW

[Paper Review] The UEA multivariate time series classification archive, 2018

Anthony Bagnall, Hoang Anh Dau|arXiv (Cornell University)|Oct 31, 2018

Time Series Analysis and Forecasting | References: 11 | Cited by: 96

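One-sentence summary

This paper introduces the first iteration of the UEA multivariate time series classification (MTSC) archive (2018): 30 datasets in a standardized equal-length format, with train/test splits for rigorous MTSC evaluation.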

ABSTRACT

In 2002, the UCR time series classification archive was first released with sixteen datasets. It gradually expanded, until 2015 when it increased in size from 45 datasets to 85 datasets. In October 2018 more datasets were added, bringing the total to 128. The new archive contains a wide range of problems, including variable length series, but it still only contains univariate time series classification problems. One of the motivations for introducing the archive was to encourage researchers to perform a more rigorous evaluation of newly proposed time series classification (TSC) algorithms. It has worked: most recent research into TSC uses all 85 datasets to evaluate algorithmic advances. Research into multivariate time series classification, where more than one series are associated with each class label, is in a position where univariate TSC research was a decade ago. Algorithms are evaluated using very few datasets and claims of improvement are not based on statistical comparisons. We aim to address this problem by forming the first iteration of the MTSC archive, to be hosted at the website www.timeseriesclassification.com. Like the univariate archive, this formulation was a collaborative effort between researchers at the University of East Anglia (UEA) and the University of California, Riverside (UCR). The 2018 vintage consists of 30 datasets with a wide range of cases, dimensions and series lengths. For this first iteration of the archive we format all data to be of equal length, include no series with missing data and provide train/test splits.

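Motivation and objectives

Provide a public, standardized benchmark for multivariate time series classification (MTSC).
Broaden MTSC evaluation beyond small, domain-specific datasets to enable more rigorous comparisons.
Format the data as equal-length series with no missing values, and provide train/test splits for every problem.
Host the archive and its accompanying tools at timeseriesclassification.com so that researchers can reuse them.
Group the datasets by domain (HAR, Motion, ECG, EEG/MEG, Audio, etc.) and document data provenance.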

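Proposed approach

Assemble the first version of the MTSC archive, covering 30 datasets across diverse domains.
Standardize all data to equal length, exclude series with missing data, and provide explicit train/test splits.
Provide the data in Weka multi-instance format, with a per-dimension representation and relational attributes.
Provide downloadable code for flexibly splitting multivariate ARFF files across experiments.
Package the whole archive (a zip of about 2 GB) and host it at timeseriesclassification.com for easy access.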
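
The formatting constraints above (equal length, no missing values) can be checked mechanically. The following is a minimal sketch, not code from the archive: the function name `check_archive_format` and the nested-list layout `cases[i][d][t]` are assumptions made for illustration only.

```python
import math


def check_archive_format(cases):
    """Validate a dataset against the archive's stated constraints.

    `cases` is a hypothetical in-memory layout (not the archive's own):
    cases[i][d][t] is the value of dimension d at time step t for case i.
    Returns (n_cases, n_dimensions, series_length) or raises ValueError.
    """
    if not cases or not cases[0]:
        raise ValueError("dataset is empty")
    n_dims = len(cases[0])
    length = len(cases[0][0])
    for i, case in enumerate(cases):
        # Every case must have the same number of dimensions.
        if len(case) != n_dims:
            raise ValueError(f"case {i}: expected {n_dims} dimensions")
        for d, series in enumerate(case):
            # Every series must have the same length.
            if len(series) != length:
                raise ValueError(f"case {i}, dimension {d}: unequal length")
            # No missing values, represented here as NaN.
            if any(math.isnan(v) for v in series):
                raise ValueError(f"case {i}, dimension {d}: missing value")
    return len(cases), n_dims, length


# Toy data: 2 cases, 2 dimensions, 3 time steps each.
toy = [
    [[0.1, 0.2, 0.3], [1.0, 1.1, 1.2]],
    [[0.4, 0.5, 0.6], [1.3, 1.4, 1.5]],
]
shape = check_archive_format(toy)
print(shape)  # (2, 2, 3)
```

Under this layout every dataset has a single (cases, dimensions, length) shape, which is why equal-length formatting makes the data easy to load into array-based classifiers.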

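Experimental results

Research questions

RQ1: How many MTSC datasets does the 2018 UEA archive contain, and which domains do they cover?
RQ2: Which data formatting and preprocessing steps are used to standardize the MTSC problems for fair comparison?
RQ3: How is the train/test split for each dataset defined and provided?
RQ4: Which tools are provided for manipulating and reusing the MTSC datasets (for example, splitting ARFF files)?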

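Key findings

The 2018 release contains 30 multivariate time series classification datasets.
All problems are reformatted to equal length, contain no missing data, and include train/test splits.
The archive is available as a single zip file of about 2 GB, with one directory per problem, in Weka multi-instance format.
The data are organized into domains such as Human Activity Recognition, Motion, ECG, EEG/MEG, Audio Spectra, and Others.
Code for splitting multivariate ARFF files is provided to support reuse across studies.
The archive is hosted at www.timeseriesclassification.com for public access.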
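
To illustrate what splitting a multivariate problem by dimension involves, here is a rough sketch. It is not the code distributed with the archive. It assumes each data row stores one case as a quoted string whose dimensions are separated by a literal `\n` and whose values are separated by commas, followed by the class label. The function names `parse_case` and `split_by_dimension` and the sample rows are hypothetical.

```python
def parse_case(line):
    """Parse one data row of the ASSUMED form '<dim0>\\n<dim1>...',<label>.

    Returns (dims, label), where dims[d] is the list of floats for dimension d.
    """
    # The class label is taken to be everything after the last comma.
    body, _, label = line.rpartition(",")
    body = body.strip().strip("'\"")
    # Dimensions are separated by the two-character sequence backslash + n.
    dims = [[float(v) for v in row.split(",")] for row in body.split("\\n")]
    return dims, label.strip()


def split_by_dimension(parsed):
    """Turn parsed multivariate cases into one univariate problem per dimension."""
    n_dims = len(parsed[0][0])
    return [[(dims[d], label) for dims, label in parsed] for d in range(n_dims)]


# Hypothetical rows: two cases, two dimensions, three time steps each.
rows = [
    r"'1.0,2.0,3.0\n4.0,5.0,6.0',walking",
    r"'0.5,0.6,0.7\n9.0,8.0,7.0',running",
]
parsed = [parse_case(r) for r in rows]
univariate = split_by_dimension(parsed)
print(len(univariate))   # number of dimensions
print(univariate[1][0])  # first case of the second dimension, with its label
```

Splitting this way lets a univariate classifier be run on each dimension separately, which is one simple baseline for a multivariate problem.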

This review was generated by AI and checked by a human editor.