QUICK REVIEW

[Paper Review] The UEA multivariate time series classification archive, 2018

Anthony Bagnall, Hoang Anh Dau | arXiv (Cornell University) | October 31, 2018

Time Series Analysis and Forecasting | References: 11 | Citations: 96

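One-Line Summary

This paper introduces the first UEA multivariate time series classification (MTSC) archive (2018), comprising 30 datasets, and standardizes an equal-length format and train/test splits to enable rigorous MTSC evaluation.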

ABSTRACT

In 2002, the UCR time series classification archive was first released with sixteen datasets. It gradually expanded, until 2015 when it increased in size from 45 datasets to 85 datasets. In October 2018 more datasets were added, bringing the total to 128. The new archive contains a wide range of problems, including variable length series, but it still only contains univariate time series classification problems. One of the motivations for introducing the archive was to encourage researchers to perform a more rigorous evaluation of newly proposed time series classification (TSC) algorithms. It has worked: most recent research into TSC uses all 85 datasets to evaluate algorithmic advances. Research into multivariate time series classification, where more than one series are associated with each class label, is in a position where univariate TSC research was a decade ago. Algorithms are evaluated using very few datasets and claims of improvement are not based on statistical comparisons. We aim to address this problem by forming the first iteration of the MTSC archive, to be hosted at the website www.timeseriesclassification.com. Like the univariate archive, this formulation was a collaborative effort between researchers at the University of East Anglia (UEA) and the University of California, Riverside (UCR). The 2018 vintage consists of 30 datasets with a wide range of cases, dimensions and series lengths. For this first iteration of the archive we format all data to be of equal length, include no series with missing data and provide train/test splits.

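Motivation and Objectives

Provide a public, standardized benchmark for multivariate time series classification (MTSC).
Extend MTSC evaluation beyond small, domain-specific collections to encourage rigorous comparison.
Format the data so that all series are of equal length with no missing values, and provide a train/test split for every problem.
Host the archive and tools at timeseriesclassification.com so that researchers can reuse them.
Group the datasets by domain (HAR, Motion, ECG, EEG/MEG, Audio, and others) and document their provenance.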

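Proposed Method

Assemble the first version of the MTSC archive from 30 datasets spanning a range of domains.
Standardize all data to equal length, exclude series with missing data, and provide explicit train/test splits.
Provide the data in Weka multi-instance format, which uses a relational attribute for the multidimensional representation, alongside a per-dimension representation.
Provide downloadable code that splits multivariate ARFF files, for flexibility across experiments.
Bundle the whole archive (a zip of roughly 2 GB) and host it at timeseriesclassification.com for easy access.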
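
To make the multidimensional and per-dimension representations concrete, here is a minimal Python sketch. It is not the archive's own splitting code. It assumes that each data row stores one case as a quoted string whose dimensions are separated by a literal `\n` sequence, followed by a comma and the class label. The function names `parse_relational_row` and `split_by_dimension` are hypothetical and chosen only for illustration.

```python
from typing import List, Tuple

Case = List[List[float]]  # one case: dimensions x time points


def parse_relational_row(row: str) -> Tuple[Case, str]:
    """Parse one data row of the ASSUMED form  'v,v,v\\nv,v,v',label ."""
    # The class label is assumed to follow the last comma in the row.
    body, label = row.rsplit(",", 1)
    body = body.strip().strip("'\"")
    # Assumption: dimensions are separated by the two characters backslash + n.
    dims = body.split("\\n")
    series = [[float(v) for v in dim.split(",")] for dim in dims]
    return series, label.strip()


def split_by_dimension(cases: List[Case]) -> List[List[List[float]]]:
    """Regroup cases into one univariate problem (a list of series) per dimension."""
    n_dims = len(cases[0])
    return [[case[d] for case in cases] for d in range(n_dims)]


if __name__ == "__main__":
    # Two made-up cases, each with 2 dimensions of length 3.
    rows = [
        r"'1.0,2.0,3.0\n4.0,5.0,6.0',walking",
        r"'7.0,8.0,9.0\n10.0,11.0,12.0',running",
    ]
    parsed = [parse_relational_row(r) for r in rows]
    cases = [series for series, _ in parsed]
    labels = [label for _, label in parsed]
    per_dim = split_by_dimension(cases)
    print(labels)
    print(per_dim[0])  # all cases, dimension 0 only
```

Splitting per dimension in this way lets a univariate classifier be applied to each dimension independently, which is one reason a per-dimension representation is convenient to have alongside the multidimensional one.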

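Experimental Results

Research Questions

RQ1: How many MTSC datasets does the 2018 UEA archive contain, and which domains do they cover?
RQ2: Which data formatting and preprocessing steps are used to standardize MTSC problems for fair comparison?
RQ3: How are the train/test splits defined and provided for each dataset?
RQ4: What tools are provided for manipulating and reusing the MTSC datasets (for example, ARFF file splitting)?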

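Key Results

The 2018 release contains 30 multivariate time series classification datasets.
Every problem has been reformatted to equal length, contains no missing data, and includes a train/test split.
The archive is distributed as a single ZIP file of roughly 2 GB, with one directory per problem and data in Weka multi-instance format.
The data are organized by domain: Human Activity Recognition, Motion, ECG, EEG/MEG, Audio Spectra, and Other.
Code for splitting multivariate ARFF files is provided to ease reuse across studies.
The archive is hosted at www.timeseriesclassification.com for public access.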
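
The equal-length and no-missing-data properties can be checked mechanically. The sketch below is illustrative only. This review does not say how the archive brings series to equal length, so truncation to the shortest series is used here purely as an assumption. The names `truncate_to_shortest` and `is_archive_ready` are hypothetical.

```python
import math
from typing import List

Case = List[List[float]]  # one case: dimensions x time points


def truncate_to_shortest(cases: List[Case]) -> List[Case]:
    """Cut every dimension of every case to the shortest length found.

    Assumption: truncation is only one possible way to equalize lengths.
    """
    shortest = min(len(dim) for case in cases for dim in case)
    return [[dim[:shortest] for dim in case] for case in cases]


def is_archive_ready(cases: List[Case]) -> bool:
    """Return True if all series share one length and no value is NaN."""
    lengths = {len(dim) for case in cases for dim in case}
    has_missing = any(
        math.isnan(v) for case in cases for dim in case for v in dim
    )
    return len(lengths) == 1 and not has_missing


if __name__ == "__main__":
    # Made-up data: the second case is shorter than the first.
    raw = [
        [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]],
        [[7.0, 8.0], [9.0, 10.0]],
    ]
    print(is_archive_ready(raw))
    print(is_archive_ready(truncate_to_shortest(raw)))
```

Running the same check on both the train and the test split of a problem confirms that the two can be fed to a classifier that expects fixed-length input.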


This review was generated by AI and reviewed by a human editor.