QUICK REVIEW

[Paper Review] Data augmentation using synthetic data for time series classification with deep residual networks

Hassan Ismail Fawaz, Germain Forestier|arXiv (Cornell University)|Aug 7, 2018

Time Series Analysis and Forecasting | Cited by 80

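One-Sentence Summary

This paper investigates DTW-based synthetic time series data augmentation to improve deep residual networks (ResNets) for time series classification, reporting large accuracy gains on some small datasets and further gains when the augmented and non-augmented models are combined in an ensemble.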

ABSTRACT

Data augmentation in deep neural networks is the process of generating artificial data in order to reduce the variance of the classifier with the goal to reduce the number of errors. This idea has been shown to improve deep neural network's generalization capabilities in many computer vision tasks such as image recognition and object localization. Apart from these applications, deep Convolutional Neural Networks (CNNs) have also recently gained popularity in the Time Series Classification (TSC) community. However, unlike in image recognition problems, data augmentation techniques have not yet been investigated thoroughly for the TSC task. This is surprising as the accuracy of deep learning models for TSC could potentially be improved, especially for small datasets that exhibit overfitting, when a data augmentation method is adopted. In this paper, we fill this gap by investigating the application of a recently proposed data augmentation technique based on the Dynamic Time Warping distance, for a deep learning model for TSC. To evaluate the potential of augmenting the training set, we performed extensive experiments using the UCR TSC benchmark. Our preliminary experiments reveal that data augmentation can drastically increase deep CNN's accuracy on some datasets and significantly improve the deep model's accuracy when the method is used in an ensemble approach.

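Motivation and Objectives

Promote data augmentation as a way to reduce overfitting of deep time series classifiers on small datasets.
Apply a recently proposed DTW-based augmentation method, weighted DBA, to generate synthetic time series.
Evaluate data augmentation with a deep ResNet on the UCR TSC benchmark.
Explore ensembling as a way to make the gains consistent across datasets.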

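Method

Use a deep residual network (ResNet) with three residual blocks for univariate time series classification.
Apply the DTW-based weighted DBA method to generate synthetic time series from the training set.
Set the number of generated synthetic series to twice the number of series in the most represented class.
Train the ResNet with and without augmentation, using the same initialization and hyperparameters.
Measure the effect on accuracy across the UCR datasets and compare against the non-augmented baseline.
Combine the two ResNets into an ensemble by averaging their posterior probabilities, to improve robustness.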
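Two building blocks of the method above can be sketched in a few lines of plain Python: the DTW distance that underlies the augmentation, and the probability-averaging ensemble. This is a minimal illustration under stated assumptions, not the authors' implementation. The function names `dtw_distance` and `average_posteriors` are hypothetical, the squared pointwise cost and the absence of a warping window are assumptions, and the weighted DBA procedure itself is not reproduced here.

```python
from math import inf


def dtw_distance(a, b):
    """Classic dynamic-programming DTW (assumed: squared cost, no window)."""
    n, m = len(a), len(b)
    # cost[i][j] = minimal accumulated cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = (a[i - 1] - b[j - 1]) ** 2
            cost[i][j] = d + min(
                cost[i - 1][j],      # a[i-1] also matches an earlier point of b
                cost[i][j - 1],      # b[j-1] also matches an earlier point of a
                cost[i - 1][j - 1],  # both series advance together
            )
    return cost[n][m]


def average_posteriors(probs_a, probs_b):
    """Average two models' per-class probabilities, then pick the argmax class."""
    averaged = [
        [(p + q) / 2.0 for p, q in zip(row_a, row_b)]
        for row_a, row_b in zip(probs_a, probs_b)
    ]
    labels = [max(range(len(row)), key=row.__getitem__) for row in averaged]
    return averaged, labels


if __name__ == "__main__":
    # Toy, made-up inputs: the second series is the first with one repeated point.
    print(dtw_distance([0, 1, 2, 1, 0], [0, 0, 1, 2, 1, 0]))
    # Made-up posteriors for two samples and two classes.
    plain = [[0.9, 0.1], [0.2, 0.8]]      # model trained without augmentation
    augmented = [[0.3, 0.7], [0.1, 0.9]]  # model trained with augmentation
    print(average_posteriors(plain, augmented)[1])
```

Averaging probabilities rather than hard votes lets a confident model outweigh an uncertain one on a given sample, which is one plausible reason such an ensemble dampens the cases where augmentation hurts.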

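Experimental Results

Research Questions

RQ1: Does DTW-based synthetic data augmentation improve ResNet performance on time series classification?
RQ2: How does augmentation affect accuracy on small UCR datasets compared with larger ones?
RQ3: Does an ensemble of the augmented and non-augmented models yield more robust improvements?
RQ4: Does DBA-based augmentation produce synthetic series that are faithful to the original data distribution?
RQ5: What are the practical benefits and limitations of the proposed approach on the TSC benchmark?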

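Key Findings

Data augmentation can substantially improve the deep model's accuracy on some datasets (for example, DiatomSizeReduction rises from 30% to 96%).
Overall, augmentation does not significantly reduce accuracy, and it can substantially improve accuracy on some datasets.
An ensemble of the ResNets trained with and without augmentation reduces the negative effects and preserves the gains across datasets.
A Wilcoxon signed-rank test shows a significant improvement for the ensemble approach (p < 0.0005).
The DiatomSizeReduction dataset has very few training samples (16 instances) and benefits markedly from the synthetic data.
On some datasets, such as Wine (57 training samples), augmentation yields a marked improvement, but across the full UCR archive augmentation alone does not produce a clear overall win.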
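To make the significance claim above concrete, here is a hedged sketch of a two-sided Wilcoxon signed-rank test on paired per-dataset accuracies. It uses the normal approximation without tie or continuity corrections, so it is only indicative for small samples. The function name `wilcoxon_signed_rank` is hypothetical, and the accuracy values are made-up numbers for illustration, not results from the paper. In practice a library routine such as `scipy.stats.wilcoxon` would be the usual choice.

```python
from math import erfc, sqrt


def wilcoxon_signed_rank(x, y):
    """Two-sided Wilcoxon signed-rank test (normal approximation, a sketch).

    Zero differences are dropped; tied absolute differences share their
    average rank. No tie or continuity correction is applied.
    """
    diffs = [a - b for a, b in zip(x, y) if a != b]
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")

    # Rank the absolute differences, averaging ranks within groups of ties.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2.0 + 1.0  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1

    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    statistic = min(w_plus, w_minus)

    # Under the null hypothesis the statistic is approximately normal.
    mean = n * (n + 1) / 4.0
    std = sqrt(n * (n + 1) * (2 * n + 1) / 24.0)
    z = (statistic - mean) / std
    p_value = erfc(abs(z) / sqrt(2.0))  # two-sided tail probability
    return statistic, p_value


if __name__ == "__main__":
    # Made-up accuracies (in percent) on six hypothetical datasets.
    ensemble = [85, 90, 78, 92, 88, 95]
    baseline = [80, 85, 79, 90, 84, 91]
    print(wilcoxon_signed_rank(ensemble, baseline))
```

The test compares the two classifiers dataset by dataset and uses only the ranks of the differences, so a single very large gain (such as the one on DiatomSizeReduction) cannot dominate the outcome.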


This review was generated by AI and reviewed by a human editor.