QUICK REVIEW

[Paper Review] Data augmentation using synthetic data for time series classification with deep residual networks

Hassan Ismail Fawaz, Germain Forestier|arXiv (Cornell University)|Aug 7, 2018

Time Series Analysis and Forecasting | References: 15 | Cited by: 73

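One-Sentence Summary

The paper applies a DTW-based data augmentation technique, weighted DTW Barycentric Averaging (DBA), to enlarge the training sets of a ResNet for time series classification. Accuracy rises sharply on some small datasets, and combining the augmented and non-augmented models in an ensemble makes the gains more robust.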

ABSTRACT

Data augmentation in deep neural networks is the process of generating artificial data in order to reduce the variance of the classifier with the goal to reduce the number of errors. This idea has been shown to improve deep neural network's generalization capabilities in many computer vision tasks such as image recognition and object localization. Apart from these applications, deep Convolutional Neural Networks (CNNs) have also recently gained popularity in the Time Series Classification (TSC) community. However, unlike in image recognition problems, data augmentation techniques have not yet been investigated thoroughly for the TSC task. This is surprising as the accuracy of deep learning models for TSC could potentially be improved, especially for small datasets that exhibit overfitting, when a data augmentation method is adopted. In this paper, we fill this gap by investigating the application of a recently proposed data augmentation technique based on the Dynamic Time Warping distance, for a deep learning model for TSC. To evaluate the potential of augmenting the training set, we performed extensive experiments using the UCR TSC benchmark. Our preliminary experiments reveal that data augmentation can drastically increase deep CNN's accuracy on some datasets and significantly improve the deep model's accuracy when the method is used in an ensemble approach.

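Motivation and Objectives

Address overfitting and limited training data when deep networks are used for time series classification (TSC).
Evaluate a synthetic data augmentation method based on Dynamic Time Warping (DTW) for TSC.
Measure the effect of augmentation on a deep ResNet architecture across datasets from the UCR TSC benchmark.
Explore an ensemble strategy to mitigate the cases where augmentation hurts accuracy.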

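Proposed Method

Use a deep ResNet architecture with three residual blocks, suited to univariate time series.
Apply weighted DTW Barycentric Averaging (DBA) to generate synthetic time series from the training set, choosing the series to average with the Average Selected scheme.
Generate a number of synthetic series equal to twice the size of the most populated class.
Train models with and without augmentation under the same initialization and optimization settings.
Combine the predictions of the augmented and non-augmented ResNets by averaging their posterior probabilities.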
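
The weighted averaging step in the method above can be sketched roughly as follows. This is a minimal pure-Python illustration, not the authors' implementation: the names `dtw_path` and `weighted_dba`, the squared-difference cost, the fixed iteration count, and the example weights are all assumptions made here. The Average Selected scheme that picks the series and their weights is not reproduced; the caller supplies the weights.

```python
import math


def dtw_path(a, b):
    """Return one optimal DTW alignment between a and b as (i, j) index pairs."""
    n, m = len(a), len(b)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = (a[i - 1] - b[j - 1]) ** 2  # squared difference (assumed cost)
            cost[i][j] = d + min(cost[i - 1][j - 1], cost[i - 1][j], cost[i][j - 1])
    # Backtrack from the end of both series to recover the alignment.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        steps = [
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        ]
        _, i, j = min(steps)
    path.reverse()
    return path


def weighted_dba(series, weights, init, n_iter=5):
    """Iteratively refine `init` into a weighted DTW barycenter of `series`."""
    avg = list(init)
    for _ in range(n_iter):
        num = [0.0] * len(avg)  # weighted sum of values aligned to each position
        den = [0.0] * len(avg)  # total weight aligned to each position
        for s, w in zip(series, weights):
            for i, j in dtw_path(avg, s):
                num[i] += w * s[j]
                den[i] += w
        # Keep the previous value if no weight was aligned to a position.
        avg = [x / d if d > 0 else old for x, d, old in zip(num, den, avg)]
    return avg


# Hypothetical toy input: three short series from one class, made-up weights.
train = [
    [0.0, 1.0, 2.0, 1.0, 0.0],
    [0.0, 0.0, 1.0, 2.0, 1.0],
    [0.0, 1.0, 3.0, 1.0, 0.0],
]
synthetic = weighted_dba(train, weights=[0.5, 0.3, 0.2], init=train[0])
```

The synthetic series keeps the length of `init` and would be added to the training set with the class label of the series it was averaged from.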

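Experimental Results

Research Questions

RQ1: Does DTW-based synthetic data augmentation improve ResNet performance on the time series classification tasks of the UCR archive?
RQ2: On which datasets does augmentation help or hurt, and how large are these effects?
RQ3: Does an ensemble of the augmented and non-augmented models give more robust improvements across datasets?
RQ4: How does augmentation affect small, difficult datasets such as DiatomSizeReduction and Wine?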

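Key Findings

Data augmentation can sharply raise the deep model's accuracy on some datasets (for example DiatomSizeReduction: from 30% to 96%).
Augmentation can slightly reduce accuracy on some datasets, but overall it does not significantly degrade accuracy.
Ensembling the ResNets trained with and without augmentation reduces the number of datasets on which performance drops, while preserving the gains on the others.
DiatomSizeReduction has only 16 training samples, and augmentation yields a large gain there; 1-NN with DTW reaches 97% accuracy on it, which suggests the dataset is readily solved by simpler methods.
A Wilcoxon signed-rank test shows that the ensemble's improvement over the single model is significant (p-value < 0.0005).
The Wine dataset also improves substantially with augmentation, indicating that the benefit depends on the dataset.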
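
The ensemble referred to in these findings averages the class probabilities of the two ResNets. A minimal sketch of that averaging step is given below, assuming each model outputs one probability vector per test series. The function names and the probability values are hypothetical and do not come from the paper.

```python
def ensemble_average(probs_a, probs_b):
    """Average two models' per-class probabilities, sample by sample."""
    if len(probs_a) != len(probs_b):
        raise ValueError("both models must score the same number of samples")
    return [
        [(pa + pb) / 2.0 for pa, pb in zip(row_a, row_b)]
        for row_a, row_b in zip(probs_a, probs_b)
    ]


def predict(probs):
    """Return the index of the most probable class for each sample."""
    return [max(range(len(row)), key=row.__getitem__) for row in probs]


# Made-up posteriors for two test series and two classes.
p_plain = [[0.6, 0.4], [0.2, 0.8]]      # ResNet trained without augmentation
p_augmented = [[0.3, 0.7], [0.1, 0.9]]  # ResNet trained with augmentation

p_ensemble = ensemble_average(p_plain, p_augmented)
labels = predict(p_ensemble)
```

Averaging probabilities rather than hard votes lets a confident model outweigh an uncertain one, which is one plausible reason an ensemble would soften the cases where augmentation hurts.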

This review was generated by AI and checked by human editors.