QUICK REVIEW

[Paper Review] Data augmentation using synthetic data for time series classification with deep residual networks

Hassan Ismail Fawaz, Germain Forestier|arXiv (Cornell University)|Aug 7, 2018

Time Series Analysis and Forecasting | Cited by 80

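One-line summary

This paper investigates DTW-based synthetic time series data augmentation as a way to improve a deep ResNet for time series classification. It reports large improvements on some small datasets and further benefits when the augmented model is used in an ensemble.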

ABSTRACT

Data augmentation in deep neural networks is the process of generating artificial data in order to reduce the variance of the classifier with the goal to reduce the number of errors. This idea has been shown to improve deep neural network's generalization capabilities in many computer vision tasks such as image recognition and object localization. Apart from these applications, deep Convolutional Neural Networks (CNNs) have also recently gained popularity in the Time Series Classification (TSC) community. However, unlike in image recognition problems, data augmentation techniques have not yet been investigated thoroughly for the TSC task. This is surprising as the accuracy of deep learning models for TSC could potentially be improved, especially for small datasets that exhibit overfitting, when a data augmentation method is adopted. In this paper, we fill this gap by investigating the application of a recently proposed data augmentation technique based on the Dynamic Time Warping distance, for a deep learning model for TSC. To evaluate the potential of augmenting the training set, we performed extensive experiments using the UCR TSC benchmark. Our preliminary experiments reveal that data augmentation can drastically increase deep CNN's accuracy on some datasets and significantly improve the deep model's accuracy when the method is used in an ensemble approach.

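Motivation and objectives

Motivate data augmentation as a way to reduce overfitting of deep time series classifiers on small datasets.
Investigate a recently proposed DTW-based method, weighted DBA (DTW Barycenter Averaging), for generating synthetic time series.
Evaluate the augmentation with a deep ResNet on the UCR TSC benchmark.
Examine ensembling as a way to stabilize gains across datasets.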
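
The augmentation method named above is built on the Dynamic Time Warping (DTW) distance. As a rough illustration only, the following is a minimal pure-Python sketch of the classic DTW dynamic program for two univariate series. The function name `dtw_distance` and the squared-error local cost are assumptions made for this example; the review does not state which DTW variant or implementation the authors used, and the weighted DBA averaging step itself is not shown here.

```python
import math


def dtw_distance(a, b):
    """Return the DTW distance between two univariate series.

    Assumption for this sketch: squared difference as the local cost,
    no warping-window constraint, square root of the accumulated cost.
    """
    n, m = len(a), len(b)
    # cost[i][j] = minimal accumulated cost of aligning a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = (a[i - 1] - b[j - 1]) ** 2
            # Extend the cheapest of the three admissible predecessor cells.
            cost[i][j] = local + min(
                cost[i - 1][j],      # a advances, b stays
                cost[i][j - 1],      # b advances, a stays
                cost[i - 1][j - 1],  # both advance
            )
    return math.sqrt(cost[n][m])
```

Because the alignment may repeat points, a series and a time-stretched copy of it can have a DTW distance of zero even when their lengths differ, which is the property that makes DTW a natural basis for averaging time series.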

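Proposed method

Use a deep Residual Network (ResNet) with three residual blocks for univariate time series classification.
Apply the DTW-based weighted DBA method to generate synthetic time series from the training set.
Configure the augmentation to double the size of the most represented class.
Train the ResNet with and without augmentation, using identical initialization and hyperparameters.
Evaluate the effect on accuracy across the UCR datasets and compare against the non-augmented baseline.
Combine the two ResNets in an ensemble by averaging their posterior probabilities to improve robustness.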
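
The final step above, combining the two networks by averaging their posterior probabilities, can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `ensemble_predict` and the list-of-lists input format (one row of per-class probabilities per test series) are assumptions made for this example.

```python
def ensemble_predict(probs_a, probs_b):
    """Average two models' per-class probabilities, row by row.

    probs_a, probs_b: lists of rows, one row per test instance, each row
    holding that model's probability for every class (hypothetical format).
    Returns (averaged_probabilities, predicted_class_indices).
    """
    averaged = []
    labels = []
    for row_a, row_b in zip(probs_a, probs_b):
        avg = [(pa + pb) / 2.0 for pa, pb in zip(row_a, row_b)]
        averaged.append(avg)
        # Predicted class = index of the largest averaged probability.
        labels.append(max(range(len(avg)), key=avg.__getitem__))
    return averaged, labels
```

Averaging probabilities rather than taking a vote of hard labels lets a confident model outweigh an uncertain one on a given instance, which is one plausible reason an ensemble could dampen the cases where augmentation hurts.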

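Experimental results

Research questions

RQ1: Does DTW-based synthetic data augmentation improve ResNet performance in time series classification?
RQ2: How does augmentation affect accuracy on small versus large UCR datasets?
RQ3: Does an ensemble of the augmented and non-augmented models yield more robust improvements?
RQ4: Is the distribution of DBA-augmented data faithful to the original data distribution?
RQ5: What are the practical benefits and limitations of the method on the TSC benchmark?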

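Key findings

Data augmentation substantially improves the deep model's accuracy on some datasets (e.g., DiatomSizeReduction, from 30% to 96%).
On average, augmentation does not significantly degrade accuracy, and it can yield notable improvements on several datasets.
An ensemble of ResNets trained with and without augmentation reduces the negative effects and preserves the gains across datasets.
A Wilcoxon signed-rank test shows a significant improvement for the ensemble approach (p < 0.0005).
The DiatomSizeReduction dataset has a very small training set (16 instances) and benefits greatly from synthetic data.
On some datasets such as Wine (57 training instances), augmentation yields important improvements, but overall it is not a clear win across the full UCR archive.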
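
The significance claim above rests on a Wilcoxon signed-rank test over paired per-dataset accuracies. The sketch below shows how such a test can be computed in pure Python. It makes several simplifying assumptions that are not from the paper: a two-sided normal approximation without continuity or tie correction, zero differences dropped, and the hypothetical name `wilcoxon_signed_rank`. It is meant to illustrate the idea, not to reproduce the reported p-value.

```python
import math


def wilcoxon_signed_rank(x, y):
    """Two-sided Wilcoxon signed-rank test for paired samples x and y.

    Sketch assumptions: zero differences are dropped, tied |differences|
    share their average rank, and the p-value uses a plain normal
    approximation (no continuity or tie correction).
    Returns (w_plus, w_minus, p_value).
    """
    diffs = [a - b for a, b in zip(x, y) if a != b]
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")
    # Rank the absolute differences, averaging ranks within tie groups.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2.0 + 1.0  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    # Under the null hypothesis, each rank sum has this mean and variance.
    mean = n * (n + 1) / 4.0
    std = math.sqrt(n * (n + 1) * (2 * n + 1) / 24.0)
    z = (min(w_plus, w_minus) - mean) / std
    p_value = min(1.0, math.erfc(abs(z) / math.sqrt(2.0)))
    return w_plus, w_minus, p_value
```

In this setting `x` would hold the ensemble's accuracy on each dataset and `y` the baseline's accuracy on the same datasets, in the same order. The normal approximation is only reasonable when many pairs are available; for a handful of datasets an exact test from a statistics library would be the safer choice.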


This review was generated by AI and checked by a human editor.