QUICK REVIEW

[Paper Review] Temporal Convolutional Attention-based Network For Sequence Modeling

Hao, Hongyan, Yan Wang|arXiv (Cornell University)|Feb 28, 2020

Topic Modeling | References: 16 | Citations: 39

One-line summary

TCAN combines temporal convolution, attention, and enhanced residuals to model sequential data. Without using recurrent networks, it achieves state-of-the-art perplexity/bpc on PTB and WikiText-2 with a compact architecture.

ABSTRACT

With the development of feed-forward models, the default model for sequence modeling has gradually evolved to replace recurrent networks. Many powerful feed-forward models based on convolutional networks and attention mechanism were proposed and show more potential to handle sequence modeling tasks. We wonder that is there an architecture that can not only achieve an approximate substitution of recurrent network, but also absorb the advantages of feed-forward models. So we propose an exploratory architecture referred to Temporal Convolutional Attention-based Network (TCAN) which combines temporal convolutional network and attention mechanism. TCAN includes two parts, one is Temporal Attention (TA) which captures relevant features inside the sequence, the other is Enhanced Residual (ER) which extracts shallow layer's important information and transfers to deep layers. We improve the state-of-the-art results of bpc/perplexity to 30.28 on word-level PTB, 1.092 on character-level PTB, and 9.20 on WikiText-2.

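Motivation and objectives

Motivates the search for a feed-forward architecture that can approximate recurrent networks for sequence modeling while preserving causality and parallelism.
Introduces TCAN, a hybrid of Temporal Convolutional Networks and an attention mechanism, to capture correlations inside the sequence.
Proposes Enhanced Residuals, which propagate important information across layers without adding parameters.
Reports state-of-the-art performance on the word-level PTB, character-level PTB, and WikiText-2 datasets.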

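Proposed method

Proposes the Temporal Convolutional Attention-based Network (TCAN), which has two modules: Temporal Attention (TA) and Enhanced Residual (ER).
Uses a causal dilated convolution backbone to model sequence dependencies while enlarging the receptive field (dilation rate d = 2^l at layer l).
In TA, keys, queries, and values are computed from the layer input, and attention is applied with a lower-triangular mask to preserve causality.
In ER, information is weighted and aggregated using TA, then combined with the standard residual path to form an enhanced residual.
Trains with the Adam optimizer and compares TCAN against RNN-, CNN-, and Transformer-based baselines on PTB and WT2.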
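
The two causal building blocks described above can be sketched in plain Python. This is only an illustrative sketch, not the authors' implementation: the function names are hypothetical, the learned key/query/value projections are omitted (the input vectors are passed in directly), the 1/sqrt(d_k) scaling follows standard scaled dot-product attention and is an assumption here, and layers are assumed to be indexed from l = 0 with one convolution per layer.

```python
import math

def causal_dilated_conv(x, kernel, dilation):
    """y[t] = sum_i kernel[i] * x[t - i*dilation]; positions before 0 count as zero."""
    y = []
    for t in range(len(x)):
        acc = 0.0
        for i, w in enumerate(kernel):
            s = t - i * dilation
            if s >= 0:  # never reads a position later than t
                acc += w * x[s]
        y.append(acc)
    return y

def receptive_field(kernel_size, num_layers):
    """Receptive field when layer l (0-based, assumed) uses dilation 2**l."""
    return 1 + (kernel_size - 1) * (2 ** num_layers - 1)

def causal_attention(queries, keys, values):
    """Dot-product attention in which step t attends only to steps 0..t.

    queries, keys, values: lists of T vectors (lists of floats).
    Returns (outputs, weights); weights is a T x T lower-triangular matrix.
    """
    T = len(queries)
    d_k = len(keys[0])
    outputs, weights = [], []
    for t in range(T):
        # Score only positions 0..t; this is the lower-triangular mask.
        scores = [
            sum(q * k for q, k in zip(queries[t], keys[s])) / math.sqrt(d_k)
            for s in range(t + 1)
        ]
        m = max(scores)  # subtract the max for a numerically stable softmax
        exps = [math.exp(s - m) for s in scores]
        z = sum(exps)
        row = [e / z for e in exps] + [0.0] * (T - t - 1)
        weights.append(row)
        outputs.append([
            sum(row[s] * values[s][j] for s in range(T))
            for j in range(len(values[0]))
        ])
    return outputs, weights
```

Under these assumptions, a stack of 4 layers with kernel size 3 gives `receptive_field(3, 4) == 31`, and every row of the attention weight matrix sums to 1 with zeros above the diagonal, so no output depends on a future time step.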

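Experimental results

Research questions

RQ1: Can a feed-forward, non-recurrent architecture match or exceed recurrent models on standard language modeling benchmarks?
RQ2: Can integrating causal dilated convolution with temporal attention capture long-range dependencies while preserving causality?
RQ3: Does the Enhanced Residuals mechanism improve information propagation without increasing the number of model parameters?
RQ4: How does TCAN compare with state-of-the-art models on word-level PTB, character-level PTB, and WikiText-2?

Key findings

Dataset	Model	Size (M parameters)	Metric	Value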
PTB Word-level	Generic TCN	13	ppl	88.68
PTB Word-level	NAS Cell	54	ppl	62.4
PTB Word-level	AWD-LSTM	24	ppl	58.8
PTB Word-level	TrellisNet	33	ppl	56.80
PTB Word-level	TrellisNet-MoS	34	ppl	54.19
PTB Word-level	GPT-2	1542	ppl	35.76
PTB Word-level	TCAN-no-res	13	ppl	32.19
PTB Word-level	TCAN	13	ppl	30.28
WT2 Word-level	Generic TCN	28.6	ppl	138.5
WT2 Word-level	AWD-LSTM	33	ppl	44.3
WT2 Word-level	AWD-LSTM-MoS	35	ppl	40.68
WT2 Word-level	GPT-2	1542	ppl	18.34
WT2 Word-level	TCAN-no-res	33	ppl	10.92
WT2 Word-level	TCAN	33	ppl	9.20
PTB Char-level	Generic TCN	3.0	bpc	1.31
PTB Char-level	IND-RNN	12.0	bpc	1.23
PTB Char-level	NAS Cell	16.3	bpc	1.214
PTB Char-level	AWD-LSTM	13.8	bpc	1.175
PTB Char-level	TrellisNet-MoS	13.4	bpc	1.158
PTB Char-level	TCAN-no-res	4.3	bpc	1.104
PTB Char-level	TCAN	4.3	bpc	1.092

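TCAN achieves a perplexity of 30.28 on word-level PTB, 1.092 bits per character on character-level PTB, and a perplexity of 9.20 on WikiText-2 (with no leakage of future information).
TCAN outperforms baselines such as AWD-LSTM, TrellisNet, and the generic TCN on all evaluated datasets.
Ablation shows that Temporal Attention is more effective on this task than an equivalent convolutional layer.
Enhanced Residuals improve performance without adding parameters.
TCAN is smaller than the Transformer- and RNN-based models while delivering higher performance.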
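
For readers unfamiliar with the two metrics in the table, the following minimal sketch shows how perplexity (ppl) and bits per character (bpc) are conventionally derived from a model's per-token log-probabilities. The function names are illustrative, and the inputs are assumed to be natural-log probabilities of the correct next token; this is not the paper's evaluation code.

```python
import math

def perplexity(token_log_probs):
    """exp of the mean negative log-likelihood (natural log) per token."""
    nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(nll)

def bits_per_character(char_log_probs):
    """Mean negative log-likelihood per character, converted from nats to bits."""
    nll = -sum(char_log_probs) / len(char_log_probs)
    return nll / math.log(2)
```

A model that assigns probability 0.25 to every correct token has a perplexity of about 4, and one that assigns 0.5 to every correct character has a bpc of about 1. Lower is better for both metrics.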

This review was generated by AI and checked by a human editor.