QUICK REVIEW

[Paper Review] Sequence to Sequence Learning with Neural Networks

Ilya Sutskever, Oriol Vinyals, Quoc V. Le | arXiv (Cornell University) | Sep 10, 2014

Natural Language Processing Techniques | References: 29 | Citations: 13,314

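One-line summary

A neural sequence-to-sequence model built from two deep LSTMs (an encoder and a decoder) reaches 34.8 BLEU on WMT'14 English-to-French in direct translation, beating a phrase-based SMT baseline (33.3 BLEU), and reaches 36.5 BLEU when used to rescore the SMT system's 1000-best list, close to the previous best result on this task.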

ABSTRACT

Deep Neural Networks (DNNs) are powerful models that have achieved excellent performance on difficult learning tasks. Although DNNs work well whenever large labeled training sets are available, they cannot be used to map sequences to sequences. In this paper, we present a general end-to-end approach to sequence learning that makes minimal assumptions on the sequence structure. Our method uses a multilayered Long Short-Term Memory (LSTM) to map the input sequence to a vector of a fixed dimensionality, and then another deep LSTM to decode the target sequence from the vector. Our main result is that on an English to French translation task from the WMT'14 dataset, the translations produced by the LSTM achieve a BLEU score of 34.8 on the entire test set, where the LSTM's BLEU score was penalized on out-of-vocabulary words. Additionally, the LSTM did not have difficulty on long sentences. For comparison, a phrase-based SMT system achieves a BLEU score of 33.3 on the same dataset. When we used the LSTM to rerank the 1000 hypotheses produced by the aforementioned SMT system, its BLEU score increases to 36.5, which is close to the previous best result on this task. The LSTM also learned sensible phrase and sentence representations that are sensitive to word order and are relatively invariant to the active and the passive voice. Finally, we found that reversing the order of the words in all source sentences (but not target sentences) improved the LSTM's performance markedly, because doing so introduced many short term dependencies between the source and the target sentence which made the optimization problem easier.

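Motivation and objectives

Demonstrate an end-to-end sequence-to-sequence learning approach that maps an input sequence directly to an output sequence without strong structural assumptions.
Show that a deep LSTM encoder-decoder can translate text directly and can improve SMT performance through rescoring.
Investigate techniques that improve training and translation quality, including reversing the source sentences and using multilayer architectures.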

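Proposed method

Encode the input sequence with a deep LSTM to obtain a fixed-dimensional vector representation.
Decode the target sequence with a second deep LSTM conditioned on that encoded representation.
Generate translations with a left-to-right beam search decoder that scores candidate translations by p(T|S).
Train end-to-end by maximizing the log probability of the correct translation over the whole training set.
Experiment with reversing the source sentences to shorten the time lag between corresponding source and target words and make optimization easier.
Evaluate with BLEU on WMT'14 English-to-French, covering both direct translation and rescoring of SMT n-best lists.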
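
The decoding steps above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function `toy_next_log_probs` is a hypothetical stand-in for the trained decoder LSTM, the `</s>` end-of-sentence symbol and the `beam_size` and `max_len` defaults are assumptions, and the toy distribution ignores the source entirely.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

EOS = "</s>"  # hypothetical end-of-sentence symbol

# Hypothetical interface for the decoder: given the reversed source and the
# target prefix so far, return log-probabilities of each possible next token.
NextLogProbs = Callable[[Sequence[str], Sequence[str]], Dict[str, float]]


def reverse_source(tokens: Sequence[str]) -> List[str]:
    """Reverse the source sentence only; target sentences keep their order."""
    return list(reversed(tokens))


def beam_search(
    source: Sequence[str],
    next_log_probs: NextLogProbs,
    beam_size: int = 2,
    max_len: int = 10,
) -> Tuple[List[str], float]:
    """Left-to-right beam search; returns (tokens, log p(T|S)) of the best hypothesis."""
    src = reverse_source(source)
    beams: List[Tuple[List[str], float]] = [([], 0.0)]
    finished: List[Tuple[List[str], float]] = []
    for _ in range(max_len):
        # Extend every partial hypothesis with every possible next token.
        candidates = [
            (prefix + [tok], score + lp)
            for prefix, score in beams
            for tok, lp in next_log_probs(src, prefix).items()
        ]
        if not candidates:
            break
        # Keep only the beam_size most likely extensions.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for prefix, score in candidates[:beam_size]:
            if prefix[-1] == EOS:
                finished.append((prefix, score))  # complete: leaves the beam
            else:
                beams.append((prefix, score))
        if not beams:
            break
    if not finished:  # nothing ended with EOS: fall back to partial hypotheses
        finished = beams
    return max(finished, key=lambda c: c[1])


def toy_next_log_probs(src: Sequence[str], prefix: Sequence[str]) -> Dict[str, float]:
    """Toy distribution that depends only on the prefix length (illustration only)."""
    table = {
        0: {"le": math.log(0.6), "un": math.log(0.4)},
        1: {"chat": math.log(0.7), "chien": math.log(0.3)},
    }
    return table.get(len(prefix), {EOS: 0.0})


best_tokens, best_score = beam_search(["the", "cat"], toy_next_log_probs, beam_size=2)
```

With a real model, `next_log_probs` would run one decoder step conditioned on the encoder's fixed-dimensional vector; the search loop itself would stay the same.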

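Experimental results

Research questions

RQ1: Can a fully neural encoder-decoder (LSTM) perform direct sequence-to-sequence translation at large scale?
RQ2: Does reversing the source input improve the training efficiency and translation quality of a seq2seq LSTM model?
RQ3: How does neural seq2seq translation compare with, and complement, a conventional SMT baseline on a large-scale task?

Key findings

Method	test BLEU score (ntst14)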
Baseline System [29]	33.30
Cho et al. [5]	34.54
Single forward LSTM, beam size 12	26.17
Single reversed LSTM, beam size 12	30.59
Ensemble of 5 reversed LSTMs, beam size 1	33.00
Ensemble of 2 reversed LSTMs, beam size 12	33.27
Ensemble of 5 reversed LSTMs, beam size 2	34.50
Ensemble of 5 reversed LSTMs, beam size 12	34.81

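An ensemble of deep LSTMs achieved 34.81 BLEU in direct translation on ntst14, exceeding the SMT baseline's 33.30 BLEU.
Rescoring the SMT baseline's 1000-best list with an ensemble of reversed LSTMs reached 36.5 BLEU, close to the best published SMT result.
A single reversed LSTM with beam size 12 scores 30.59 BLEU, below the baseline; ensembling closes the gap, with 2 reversed LSTMs reaching 33.27 BLEU and 5 reversed LSTMs reaching 34.50 BLEU even at beam size 2.
Reversing the source sentences improved BLEU markedly (from 25.9 to 30.6 in one configuration) and lowered perplexity from 5.8 to 4.7.
The full model has 384M parameters, a source vocabulary of 160k words, and a target vocabulary of 80k words; it was trained for 7.5 epochs with SGD and gradient clipping.
Long sentences did not degrade performance; qualitative analysis indicates that the learned representations respect word order and capture meaning.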
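
The n-best rescoring result can be illustrated with a short Python sketch. The review does not say how the SMT and LSTM scores are combined, so the weighted average below (equal weights by default) is an assumption, as are the function names, the toy hypotheses, and all the scores, which are made up purely for illustration.

```python
from typing import Callable, List, Sequence, Tuple

Hypothesis = Tuple[str, ...]


def rescore_nbest(
    nbest: Sequence[Tuple[Hypothesis, float]],
    lstm_log_prob: Callable[[Hypothesis], float],
    weight: float = 0.5,
) -> List[Tuple[Hypothesis, float]]:
    """Re-rank an n-best list by mixing the SMT score with an LSTM log-probability.

    nbest holds (hypothesis, smt_score) pairs. The combination rule
    (1 - weight) * smt_score + weight * lstm_score is an assumed scheme.
    """
    rescored = [
        (hyp, (1.0 - weight) * smt_score + weight * lstm_log_prob(hyp))
        for hyp, smt_score in nbest
    ]
    # Highest combined score first.
    return sorted(rescored, key=lambda item: item[1], reverse=True)


# Toy 2-best list with made-up scores: the SMT system alone prefers "un chat".
toy_nbest = [
    (("le", "chat"), -2.0),
    (("un", "chat"), -1.5),
]

# Hypothetical LSTM log-probabilities standing in for log p(T|S).
toy_lstm_scores = {
    ("le", "chat"): -1.0,
    ("un", "chat"): -3.0,
}

reranked = rescore_nbest(toy_nbest, lambda hyp: toy_lstm_scores[hyp])
best_hypothesis = reranked[0][0]
```

In this toy case the combined score promotes the hypothesis the LSTM prefers, which is the mechanism by which rescoring can lift an SMT system's BLEU; in practice the mixing weight would be tuned on held-out data.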


This review was generated by AI and checked by a human editor.