QUICK REVIEW

[Paper Review] LightRNN: Memory and Computation-Efficient Recurrent Neural Networks

Xiang Li, Tao Qin|arXiv (Cornell University)|Oct 31, 2016

Topic Modeling | References: 28 | Cited by: 42

One-Sentence Summary

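LightRNN proposes a 2-Component shared embedding mechanism that represents each word in the vocabulary by a row vector and a column vector from a shared table, so that only $2\sqrt{|V|}$ vectors are needed instead of $|V|$. This shrinks the RNN model and speeds up training. On the One-Billion-Word benchmark, the model is 40-100 times smaller and trains 2 times faster, while a bootstrap refinement framework that improves the word allocation keeps accuracy intact, giving perplexity comparable to state-of-the-art language models.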

ABSTRACT

Recurrent neural networks (RNNs) have achieved state-of-the-art performances in many natural language processing tasks, such as language modeling and machine translation. However, when the vocabulary is large, the RNN model will become very big (e.g., possibly beyond the memory capacity of a GPU device) and its training will become very inefficient. In this work, we propose a novel technique to tackle this challenge. The key idea is to use 2-Component (2C) shared embedding for word representations. We allocate every word in the vocabulary into a table, each row of which is associated with a vector, and each column associated with another vector. Depending on its position in the table, a word is jointly represented by two components: a row vector and a column vector. Since the words in the same row share the row vector and the words in the same column share the column vector, we only need $2 \sqrt{|V|}$ vectors to represent a vocabulary of $|V|$ unique words, which are far less than the $|V|$ vectors required by existing approaches. Based on the 2-Component shared embedding, we design a new RNN algorithm and evaluate it using the language modeling task on several benchmark datasets. The results show that our algorithm significantly reduces the model size and speeds up the training process, without sacrifice of accuracy (it achieves similar, if not better, perplexity as compared to state-of-the-art language models). Remarkably, on the One-Billion-Word benchmark Dataset, our algorithm achieves comparable perplexity to previous language models, whilst reducing the model size by a factor of 40-100, and speeding up the training process by a factor of 2. We name our proposed algorithm \emph{LightRNN} to reflect its very small model size and very high training speed.
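To make the table idea concrete, here is a minimal sketch of how a word could be mapped to a (row, column) cell and looked up as a pair of shared vectors. The row-major allocation, the function names (`table_side`, `cell_of`, `embed`), and the toy 2-dimensional vectors are all assumptions for illustration; the paper learns both the vectors and the allocation.

```python
import math

def table_side(vocab_size):
    """Smallest square table side whose side*side cells can hold every word."""
    side = math.isqrt(vocab_size)
    return side if side * side == vocab_size else side + 1

def cell_of(word_id, side):
    """Hypothetical row-major allocation: word id -> (row, column)."""
    return divmod(word_id, side)

def embed(word_id, row_vectors, col_vectors):
    """A word's 2C representation: its shared row vector and its shared column vector."""
    row, col = cell_of(word_id, len(col_vectors))
    return row_vectors[row], col_vectors[col]

vocab_size = 10_000
side = table_side(vocab_size)
# Toy 2-dimensional vectors (assumption); real embeddings would be learned.
row_vectors = [[float(i), 0.0] for i in range(side)]
col_vectors = [[0.0, float(j)] for j in range(side)]

print(side, 2 * side)                         # 100 200
print(cell_of(1234, side))                    # (12, 34)
print(embed(1234, row_vectors, col_vectors))  # ([12.0, 0.0], [0.0, 34.0])
```

With a vocabulary of 10,000 words the table is 100 by 100, so 200 shared vectors stand in for 10,000 individual ones.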

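Motivation and Objectives

Reduce the memory and computational cost of training RNNs on large-vocabulary NLP tasks, for example with vocabularies of more than 10 million words.
Shrink model size and training time without sacrificing prediction accuracy, with deployment on GPUs and mobile devices in mind.
Design an efficient word representation that enables scalable training and inference for large-scale language models.
Develop a bootstrap framework that jointly optimizes the word allocation and the embedding vectors to improve performance.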

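Proposed Method

Represent each word in the vocabulary as the combination of a row vector and a column vector from a two-dimensional table, cutting the number of distinct vectors from $|V|$ to $2\sqrt{|V|}$.
Adopt a bootstrap training loop: initialize the word allocation randomly, train the embeddings, then hold the embeddings fixed and refine the allocation by minimum-weight perfect matching so as to minimize the training loss.
Formulate the allocation refinement as a minimum-weight perfect matching problem from graph theory, which allows it to be optimized efficiently.
Train the RNN with the 2C shared embedding, alternating between embedding updates and allocation refinement until convergence.
Apply the method to language modeling and evaluate it on benchmark datasets such as One-Billion-Word and ACLW.
Apply ensembling with an n-gram model to improve perplexity further.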
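The refinement step above can be sketched as an assignment problem: every word must occupy exactly one table cell, and the total loss over all words should be as small as possible. The example below is a toy illustration only. The cost matrix is made up, `refine_allocation` is a hypothetical name, and the brute-force search over permutations is a stand-in that is feasible only for a handful of words; a real vocabulary would need an efficient (possibly approximate) matching algorithm.

```python
from itertools import permutations

def refine_allocation(cost):
    """Exact minimum-weight perfect matching by exhaustive search.

    cost[w][p] is a hypothetical training loss of word w when placed at
    table position p. Returns (assignment, total) with assignment[w] = p.
    """
    n = len(cost)
    best, best_total = None, float("inf")
    for perm in permutations(range(n)):
        total = sum(cost[w][perm[w]] for w in range(n))
        if total < best_total:
            best, best_total = perm, total
    return list(best), best_total

# 4 words and a 2x2 table flattened to positions 0..3
# (position p sits in row p // 2, column p % 2). Costs are invented.
cost = [
    [4.0, 1.0, 3.0, 1.5],
    [2.0, 0.5, 5.0, 3.0],
    [3.0, 2.0, 2.0, 1.0],
    [1.0, 3.0, 2.5, 4.0],
]
assignment, total = refine_allocation(cost)
print(assignment, total)  # [3, 1, 2, 0] 5.0
```

Note that the greedy choice (each word takes its cheapest cell) is not valid here, because words 0 and 1 both prefer position 1; the matching resolves such conflicts globally.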

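Experimental Results

Research Questions

RQ1: Can a shared embedding mechanism reduce RNN model size and training time on large-vocabulary language modeling tasks without degrading performance?
RQ2: How well does the 2-Component shared embedding capture semantic and syntactic relations between words?
RQ3: Can a bootstrap framework that jointly optimizes the word allocation and the embeddings improve model accuracy and efficiency?
RQ4: Compared with a standard RNN, what are the trade-offs among model size, training speed, and perplexity when using the 2C shared embedding?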

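Key Findings

On the One-Billion-Word dataset, LightRNN reaches a perplexity of 66, comparable to HSM (85) and B-RNN (68), while using only 41 million model parameters.
The model is 40 times smaller than HSM and 100 times smaller than B-RNN: B-RNN has 4.1 billion parameters, whereas LightRNN has 41 million.
Training is about 2 times faster: on BillionW, LightRNN finished training in 70 hours, whereas HSM took 168 hours. Word reallocation accounted for only 2.36% of the total time.
Ensembling with a 5-gram model lowered the test perplexity on One-Billion-Word to 43, beating all baselines.
The word allocation table automatically discovers semantic and syntactic clusters (for example place names, time expressions, and URLs), which indicates that structure is being learned implicitly.
Perplexity stabilized after 3 to 4 rounds of bootstrap refinement, which supports the convergence and robustness of the optimization process.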
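As a quick sanity check, the headline ratios can be recomputed from the figures quoted in the findings above. This is only arithmetic on the numbers reported in this review; the variable names are ad hoc.

```python
# Figures as reported in the findings above.
lightrnn_params = 41_000_000
brnn_params = 4_100_000_000
hsm_hours = 168
lightrnn_hours = 70

size_ratio = brnn_params / lightrnn_params  # B-RNN vs. LightRNN model size
speedup = hsm_hours / lightrnn_hours        # HSM vs. LightRNN training time

print(f"B-RNN / LightRNN parameters: {size_ratio:.0f}x")   # 100x
print(f"HSM / LightRNN training time: {speedup:.1f}x")      # 2.4x
```

The parameter counts give exactly the 100-fold reduction, and the training times give a ratio of 2.4, consistent with the stated speedup of about 2 times.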

This review was generated by AI and checked by a human editor.