QUICK REVIEW

[Paper Review] Adaptation of Deep Bidirectional Multilingual Transformers for Russian Language

Yuri Kuratov, Mikhail Arkhipov|arXiv (Cornell University)|May 17, 2019
Topic Modeling | References: 16 | Citations: 257
One-Sentence Summary

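This paper shows that initializing a monolingual Russian BERT from multilingual BERT improves performance on Russian NLP tasks and shortens training time, with the Russian-specific vocabulary and its embeddings derived from the multilingual model.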

ABSTRACT

The paper introduces methods of adaptation of multilingual masked language models for a specific language. Pre-trained bidirectional language models show state-of-the-art performance on a wide range of tasks including reading comprehension, natural language inference, and sentiment analysis. At the moment there are two alternative approaches to train such models: monolingual and multilingual. While language specific models show superior performance, multilingual models allow to perform a transfer from one language to another and solve tasks for different languages simultaneously. This work shows that transfer learning from a multilingual model to monolingual model results in significant growth of performance on such tasks as reading comprehension, paraphrase detection, and sentiment analysis. Furthermore, multilingual initialization of monolingual model substantially reduces training time. Pre-trained models for the Russian language are open sourced.

Motivation and Objectives

  • Demonstrate that transfer from a multilingual BERT to a monolingual Russian model yields performance gains.
  • Show that multilingual initialization accelerates convergence and reduces training time for Russian models.
  • Develop RuBERT with a Russian-specific vocabulary and evaluate on Russian NLP tasks.
  • Provide open-source Russian pre-trained models and reproducible code within the DeepPavlov ecosystem.

Proposed Method

  • Use a 12-layer BERT-base Transformer encoder initialized from a multilingual BERT model (all parameters except word embeddings) for Russian.
  • Create a new Russian subword vocabulary with subword-nmt trained on Russian Wikipedia and news data.
  • Assemble the new embedding matrix from the multilingual one: tokens present in both the multilingual and the Russian vocabulary keep their multilingual embeddings, and each new token is initialized with the mean of the multilingual embeddings of the subtokens it splits into.
  • Train the monolingual Russian model on the same data used to build the Russian vocabulary, with a batch size of 256, a learning rate of 2e-5, the Adam optimizer, and L2 regularization of 0.01.
  • Evaluate on three tasks: paraphrase identification (ParaPhraser), sentiment analysis (RuSentiment), and question answering (SDSJ Task B).
  • Compare multilingual BERT, a monolingual Russian model trained from scratch, and the proposed RuBERT.
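
The embedding-assembly step above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function names `wordpiece_split` and `assemble_embeddings`, the toy two-dimensional vectors, the greedy longest-match splitting, and the random fallback for tokens that cannot be split are all hypothetical choices made for the example.

```python
import random


def wordpiece_split(token, vocab):
    """Greedy longest-match-first split of `token` into pieces found in `vocab`.

    Continuation pieces carry the '##' prefix, as in BERT's WordPiece scheme.
    Returns None when the token cannot be fully covered by the vocabulary.
    """
    is_continuation = token.startswith("##")
    text = token[2:] if is_continuation else token
    pieces, start = [], 0
    while start < len(text):
        end, match = len(text), None
        while end > start:
            piece = text[start:end]
            if start > 0 or is_continuation:
                piece = "##" + piece
            if piece in vocab:
                match = piece
                break
            end -= 1
        if match is None:
            return None
        pieces.append(match)
        start = end
    return pieces


def assemble_embeddings(new_vocab, old_emb, dim, seed=0):
    """Build {token: vector} for `new_vocab` from the old (multilingual) embeddings."""
    rng = random.Random(seed)
    new_emb = {}
    for token in new_vocab:
        if token in old_emb:
            # Token exists in both vocabularies: reuse its embedding unchanged.
            new_emb[token] = list(old_emb[token])
            continue
        pieces = wordpiece_split(token, old_emb)
        if pieces:
            # New token: average the embeddings of the old subtokens it splits into.
            vectors = [old_emb[p] for p in pieces]
            new_emb[token] = [sum(col) / len(vectors) for col in zip(*vectors)]
        else:
            # Assumption for this sketch only: small random init when no split exists.
            new_emb[token] = [rng.gauss(0.0, 0.02) for _ in range(dim)]
    return new_emb


# Toy multilingual embeddings (hypothetical values).
old_emb = {"bi": [1.0, 0.0], "##rd": [0.0, 1.0], "the": [0.5, 0.5]}
new_emb = assemble_embeddings(["the", "bird"], old_emb, dim=2)
print(new_emb["bird"])  # mean of "bi" and "##rd" -> [0.5, 0.5]
```

In a real setting the vectors would be rows of the multilingual model's embedding matrix, and all other Transformer parameters would be copied over unchanged, as the first bullet describes.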

Experimental Results

Research Questions

  • RQ1: Can monolingual Russian models benefit from initializing with multilingual BERT weights?
  • RQ2: Does multilingual initialization speed up convergence and reduce training time for Russian monolingual models?
  • RQ3: How does RuBERT perform on Russian NLP tasks compared to multilingual BERT and models trained from scratch?
  • RQ4: What impact does a language-specific Russian vocabulary have on model efficiency and performance?

Key Findings

  • RuBERT outperforms multilingual BERT on all three evaluated tasks. Best reported results: ParaPhraser F-1 87.73 and accuracy 84.99; RuSentiment F-1 72.63; SDSJ Task B (QA) F-1 84.60 and EM 66.30.
  • Multilingual initialization converges faster than random initialization: it needs roughly 250k steps to reach a loss that random initialization reaches only after about 800k steps, saving about six days of compute on eight Tesla P100 GPUs.
  • RuBERT uses a Russian-specific vocabulary (~120k subtokens) that shortens the mean sequence length by a factor of about 1.6 compared with the multilingual vocabulary, enabling larger batches or longer inputs.
  • Training dynamics show that multilingual initialization improves convergence rate and training efficiency; averaging of new subtoken embeddings positively affects convergence.
  • Open-source Russian pre-trained models and code for reproducibility are available via the DeepPavlov library.
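
As a rough back-of-the-envelope reading of the efficiency figures above, the short calculation below derives a few ratios from the quoted numbers. It is only a sketch: the 512-subtoken input limit and the quadratic scaling of self-attention cost with sequence length are general properties of BERT-style models assumed here, not measurements reported in the review.

```python
# Figures quoted in the findings above.
STEPS_RANDOM_INIT = 800_000
STEPS_MULTILINGUAL_INIT = 250_000
LENGTH_REDUCTION = 1.6  # mean subtoken-sequence shrink factor

# Assumption for illustration: the usual BERT-base input limit of 512 subtokens.
MAX_SUBTOKENS = 512

# How many times fewer steps multilingual initialization needs.
step_ratio = STEPS_RANDOM_INIT / STEPS_MULTILINGUAL_INIT
steps_saved = STEPS_RANDOM_INIT - STEPS_MULTILINGUAL_INIT

# Text that fills one full input under the Russian vocabulary would need
# roughly this many subtokens under the multilingual vocabulary.
equivalent_multilingual_subtokens = MAX_SUBTOKENS * LENGTH_REDUCTION

# Assumption: self-attention cost grows with the square of sequence length,
# so the same text costs about this many times less attention compute.
attention_cost_ratio = LENGTH_REDUCTION ** 2

print(f"steps: {step_ratio:.1f}x fewer ({steps_saved:,} saved)")
print(f"512 Russian subtokens ~ {equivalent_multilingual_subtokens:.0f} multilingual subtokens")
print(f"attention cost for the same text: ~{attention_cost_ratio:.2f}x lower")
```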

This review was generated by AI and checked by a human editor.