QUICK REVIEW

[Paper Review] StructBERT: Incorporating Language Structures into Pre-training for Deep Language Understanding

Wei Wang, Bin Bi|arXiv (Cornell University)|Aug 13, 2019

Topic Modeling | References: 35 | Citations: 100

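One-line summary

StructBERT extends BERT with additional word-level and sentence-level structural pre-training objectives, improving performance across the GLUE, SNLI, and SQuAD benchmarks.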

ABSTRACT

Recently, the pre-trained language model, BERT (and its robustly optimized version RoBERTa), has attracted a lot of attention in natural language understanding (NLU), and achieved state-of-the-art accuracy in various NLU tasks, such as sentiment classification, natural language inference, semantic textual similarity and question answering. Inspired by the linearization exploration work of Elman [8], we extend BERT to a new model, StructBERT, by incorporating language structures into pre-training. Specifically, we pre-train StructBERT with two auxiliary tasks to make the most of the sequential order of words and sentences, which leverage language structures at the word and sentence levels, respectively. As a result, the new model is adapted to different levels of language understanding required by downstream tasks. The StructBERT with structural pre-training gives surprisingly good empirical results on a variety of downstream tasks, including pushing the state-of-the-art on the GLUE benchmark to 89.0 (outperforming all published models), the F1 score on SQuAD v1.1 question answering to 93.0, the accuracy on SNLI to 91.7.

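Motivation and objectives

Motivate the need to exploit underlying language structure during pre-training for deeper language understanding.
Extend BERT with two structural pre-training objectives that capture word order and sentence-to-sentence relations.
Show that structural pre-training yields better generalization across a diverse set of NLU tasks.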

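Proposed method

Start from the BERT Transformer framework and add two new auxiliary pre-training tasks: a word structural objective and a sentence structural objective.
Word structural objective: after masking 15% of the tokens, shuffle three-word sequences (trigrams) among the unmasked tokens and train the model to reconstruct their original order.
Sentence structural objective: randomize the order of the sentences in a pair and train the model to predict whether the second sentence is the next sentence, the previous sentence, or a random sentence, so that inter-sentence structure is modeled in both directions.
Combine these objectives with the original masked LM objective into a single pre-training loss.
Use WordPiece tokenization, a sequence length of 512, and the standard BERT-style input representation and Transformer encoder.
Pre-train on English Wikipedia and BookCorpus with large-scale distributed training, then fine-tune for each task.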
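
The word-level and sentence-level structural objectives described above can be sketched in Python as data-preparation steps. This is a minimal illustration, not the authors' implementation. The function names `shuffle_trigrams` and `make_sentence_pair`, the `protected` set (positions of masked and special tokens), the `shuffle_prob` parameter, the integer label encoding, and the equal three-way sampling of labels are all assumptions made for this sketch; the review does not state the shuffling rate or the sampling proportions.

```python
import random


def shuffle_trigrams(tokens, protected, shuffle_prob, rng):
    """Word structural objective (sketch): shuffle some unmasked trigrams.

    tokens:       list of token strings, already masked for the MLM task
    protected:    set of positions that must not move (masked or special tokens)
    shuffle_prob: hypothetical chance of shuffling an eligible trigram
    Returns (shuffled_tokens, targets), where targets maps each shuffled
    position to the original token the model should predict there.
    """
    out = list(tokens)
    targets = {}
    i = 0
    while i + 3 <= len(tokens):
        span = range(i, i + 3)
        eligible = all(p not in protected for p in span)
        if eligible and rng.random() < shuffle_prob:
            window = [tokens[p] for p in span]
            rng.shuffle(window)  # may occasionally return the original order
            for p, tok in zip(span, window):
                out[p] = tok
                targets[p] = tokens[p]
            i += 3  # shuffled trigrams never overlap
        else:
            i += 1
    return out, targets


def make_sentence_pair(doc, index, other_docs, rng):
    """Sentence structural objective (sketch): build one labeled pair.

    Assumed labels: 0 = second sentence is the next sentence,
                    1 = second sentence is the previous sentence,
                    2 = second sentence is random, from another document.
    """
    if not 0 < index < len(doc) - 1:
        raise ValueError("index needs both a previous and a next sentence")
    first = doc[index]
    label = rng.randrange(3)  # assumption: the three cases are equally likely
    if label == 0:
        second = doc[index + 1]
    elif label == 1:
        second = doc[index - 1]
    else:
        second = rng.choice(rng.choice(other_docs))
    return first, second, label


# Usage with toy data; position 3 stands in for a masked token.
rng = random.Random(0)
tokens = ["[CLS]", "the", "cat", "[MASK]", "on", "the", "mat", "[SEP]"]
shuffled, targets = shuffle_trigrams(tokens, {0, 3, 7}, 1.0, rng)
print(shuffled, targets)

doc = ["Sentence A.", "Sentence B.", "Sentence C."]
print(make_sentence_pair(doc, 1, [["Other X.", "Other Y."]], rng))
```

Keeping the prediction targets as a position-to-token map lets the same output head used for masked LM score the shuffled positions, which is one plausible way to fold both word-level signals into a single loss.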

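Experimental results

Research questions

RQ1: Do explicit word-order and inter-sentence structure signals during pre-training improve downstream NLU performance beyond BERT?
RQ2: How much do the word structural and sentence structural objectives contribute to improvements on single-sentence and sentence-pair tasks?
RQ3: How do StructBERT variants compare with contemporary models on the GLUE, SNLI, and SQuAD benchmarks?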

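Key findings

The StructBERTLarge ensemble reached an average GLUE score of 89.0, the state of the art at the time.
StructBERT achieved 91.7% accuracy on SNLI (single model), surpassing prior models.
StructBERT achieved an F1 of 93.0 on SQuAD v1.1, outperforming many baselines without additional data augmentation.
Ablations show that both the word structural and sentence structural objectives are beneficial across tasks, and removing either one degrades performance. The effect is largest for the word objective on CoLA and for the sentence objective on MNLI and SQuAD.
The word structural objective particularly improves single-sentence tasks (e.g., CoLA), while the sentence structural objective improves sentence-pair tasks (MNLI, SNLI, QQP, SQuAD).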


This review was generated by AI and checked by a human editor.