QUICK REVIEW

[Paper Review] Text Summarization with Pretrained Encoders

Yang Liu, Mirella Lapata|arXiv (Cornell University)|Aug 22, 2019

Topic Modeling | References: 34 | Citations: 150

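One-sentence summary

This paper investigates applying BERT-based encoders to both extractive and abstractive single-document summarization. It introduces BertSum, a document-level encoder that yields sentence-level representations, and reports state-of-the-art results on the CNN/DailyMail, NYT, and XSum datasets.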

ABSTRACT

Bidirectional Encoder Representations from Transformers (BERT) represents the latest incarnation of pretrained language models which have recently advanced a wide range of natural language processing tasks. In this paper, we showcase how BERT can be usefully applied in text summarization and propose a general framework for both extractive and abstractive models. We introduce a novel document-level encoder based on BERT which is able to express the semantics of a document and obtain representations for its sentences. Our extractive model is built on top of this encoder by stacking several inter-sentence Transformer layers. For abstractive summarization, we propose a new fine-tuning schedule which adopts different optimizers for the encoder and the decoder as a means of alleviating the mismatch between the two (the former is pretrained while the latter is not). We also demonstrate that a two-staged fine-tuning approach can further boost the quality of the generated summaries. Experiments on three datasets show that our model achieves state-of-the-art results across the board in both extractive and abstractive settings. Our code is available at https://github.com/nlpyang/PreSumm

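Motivation and objectives

Evaluate how pretrained language models, BERT in particular, can improve text summarization.
Develop a document-level encoder that produces sentence representations suited to summarization.
Explore both extractive and abstractive summarization within a unified BERT-based framework.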

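Proposed method

Introduces BertSum, a document-level encoder based on BERT that inserts a [CLS] token at the start of each sentence and uses the output at each [CLS] position as that sentence's representation.
Stacks inter-sentence Transformer layers on top of BertSum to capture document-level features for selecting the sentences to extract.
For abstractive summarization, adopts an encoder-decoder architecture with the pretrained BertSum encoder and a randomly initialized Transformer decoder, using separate optimizers for the encoder and the decoder.
Proposes two-stage fine-tuning: the encoder is first fine-tuned on extractive summarization and then on abstractive summarization.
Trains with a standard schedule using dropout and label smoothing, and applies trigram blocking during beam search to reduce repetition.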
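The input layout behind BertSum can be sketched in a few lines. The snippet below is a minimal illustration, not the authors' implementation: the function name `build_bertsum_input` and the toy document are hypothetical, the closing [SEP] after each sentence follows the usual BERT convention and is an assumption here, and the alternating 0/1 segment ids are one reading of the "interval embeddings" that appear in the results table.

```python
from typing import List, Tuple


def build_bertsum_input(
    sentences: List[List[str]],
) -> Tuple[List[str], List[int], List[int]]:
    """Flatten tokenized sentences into one BertSum-style input sequence.

    Hypothetical helper for illustration only.
    Returns (tokens, segment_ids, cls_positions).
    """
    tokens: List[str] = []
    segment_ids: List[int] = []
    cls_positions: List[int] = []
    for i, sentence in enumerate(sentences):
        # Remember where this sentence's [CLS] lands; the encoder output at
        # this index would serve as the sentence representation.
        cls_positions.append(len(tokens))
        # Assumption: each sentence is closed with [SEP], as in standard BERT.
        piece = ["[CLS]"] + sentence + ["[SEP]"]
        tokens.extend(piece)
        # Assumption: segment ids alternate 0/1 from one sentence to the next.
        segment_ids.extend([i % 2] * len(piece))
    return tokens, segment_ids, cls_positions


# Toy, already-tokenized document (made up for this example).
doc = [["the", "cat", "sat"], ["it", "slept"], ["then", "it", "left"]]
tokens, segment_ids, cls_positions = build_bertsum_input(doc)

if __name__ == "__main__":
    for token, segment in zip(tokens, segment_ids):
        print(f"{segment}\t{token}")
    print("sentence vectors would be read at positions", cls_positions)
```

In a real pipeline the tokens would come from a WordPiece tokenizer and the vectors gathered at `cls_positions` would be passed to the inter-sentence Transformer layers; both steps are omitted here.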

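Experimental results

Research questions

RQ1: Can a pretrained BERT encoder produce sentence-level representations suited to extractive summarization?
RQ2: How can BERT be adapted effectively to abstractive summarization, given the mismatch between a pretrained encoder and a randomly initialized decoder?
RQ3: Does two-stage fine-tuning (extractive, then abstractive) improve summary quality?
RQ4: Do BERT-based models achieve state-of-the-art results on several single-document summarization datasets that differ in style?

Key findings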

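ROUGE F1 on the CNN/DailyMail test set (R1/R2: unigram/bigram overlap; RL: longest common subsequence).

Model	R1	R2	RL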
Oracle	52.59	31.24	48.87
Lead-3	40.42	17.62	36.67
SummaRuNNer	39.60	16.20	35.30
Refresh	40.00	18.20	36.60
Latent	41.05	18.77	37.54
NeuSum	41.59	19.01	37.98
Sumo	41.00	18.40	37.20
Transformer Ext	40.90	18.02	37.17
BertSumExt	43.25	20.24	39.63
BertSumExt w/o interval embeddings	43.20	20.22	39.59
BertSumExt (large)	43.85	20.34	39.90
BertSumAbs	41.72	19.39	38.76
BertSumExtAbs	42.13	19.60	39.18

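BertSumExt, which uses inter-sentence Transformer layers, improves extractive ROUGE scores over the baselines on CNN/DailyMail (43.25 R1 versus 41.59 for NeuSum, the strongest earlier extractive system in the table).
BertSumAbs and BertSumExtAbs score well on ROUGE in the abstractive setting, approaching or exceeding the previous state of the art on some datasets.
Fine-tuning with separate optimizers for the encoder and the decoder yields stable training; the best learning-rate setting found was about 2e-3 for the encoder and about 0.1 for the decoder.
Two-stage fine-tuning (extractive, then abstractive) improves over single-stage training (on CNN/DailyMail, BertSumExtAbs reaches 42.13 R1 versus 41.72 for BertSumAbs).
On XSum, the abstractive BERT-based models outperform many baselines, which is attributed to the highly abstractive nature of that dataset's summaries.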
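The separate learning-rate settings for the encoder and the decoder are easier to interpret with a small sketch. It assumes both optimizers follow the warmup-then-inverse-square-root schedule of the original Transformer, with 2e-3 and 0.1 acting as scaling factors rather than as the actual step sizes. The warmup lengths (20,000 steps for the encoder, 10,000 for the decoder) are not stated in this review and should be treated as assumptions, as should the function and constant names.

```python
def warmup_inverse_sqrt_lr(step: int, scale: float, warmup: int) -> float:
    """lr = scale * min(step^-0.5, step * warmup^-1.5).

    Rises linearly for `warmup` steps, then decays as 1/sqrt(step).
    """
    if step < 1:
        raise ValueError("step must be >= 1")
    return scale * min(step ** -0.5, step * warmup ** -1.5)


# Scaling factors from the findings above; warmup lengths are assumptions.
ENCODER = {"scale": 2e-3, "warmup": 20_000}
DECODER = {"scale": 0.1, "warmup": 10_000}


def learning_rates(step: int) -> tuple:
    """Return (encoder_lr, decoder_lr) at a given training step."""
    return (
        warmup_inverse_sqrt_lr(step, **ENCODER),
        warmup_inverse_sqrt_lr(step, **DECODER),
    )


if __name__ == "__main__":
    for step in (1_000, 10_000, 20_000, 100_000):
        enc_lr, dec_lr = learning_rates(step)
        print(f"step {step:>7}: encoder {enc_lr:.2e}  decoder {dec_lr:.2e}")
```

Under these assumptions the decoder peaks at 0.1 / sqrt(10,000) = 1e-3, while the encoder peaks later and far lower, at 2e-3 / sqrt(20,000), roughly 1.4e-5. That matches the intent described above: the pretrained encoder is updated gently while the randomly initialized decoder learns quickly.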

This review was generated by AI and checked by a human editor.