QUICK REVIEW

[Paper Review] The Factual Inconsistency Problem in Abstractive Text Summarization: A Survey

Yi-Chong Huang, Xiachong Feng|arXiv (Cornell University)|Apr 30, 2021

Topic Modeling | References: 37 | Citations: 67

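One-Line Summary

This survey reviews the problem of factual inconsistency in abstractive summarization, detailing evaluation metrics (unsupervised and weakly supervised) and optimization approaches that improve factual faithfulness.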

ABSTRACT

Recently, various neural encoder-decoder models pioneered by Seq2Seq framework have been proposed to achieve the goal of generating more abstractive summaries by learning to map input text to output text. At a high level, such neural models can freely generate summaries without any constraint on the words or phrases used. Moreover, their format is closer to human-edited summaries and output is more readable and fluent. However, the neural model's abstraction ability is a double-edged sword. A commonly observed problem with the generated summaries is the distortion or fabrication of factual information in the article. This inconsistency between the original text and the summary has caused various concerns over its applicability, and the previous evaluation methods of text summarization are not suitable for this issue. In response to the above problems, the current research direction is predominantly divided into two categories, one is to design fact-aware evaluation metrics to select outputs without factual inconsistency errors, and the other is to develop new summarization systems towards factual consistency. In this survey, we focus on presenting a comprehensive review of these fact-specific evaluation methods and text summarization models.

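Motivation and Objectives

Motivate and define factual inconsistency in abstractive summarization.
Survey existing factual consistency evaluation metrics and meta-evaluation studies.
Summarize approaches for optimizing summarization systems toward factual faithfulness.
Highlight lessons learned and propose directions for future research.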

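Proposed Approach

Classify factual consistency evaluation metrics into unsupervised and weakly supervised categories.
Subdivide unsupervised metrics into triple-based, textual-entailment-based, QA-based, and other methods.
Summarize weakly supervised metrics at the sentence, entity, and token levels.
Present meta-evaluations that relate automatic metrics to human judgments.
Outline factual consistency optimization methods: fact-encoding-based, entailment-based, post-editing-based, and other approaches.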
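
To make the triple-based family of unsupervised metrics concrete, here is a minimal Python sketch. It assumes that (subject, relation, object) triples have already been extracted from both the source article and the summary by some external tool. The names `Triple`, `normalize`, and `triple_precision` are hypothetical and do not come from the survey. The score is the fraction of summary triples that also appear in the source, which is one simple way to operationalize the idea rather than the definition used by any particular paper.

```python
from typing import Iterable, Set, Tuple

# Hypothetical representation: (subject, relation, object)
Triple = Tuple[str, str, str]


def normalize(triple: Triple) -> Triple:
    """Lowercase and collapse whitespace so trivial surface differences still match."""
    return tuple(" ".join(part.lower().split()) for part in triple)


def triple_precision(
    source_triples: Iterable[Triple],
    summary_triples: Iterable[Triple],
) -> float:
    """Fraction of distinct summary triples that are also found in the source."""
    source: Set[Triple] = {normalize(t) for t in source_triples}
    summary: Set[Triple] = {normalize(t) for t in summary_triples}
    if not summary:
        # Assumption: a summary with no extracted triples has nothing to contradict.
        return 1.0
    return len(summary & source) / len(summary)
```

Exact matching after normalization is deliberately strict: a paraphrased but correct relation counts as unsupported. This is one reason the entailment-based and QA-based families exist alongside triple matching.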

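Results

Research Questions

RQ1: What evaluation metrics exist for factual consistency in abstractive summarization, and how do they work?
RQ2: How well do automatic factuality metrics correlate with human judgments (meta-evaluation)?
RQ3: What strategies exist for optimizing summarization models toward factual consistency, and how effective are they?
RQ4: What are the main challenges in this area (intrinsic and extrinsic errors), and what are the future directions?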

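Key Findings

A wide range of factual consistency metrics exists, but their correlation with human judgments remains moderate.
Unsupervised metrics include triple-based, textual-entailment-based, QA-based, and other faithfulness evaluation methods.
Weakly supervised metrics rely on synthetic data and show promise, but their effectiveness depends on how closely that data matches the distribution of real model outputs.
Meta-evaluations suggest that semantic-similarity-based methods may perform well, but correlation with human judgments remains below 0.5.
Factual consistency optimization methods include fact-encoding-based, textual-entailment-based, post-editing-based, and domain-specific techniques.
The survey recognizes the need to address extrinsic errors and to develop paragraph-level evaluation and cross-domain application.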
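
The meta-evaluation finding above compares automatic metric scores with human judgments. A minimal Python sketch of that comparison follows, assuming each summary has one metric score and one human rating. The function names are illustrative, and the studies covered by the survey may use different correlation measures or library implementations.

```python
from math import sqrt
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))
```

To use it, pass the metric scores and the human ratings for the same summaries in the same order, for example `spearman(metric_scores, human_ratings)`. Rank correlation is a common choice when human ratings are ordinal, since it only asks whether the metric orders summaries the way annotators do.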

This review was generated by AI and checked by a human editor.