QUICK REVIEW

[Paper Review] QMSum: A New Benchmark for Query-based Multi-domain Meeting Summarization

Ming Zhong, Da Yin | arXiv (Cornell University) | Apr 13, 2021

Topic Modeling | References: 52 | Citations: 27

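One-line summary

QMSum introduces a new benchmark for query-based meeting summarization across multiple domains and proposes a two-stage locate-then-summarize method, backed by extensive baselines and multi-domain evaluation.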

ABSTRACT

Meetings are a key component of human collaboration. As increasing numbers of meetings are recorded and transcribed, meeting summaries have become essential to remind those who may or may not have attended the meetings about the key decisions made and the tasks to be completed. However, it is hard to create a single short summary that covers all the content of a long meeting involving multiple people and topics. In order to satisfy the needs of different types of users, we define a new query-based multi-domain meeting summarization task, where models have to select and summarize relevant spans of meetings in response to a query, and we introduce QMSum, a new benchmark for this task. QMSum consists of 1,808 query-summary pairs over 232 meetings in multiple domains. Besides, we investigate a locate-then-summarize method and evaluate a set of strong summarization baselines on the task. Experimental results and manual analysis reveal that QMSum presents significant challenges in long meeting summarization for future research. Dataset is available at https://github.com/Yale-LILY/QMSum.

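Motivation and objectives

Define the task of query-based multi-domain meeting summarization, motivated by the need for flexible meeting digests tailored to individual users.
Build a high-quality, hierarchically annotated dataset (QMSum) containing queries, relevant spans, and summaries across multiple domains.
Propose a locate-then-summarize pipeline and establish strong baselines that quantify the difficulty of the location step.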

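Proposed method

Formulate the task as p(y|Q, X), where Q is the query, X is the meeting transcript, and y is the output summary.
Implement a two-stage pipeline: a Locator (using a Pointer Network or hierarchical ranking) identifies the spans relevant to the query, and a Summarizer then generates an abstractive summary from the extracted spans.
Explore several summarization models (PGNet, BART, HMNet) with and without gold spans to assess the impact of the location step.
Build and evaluate on three domains (Product, Academic, Committee) and analyze cross-domain generalization.
Provide cross-domain pre-training and evaluation to test robustness in the multi-domain setting.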
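The two-stage flow described above can be sketched in a few lines. This is a minimal illustration only, not the paper's implementation: the Locator here is a simple lexical-overlap ranker standing in for the Pointer Network or hierarchical ranking model, and the names `locate`, `build_summarizer_input`, `keep_ratio`, and `SEPARATOR` are hypothetical. The second stage only assembles the text that an abstractive summarizer (such as BART) would consume.

```python
import re
from typing import List, Tuple

Utterance = Tuple[str, str]  # (speaker, text); a simplifying assumption
SEPARATOR = " | "            # hypothetical delimiter between query and spans


def tokenize(text: str) -> List[str]:
    """Lowercase and split into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def locate(query: str, transcript: List[Utterance],
           keep_ratio: float = 1 / 3) -> List[Utterance]:
    """Toy Locator: keep the utterances sharing the most tokens with the query.

    The learned locators in the paper are replaced here by a lexical-overlap
    score. Selected utterances are returned in their original order.
    """
    query_tokens = set(tokenize(query))
    scores = [len(query_tokens & set(tokenize(text))) for _, text in transcript]
    k = max(1, round(len(transcript) * keep_ratio))
    # Rank by descending score; ties are broken by earlier position.
    top = sorted(range(len(transcript)), key=lambda i: (-scores[i], i))[:k]
    return [transcript[i] for i in sorted(top)]


def build_summarizer_input(query: str, spans: List[Utterance]) -> str:
    """Join the query and the located spans into one summarizer input string."""
    body = " ".join(f"{speaker}: {text}" for speaker, text in spans)
    return f"{query}{SEPARATOR}{body}"


transcript = [
    ("PM", "Let's start the meeting."),
    ("ID", "The remote control budget is twelve euros."),
    ("ME", "I like yellow."),
    ("PM", "We decided to keep the budget fixed."),
    ("UI", "Lunch is at noon."),
    ("ID", "See you next week."),
]
query = "What was decided about the remote control budget?"
spans = locate(query, transcript)
model_input = build_summarizer_input(query, spans)
```

With `keep_ratio` at 1/3, the toy transcript is reduced from six utterances to the two that mention the budget, and only that shortened input would be passed to the Summarizer.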

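Experimental results

Research questions

RQ1: Can the locate-then-summarize approach effectively generate query-focused summaries for long, multi-domain meeting transcripts?
RQ2: How does cross-domain training affect the generalization of meeting summarization models across the Product, Academic, and Committee domains?
RQ3: What are the main challenges in query-based meeting summarization (e.g., factuality, relevance), and how do they vary by query type?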

Key findings

Model	R-1	R-2	R-L
HMNet †	36.06	11.36	31.27
HMNet ∗	32.29	8.67	28.17
BART †	32.18	8.48	28.56
PGNet †	31.52	8.69	27.63
BART ∗	31.74	8.53	28.21
PGNet ∗	31.37	8.47	27.08
All	32.18	8.48	28.56
Random	12.03	1.32	11.76
Ext. Oracle	42.84	16.86	39.20

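QMSum contains 1,808 query-summary pairs over 232 meetings spanning three domains.
Identifying relevant spans with the proposed Locator substantially improves the quality of the summarizer's input: when one third of the original text is extracted, ROUGE-L recall reaches up to 84.04.
Neural summarizers that take the Locator's output as input outperform the baselines, and HMNet gives the best results in the single-domain setting.
Multi-domain training yields stable performance across domains and, in some domains (e.g., Academic), can outperform single-domain models on ROUGE-2 and ROUGE-L.
Human evaluation reveals large gaps in factuality and relevance: 74% of the samples contained factual inconsistencies, and 31% contained content unrelated to the query.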
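For readers unfamiliar with the ROUGE-L recall figure cited above, the sketch below shows how such a score can be computed from the longest common subsequence (LCS) between a reference and a candidate. It is a simplified illustration under stated assumptions: whitespace tokenization, no stemming, and a 0 to 1 scale (the reported 84.04 is on a 0 to 100 scale). The function names are hypothetical, and official ROUGE tooling may differ in preprocessing.

```python
from typing import List


def lcs_length(a: List[str], b: List[str]) -> int:
    """Length of the longest common subsequence, using a rolling DP row."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_recall(reference: str, candidate: str) -> float:
    """ROUGE-L recall: LCS length divided by the reference length."""
    ref_tokens = reference.lower().split()
    cand_tokens = candidate.lower().split()
    if not ref_tokens:
        return 0.0
    return lcs_length(ref_tokens, cand_tokens) / len(ref_tokens)


gold_span = "the budget is twelve euros"
full_cover = rouge_l_recall(
    gold_span, "the team said the budget is fixed at twelve euros")
partial_cover = rouge_l_recall(gold_span, "the budget is fixed")
```

In the Locator setting, the reference would be the gold relevant span and the candidate the extracted text, so a high recall means most of the gold span survives extraction even though much of the transcript is discarded.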

This review was generated by AI and checked by a human editor.