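[Paper Review] The CL-SciSumm Shared Task 2018: Results and Key Insights
This paper reports the official results of the CL-SciSumm 2018 Shared Task, a medium-scale benchmark for scientific document summarization in the computational linguistics (CL) domain. The evaluation covers 60 annotated sets of reference papers (RP) and citing papers (CP), three summary types, Tasks 1A and 1B, and the optional Task 2.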
This overview describes the official results of the CL-SciSumm Shared Task 2018 -- the first medium-scale shared task on scientific document summarization in the computational linguistics (CL) domain. This year, the dataset comprised 60 annotated sets of citing and reference papers from the open access research papers in the CL domain. The Shared Task was organized as a part of the 41st Annual Conference of the Special Interest Group in Information Retrieval (SIGIR), held in Ann Arbor, USA in July 2018. We compare the participating systems in terms of two evaluation metrics. The annotated dataset and evaluation scripts can be accessed and used by the community from: https://github.com/WING-NUS/scisumm-corpus
Research Motivation and Objectives
- To evaluate automatic summarization of scientific papers in CL via citance-based linking and facet labeling.
- To compare systems using sentence overlap, ROUGE, and facet classification metrics.
- To generate structured, concise summaries from cited text spans and analyze cross-system performance.
- To expand resources and evaluation tools for scholarly summarization in the CL domain.
Proposed Method
- Participants built systems for Task 1A (cited text spans matching citances) and Task 1B (discourse facet classification) using a mix of lexical, statistical, and neural approaches.
- Task 2 (structured RP summary) was optional and evaluated against abstract, community, and human summaries using ROUGE measures.
- Task 1A was scored with sentence-overlap F1 and ROUGE-2/ROUGE-SU4, and Task 1B with multi-label precision/recall/F1 over the facet labels, with scores averaged over three annotation sets.
- The corpus comprises training data from ACL Anthology and a test set from ACL Anthology Network, with three independent annotations per RP/CP and per summary.
- Optional Task 2 summary generation was constrained to 250 words and evaluated against multiple gold standards.
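The set-overlap scoring described in the list above can be sketched as follows. This is a minimal illustration, not the official evaluation script: the function name `overlap_prf` and the example sentence IDs are hypothetical, and the official scripts in the scisumm-corpus repository may apply additional conditions (for example, how Task 1B scores depend on Task 1A matches).

```python
def overlap_prf(predicted, gold):
    """Precision, recall and F1 between two collections of labels.

    For Task 1A the labels would be reference-paper sentence IDs;
    for Task 1B they would be discourse facet names.
    (Hypothetical helper, not part of the official scripts.)
    """
    predicted_set, gold_set = set(predicted), set(gold)
    true_positives = len(predicted_set & gold_set)
    precision = true_positives / len(predicted_set) if predicted_set else 0.0
    recall = true_positives / len(gold_set) if gold_set else 0.0
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return precision, recall, f1


# Made-up example: the system returns sentences 12, 13 and 40,
# while the annotator marked sentences 13 and 14.
p, r, f = overlap_prf([12, 13, 40], [13, 14])
print(round(p, 3), round(r, 3), round(f, 3))  # 0.333 0.5 0.4
```

Treating both subtasks as set comparison keeps the sketch short; averaging such scores over several annotation sets would be a separate loop around this function.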
Experimental Results
Research Questions
- RQ1: What is the best approach to map citances to their referenced text spans in papers (Task 1A)?
- RQ2: How accurately can systems assign discourse facets to cited text spans (Task 1B)?
- RQ3: Can a structured, short summary of a reference paper be generated from cited text spans (Task 2) and how does ROUGE compare to human references?
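Since the research questions lean on ROUGE, a simplified view of what ROUGE-2 measures may help. The sketch below computes clipped bigram overlap between a candidate and a single reference. It is an assumption-laden approximation: it uses lowercased whitespace tokens, no stemming or stopword handling, and one reference only, so its numbers will not match the official ROUGE toolkit. The function names and example sentences are hypothetical.

```python
from collections import Counter


def bigram_counts(text):
    """Count bigrams over lowercased, whitespace-split tokens."""
    tokens = text.lower().split()
    return Counter(zip(tokens, tokens[1:]))


def rouge2_sketch(candidate, reference):
    """Simplified ROUGE-2 precision, recall and F1 (single reference)."""
    cand, ref = bigram_counts(candidate), bigram_counts(reference)
    # Counter intersection keeps the minimum count of each shared bigram.
    overlap = sum((cand & ref).values())
    precision = overlap / sum(cand.values()) if cand else 0.0
    recall = overlap / sum(ref.values()) if ref else 0.0
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return precision, recall, f1


# Made-up sentences: 4 of the 6 bigrams on each side are shared.
scores = rouge2_sketch(
    "the system links citances to reference spans",
    "the system maps citances to reference spans",
)
print([round(s, 3) for s in scores])  # [0.667, 0.667, 0.667]
```

ROUGE-SU4 extends the same idea to unigrams plus skip-bigrams with a gap of up to four words, which this sketch does not cover.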
Key Findings
- NUDT and CIST achieved top performance on the Task 1A sentence-overlap and ROUGE-based metrics.
- Klick Labs led the ROUGE-2 based evaluation in Task 1A across certain configurations.
- Task 1B results were led by CIST and NJUST across multiple runs, with Klick Labs as a notable runner-up.
- For Task 2, TALN-UPF performed best against abstracts and human summaries, while NLP-NITMZ excelled against community summaries.
- Overall, the results suggest lexical and similarity-based features remain strong, with potential gains from domain-specific embeddings in deep learning.
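To make "lexical and similarity-based features" concrete, here is a minimal sketch of a purely lexical Task 1A baseline: rank reference-paper sentences by Jaccard token overlap with the citance. It does not reproduce any participating system; the function names, the `top_k` parameter, and the example sentences are all hypothetical.

```python
import re


def tokenize(text):
    """Lowercase the text and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """Jaccard similarity between two token sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_reference_sentences(citance, reference_sentences, top_k=2):
    """Return indices of the top_k reference sentences most similar to the citance."""
    citance_tokens = tokenize(citance)
    scored = [
        (jaccard(citance_tokens, tokenize(sentence)), index)
        for index, sentence in enumerate(reference_sentences)
    ]
    # Highest similarity first; ties are broken by sentence order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [index for _, index in scored[:top_k]]


# Made-up reference paper sentences and citance.
reference = [
    "We propose a graph-based ranking model for keyword extraction.",
    "Experiments were run on a news corpus.",
    "The ranking model outperforms the frequency baseline.",
]
citance = "Their graph-based ranking model was used for keyword extraction."
print(rank_reference_sentences(citance, reference))  # [0, 2]
```

A stronger system would typically replace the Jaccard score with weighted or embedding-based similarity, but the ranking loop would look much the same.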
This review was generated by AI and checked by a human editor.