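[Paper Explained] CitiLink-Summ: Summarization of Discussion Subjects in European Portuguese Municipal Meeting Minutes
CitiLink-Summ introduces a corpus for summarizing European Portuguese municipal meeting minutes, containing 120 documents and 2,880 hand-written discussion-subject summaries, along with baseline results from encoder-decoder models and large language models.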
Municipal meeting minutes are formal records documenting the discussions and decisions of local government, yet their content is often lengthy, dense, and difficult for citizens to navigate. Automatic summarization can help address this challenge by producing concise summaries for each discussion subject. Despite its potential, research on summarizing discussion subjects in municipal meeting minutes remains largely unexplored, especially in low-resource languages, where the inherent complexity of these documents adds further challenges. A major bottleneck is the scarcity of datasets containing high-quality, manually crafted summaries, which limits the development and evaluation of effective summarization models for this domain. In this paper, we present CitiLink-Summ, a new corpus of European Portuguese municipal meeting minutes, comprising 100 documents and 2,322 manually hand-written summaries, each corresponding to a distinct discussion subject. Leveraging this dataset, we establish baseline results for automatic summarization in this domain, employing state-of-the-art generative models (e.g., BART, PRIMERA) as well as large language models (LLMs), evaluated with both lexical and semantic metrics such as ROUGE, BLEU, METEOR, and BERTScore. CitiLink-Summ provides the first benchmark for municipal-domain summarization in European Portuguese, offering a valuable resource for advancing NLP research on complex administrative texts.
Research Motivation and Objectives
- Address the scarcity of high-quality summaries for European Portuguese municipal minutes.
- Provide a domain-specific corpus of discussion-subject summaries (European Portuguese).
- Establish baselines using state-of-the-art models and LLMs for this domain.
- Offer publicly available resources (dataset, guidelines, code) to spur further research.
Proposed Method
- Construct a new corpus from 120 municipal minutes across six municipalities (2021–2024).
- Manually segment the minutes into discussion subjects, with summaries hand-written by linguistics-trained annotators under expert supervision.
- Evaluate abstraction using Coverage and Density metrics.
- Fine-tune and benchmark multiple summarization models (BART, BART Large, PTT5, LED, PRIMERA) and large language models (Qwen2.5-1.5B, Gemini-2.5-flash) on the dataset.
- Use both lexical (ROUGE, BLEU, METEOR) and semantic (BERTScore) metrics for evaluation.
- Apply hierarchical chunking to handle the models' limited context windows.
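
The hierarchical chunking step can be illustrated with a minimal sketch. The explainer does not give the paper's chunk sizes, overlap, or merging strategy, so everything below is an assumption: words stand in for model tokens, `chunk_words` and `hierarchical_summarize` are hypothetical helper names, and `summarize` is any callable that maps text to a shorter text (in practice a fine-tuned model such as BART).

```python
from typing import Callable, List


def chunk_words(words: List[str], max_words: int, overlap: int = 0) -> List[List[str]]:
    """Split a word list into windows of at most max_words that share `overlap` words."""
    if max_words <= 0 or not 0 <= overlap < max_words:
        raise ValueError("need max_words > 0 and 0 <= overlap < max_words")
    step = max_words - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(words[start:start + max_words])
        if start + max_words >= len(words):
            break  # the last window already reaches the end of the text
    return chunks


def hierarchical_summarize(
    text: str,
    summarize: Callable[[str], str],  # hypothetical stand-in for a real model call
    max_words: int,
    overlap: int = 0,
) -> str:
    """Summarize chunk by chunk, then summarize the joined partial summaries,
    repeating until the remaining text fits in a single window."""
    words = text.split()
    while len(words) > max_words:
        partial = [summarize(" ".join(c)) for c in chunk_words(words, max_words, overlap)]
        merged = " ".join(partial).split()
        if len(merged) >= len(words):
            raise ValueError("summarizer did not shorten the text")
        words = merged
    return summarize(" ".join(words))


if __name__ == "__main__":
    # Toy summarizer (assumption): keep the first two words of each chunk.
    lead_two = lambda t: " ".join(t.split()[:2])
    print(hierarchical_summarize("a b c d e f g h", lead_two, max_words=4))
```

The guard against a non-shrinking summarizer keeps the loop from running forever; a real pipeline would count tokens with the model's own tokenizer rather than whitespace-split words.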

Experimental Results
Research Questions
- RQ1: Can abstractive summarization effectively condense discussion-subject content from European Portuguese municipal minutes?
- RQ2: What are the baseline performance levels of current models on this domain and language?
- RQ3: How do different model families (encoder-decoder vs. large language models) compare on lexical and semantic metrics for this task?
- RQ4: What can the CitiLink-Summ dataset reveal about levels of abstraction in summaries (coverage vs. density) for municipal texts?
Key Findings
| Model | ROUGE-1 | ROUGE-2 | ROUGE-L | BLEU | METEOR | BERTScore F1 | BERTScore Precision | BERTScore Recall |
|---|---|---|---|---|---|---|---|---|
| BART | 63.52 | 49.22 | 58.20 | 36.06 | 55.15 | 83.87 | 84.59 | 83.37 |
| BART Large | 68.96 | 54.78 | 63.64 | 42.43 | 61.65 | 86.28 | 86.57 | 86.15 |
| PTT5 | 52.36 | 38.21 | 45.26 | 23.65 | 46.44 | 76.90 | 76.12 | 78.15 |
| LED | 63.63 | 50.50 | 58.59 | 29.82 | 54.88 | 84.16 | 85.70 | 82.91 |
| PRIMERA | 66.17 | 54.57 | 61.94 | 29.06 | 57.05 | 85.79 | 87.10 | 84.79 |
| Qwen2.5-1.5B | 44.24 | 31.06 | 38.80 | 7.16 | 31.79 | 74.49 | 77.75 | 71.83 |
| Gemini-2.5-flash | 64.16 | 48.94 | 55.97 | 28.40 | 54.34 | 83.09 | 82.99 | 83.19 |
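
For readers unfamiliar with the lexical metrics in the table, the following is a simplified sketch of ROUGE-N: clipped n-gram overlap between a candidate and a reference summary, reported as precision, recall, and F1. It assumes lowercased whitespace tokenization and the hypothetical helper names `ngrams` and `rouge_n`; it is not the evaluation code used in the paper, and standard ROUGE packages apply their own tokenization and optional stemming, so scores will not match exactly.

```python
from collections import Counter
from typing import Dict, List


def ngrams(tokens: List[str], n: int) -> Counter:
    """Count every contiguous n-gram in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(candidate: str, reference: str, n: int = 1) -> Dict[str, float]:
    """Simplified ROUGE-N based on clipped n-gram overlap."""
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    # Counter intersection keeps the minimum count per n-gram (clipping).
    overlap = sum((cand & ref).values())
    precision = overlap / max(sum(cand.values()), 1)
    recall = overlap / max(sum(ref.values()), 1)
    f1 = 0.0 if overlap == 0 else 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Invented example sentences, not taken from the corpus.
    candidate = "the council approved the budget"
    reference = "the council approved the annual budget"
    print(rouge_n(candidate, reference, n=1))
    print(rouge_n(candidate, reference, n=2))
```

The table reports scores on a 0-100 scale, so values from this sketch would be multiplied by 100 for comparison.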
- The CitiLink-Summ corpus contains 120 minutes and 2,880 manually written discussion-subject summaries.
- Summaries show medium-to-high coverage and low density, indicating abstraction beyond surface text reuse.
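- The larger models lead: BART Large achieves the highest score on every metric except BERTScore precision, where PRIMERA is best (87.10 vs. 86.57); PRIMERA ranks second on ROUGE, METEOR, and BERTScore F1, and Gemini-2.5-flash is roughly on par with BART and LED.
- Fine-tuned encoder-decoder models, together with open-source and closed LLMs, provide usable baselines for European Portuguese municipal-domain summarization.
- Table 1 reports that BART Large achieves ROUGE-1 of 68.96, ROUGE-2 of 54.78, ROUGE-L of 63.64, BLEU of 42.43, METEOR of 61.65, and BERTScore F1 of 86.28.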
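
The coverage and density figures behind the abstraction finding can be made concrete with a small sketch. It assumes the paper uses the extractive-fragment definitions common in summarization dataset analysis: coverage is the share of summary words that fall inside fragments copied from the source, and density is the average squared fragment length per summary word. The helper names and whitespace tokenization are illustrative assumptions, and the example sentences are invented.

```python
from typing import List, Tuple


def extractive_fragments(article: List[str], summary: List[str]) -> List[List[str]]:
    """Greedily collect the longest token runs of the summary that also occur in the article."""
    fragments = []
    i = 0
    while i < len(summary):
        best = 0
        for j in range(len(article)):
            k = 0
            while (i + k < len(summary) and j + k < len(article)
                   and summary[i + k] == article[j + k]):
                k += 1
            best = max(best, k)
        if best > 0:
            fragments.append(summary[i:i + best])
            i += best  # skip past the copied run
        else:
            i += 1  # word does not appear in the article
    return fragments


def coverage_and_density(article_text: str, summary_text: str) -> Tuple[float, float]:
    """Return (coverage, density) for one article/summary pair."""
    article = article_text.lower().split()
    summary = summary_text.lower().split()
    if not summary:
        return 0.0, 0.0
    frags = extractive_fragments(article, summary)
    coverage = sum(len(f) for f in frags) / len(summary)
    density = sum(len(f) ** 2 for f in frags) / len(summary)
    return coverage, density


if __name__ == "__main__":
    article = "the council approved the budget for road repairs"
    summary = "council approved funding for road repairs"
    print(coverage_and_density(article, summary))
```

Under these definitions, a summary that reuses many source words but only in short runs has fairly high coverage and low density, which is the pattern the findings describe as abstraction beyond surface text reuse.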

This explainer was generated by AI and reviewed by a human editor.