QUICK REVIEW

[Paper Summary] Using novel data and ensemble models to improve automated labeling of Sustainable Development Goals

Dirk U. Wulff, Dominik Meier|arXiv (Cornell University)|Jan 25, 2023

Data-Driven Disease Surveillance | Cited by 8

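One-Sentence Summary

This paper compares seven SDG labeling systems, reveals their biases and false positives, and shows that an ensemble model pooling the systems' text-based SDG labels outperforms any single system across datasets.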

ABSTRACT

A number of labeling systems based on text have been proposed to help monitor work on the United Nations (UN) Sustainable Development Goals (SDGs). Here, we present a systematic comparison of systems using a variety of text sources and show that systems differ considerably in their sensitivity (i.e., true-positive rate) and specificity (i.e., true-negative rate), have systematic biases (e.g., are more sensitive to specific SDGs relative to others), and are susceptible to the type and amount of text analyzed. We then show that an ensemble model that pools labeling systems alleviates some of these limitations, exceeding the labeling performance of all currently available systems. We conclude that researchers and policymakers should care about the choice of labeling system and that ensemble methods should be favored when drawing conclusions about the absolute and relative prevalence of work on the SDGs based on automated methods.

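Research Motivation and Objectives

Evaluate how existing SDG labeling systems perform across multiple text sources and multiple metrics.
Identify SDG-specific biases and false-positive tendencies.
Assess how labeling performance changes with text type and text length.
Determine whether ensemble models can overcome the limitations of single systems.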

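Proposed Method

Evaluation of seven SDG labeling systems implemented in text2sdg (Aurora, Elsevier, SIRIS, Auckland, SDGO, SDSN, OSDG.ai).
Use of three expert-labeled datasets (titles, abstracts, news) plus additional unlabeled and synthetic data, with performance measured by sensitivity, specificity, accuracy, and F1.
Analysis of SDG-specific biases by comparing predicted with observed SDG frequencies and by correlating bias profiles across datasets.
Tests of robustness to false positives on large-scale non-SDG text sources and on synthetic texts of varying length.
Development and evaluation of ensemble models (random forest and XGBoost) that combine the systems' predictions and document length as features.
Training of the ensemble models on expert-labeled and synthetic data to control false positives, followed by comparison with the single systems.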
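
The four evaluation metrics named above have standard definitions based on true/false positives and negatives. The following is a minimal sketch of how they could be computed for one system and one SDG; the function names and the toy labels are hypothetical and are not taken from the paper or from text2sdg.

```python
def confusion_counts(expert, predicted):
    """Count TP, FP, TN, FN for paired binary labels (True = SDG assigned)."""
    tp = fp = tn = fn = 0
    for truth, guess in zip(expert, predicted):
        if truth and guess:
            tp += 1
        elif guess:
            fp += 1
        elif truth:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def labeling_metrics(expert, predicted):
    """Return sensitivity, specificity, accuracy, and F1 for one system."""
    tp, fp, tn, fn = confusion_counts(expert, predicted)

    def ratio(num, den):
        # Avoid division by zero when a class is absent from the data.
        return num / den if den else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),  # true-positive rate
        "specificity": ratio(tn, tn + fp),  # true-negative rate
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "f1": ratio(2 * tp, 2 * tp + fp + fn),
    }


# Hypothetical toy data: expert labels vs. one system's labels for a single SDG.
expert = [True, True, True, False, False, False, False, False]
system = [True, True, False, True, False, False, False, False]
metrics = labeling_metrics(expert, system)
```

With these toy labels the system finds two of the three expert-assigned documents and raises one false alarm among five negatives, so sensitivity is 2/3 and specificity is 4/5.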

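Experimental Results

Research Questions

RQ1: How do existing SDG labeling systems differ in sensitivity and specificity across text types?
RQ2: Which SDGs show system-specific biases relative to expert judgment?
RQ3: Can ensemble modeling mitigate the limitations and biases of single labeling systems?
RQ4: Do ensemble models maintain their performance on non-SDG texts and across document lengths?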
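
One way to make the bias question in RQ2 concrete is to compare how often a system assigns each SDG with how often experts do. The sketch below is an assumption about how such a comparison might look, not the paper's actual bias measure (which this summary does not spell out); the function names and the toy labels are hypothetical.

```python
from collections import Counter


def sdg_shares(labels):
    """Relative frequency of each SDG among all assigned labels."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {sdg: n / total for sdg, n in counts.items()}


def bias_profile(expert_labels, system_labels):
    """Predicted minus observed share per SDG (positive = over-labeled)."""
    observed = sdg_shares(expert_labels)
    predicted = sdg_shares(system_labels)
    sdgs = sorted(set(observed) | set(predicted))
    return {s: predicted.get(s, 0.0) - observed.get(s, 0.0) for s in sdgs}


# Hypothetical toy labels, one entry per labeled document.
expert_labels = ["SDG-03", "SDG-03", "SDG-13", "SDG-13"]
system_labels = ["SDG-03", "SDG-03", "SDG-03", "SDG-13"]
profile = bias_profile(expert_labels, system_labels)
```

In this toy case the system assigns SDG-03 to 75% of documents where experts assign it to 50%, giving a bias of +0.25 for SDG-03 and -0.25 for SDG-13. Profiles of this kind could then be correlated across datasets to check whether a system's bias is stable.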

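Key Findings

Labeling systems differ markedly in their sensitivity-specificity trade-off and in their performance across datasets.
Systems show SDG-specific biases, over- or underestimating certain SDGs relative to the expert profile.
Applying the labeling systems to longer texts or to non-SDG texts often produces many false positives; how conservative a system is correlates with its false-positive tendency.
Ensemble models achieve markedly higher average out-of-sample accuracy and less bias than single systems; including document length as a feature improves performance.
Ensemble methods achieve competitive accuracy on the title, abstract, and news datasets and help balance liberal and cautious labeling tendencies.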
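
The paper's ensembles are random forest and XGBoost models trained on the systems' predictions and document length. The sketch below is not that model: it is a much simpler hand-written voting rule, shown only to illustrate the idea of pooling several systems and letting document length influence the decision. The function name, the vote threshold, and the word cutoff are all hypothetical.

```python
def pooled_label(system_votes, n_words, base_votes=2, long_doc_words=500):
    """Assign an SDG when enough systems agree.

    Assumed rule (not from the paper): documents with at least
    `long_doc_words` words need one extra agreeing system, because
    longer texts tend to trigger more false positives.
    """
    required = base_votes + (1 if n_words >= long_doc_words else 0)
    return sum(system_votes.values()) >= required


# Hypothetical votes of four systems for one SDG on one document.
votes = {"Aurora": True, "Elsevier": False, "SIRIS": True, "OSDG.ai": False}
short_doc = pooled_label(votes, n_words=120)   # 2 votes, 2 required
long_doc = pooled_label(votes, n_words=2000)   # 2 votes, 3 required
```

Here the same two votes are enough to label the short document but not the long one. A trained model such as a random forest would learn such interactions from expert-labeled data instead of relying on fixed thresholds.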


This summary was generated by AI and reviewed by a human editor.