QUICK REVIEW

[Paper Review] A large-scale study of SVM-based methods for abstract screening in systematic reviews

Tanay Kumar Saha, Mourad Ouzzani|arXiv (Cornell University)|Jan 1, 2017

Explainable Artificial Intelligence (XAI) | References: 33 | Cited by: 3

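One-sentence summary

This study evaluates SVM-based methods for automating abstract screening in systematic reviews through a large-scale analysis of 61 systematic reviews and 11 metrics. It finds no single dominant method, shows that in some cases relevant studies can be found after screening only 15-20% of the citations with certainty-based sampling, and proposes an ensemble 5-star rating system that combines the best-performing methods to make relevance predictions easier to consume.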

ABSTRACT

A major task in systematic reviews is abstract screening, i.e., excluding, often hundreds or thousands of, irrelevant citations returned from a database search based on titles and abstracts. Thus, a systematic review platform that can automate the abstract screening process is of huge importance. Several methods have been proposed for this task. However, it is very hard to clearly understand the applicability of these methods in a systematic review platform because of the following challenges: (1) the use of non-overlapping metrics for the evaluation of the proposed methods, (2) usage of features that are very hard to collect, (3) using a small set of reviews for the evaluation, and (4) no solid statistical testing or equivalence grouping of the methods. In this paper, we use feature representation that can be extracted per citation. We evaluate SVM-based methods (commonly used) on a large set of reviews (61) and metrics (11) to provide equivalence grouping of methods based on a solid statistical test. Our analysis also includes a strong variability of the metrics using 500x2 cross validation. While some methods shine for different metrics and for different datasets, there is no single method that dominates the pack. Furthermore, we observe that in some cases relevant (included) citations can be found after screening only 15-20% of them via a certainty based sampling. A few included citations present outlying characteristics and can only be found after a very large number of screening steps. Finally, we present an ensemble algorithm for producing a 5-star rating of citations based on their relevance. Such an algorithm combines the best methods from our evaluation and through its 5-star rating outputs a more easy-to-consume prediction.

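Motivation and objectives

Address the lack of standardized evaluation for SVM-based abstract screening methods in systematic reviews.
Evaluate SVM methods on a large, diverse set of 61 systematic reviews so that the results generalize.
Apply rigorous statistical testing and equivalence grouping to compare methods fairly across metrics.
Identify efficient screening strategies that reduce manual effort while preserving recall of relevant citations.
Develop an ensemble 5-star rating system that makes relevance predictions easier to interpret and use.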

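Proposed method

Use a feature representation that can be extracted per citation, so the methods remain practical in a real-world systematic review platform.
Apply 500x2 cross-validation to assess the stability and variability of each method across datasets and splits.
Evaluate 11 performance metrics to compare the methods under diverse evaluation criteria.
Run statistical equivalence tests to group methods by performance, avoiding misleading conclusions drawn from single-metric rankings.
Design an ensemble algorithm that combines the predictions of the best-performing individual methods into a 5-star relevance rating.
Use certainty-based sampling to examine how early in the screening process relevant citations can be found.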
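
The repeated two-fold cross-validation mentioned above can be sketched in plain Python. This is a minimal illustration, not the paper's implementation: the names `repeated_two_fold`, `cross_validate`, and the `evaluate` callback are hypothetical, and the splits here are not stratified, which the review does not specify either way.

```python
import random
from statistics import mean, stdev


def repeated_two_fold(n_items, n_repeats, seed=0):
    """Yield (train, test) index lists for n_repeats rounds of 2-fold CV.

    Each round shuffles the items, cuts them in half, and uses each half
    once for training and once for testing, so n_repeats rounds give
    2 * n_repeats folds (500 rounds -> 1000 folds).
    """
    rng = random.Random(seed)
    indices = list(range(n_items))
    half = n_items // 2
    for _ in range(n_repeats):
        rng.shuffle(indices)
        first, second = sorted(indices[:half]), sorted(indices[half:])
        yield first, second
        yield second, first


def cross_validate(n_items, evaluate, n_repeats=500, seed=0):
    """Return (mean, standard deviation) of a metric over all folds.

    `evaluate(train, test)` is a hypothetical callback: it should train a
    model on the `train` indices and return one metric value on `test`.
    """
    scores = [
        evaluate(train, test)
        for train, test in repeated_two_fold(n_items, n_repeats, seed)
    ]
    return mean(scores), stdev(scores)


# Toy usage: the "metric" is only the share of relevant citations in the
# test fold, standing in for a real trained-model metric.
labels = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0]
avg, spread = cross_validate(
    len(labels),
    lambda train, test: sum(labels[i] for i in test) / len(test),
    n_repeats=500,
)
print(f"mean={avg:.3f}, std={spread:.3f}")
```

Reporting the spread alongside the mean is what makes the variability of a metric visible; with heavily imbalanced reviews, a stratified split may be needed so that every fold contains relevant citations.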

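Experimental results

Research questions

RQ1: Which SVM-based methods perform best across a wide range of systematic reviews and evaluation metrics?
RQ2: Can certainty-based sampling reduce the number of citations that must be screened while maintaining high recall of relevant studies?
RQ3: After rigorous statistical testing, do the SVM methods fall into consistent performance groups?
RQ4: How do the performance characteristics of SVM methods vary across reviews and metrics?
RQ5: Can an ensemble model that combines the best-performing methods improve the interpretability and accuracy of relevance predictions?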
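
To make the idea behind RQ5 concrete, here is one way scores from several methods could be merged into a 5-star rating. The review does not describe how the paper's ensemble computes its stars, so everything below is an assumption for illustration: the rank normalization, the plain averaging, the equal-width star bins, and the names `rank_normalize` and `star_ratings`.

```python
def rank_normalize(scores):
    """Map raw scores to [0, 1] by rank so that methods with different
    score scales become comparable. Ties are broken by input order."""
    n = len(scores)
    if n == 1:
        return [1.0]
    order = sorted(range(n), key=lambda i: scores[i])
    normalized = [0.0] * n
    for rank, idx in enumerate(order):
        normalized[idx] = rank / (n - 1)
    return normalized


def star_ratings(method_scores):
    """Combine per-method scores into 1-5 stars per citation.

    `method_scores` is a list of score lists, one per method, all aligned
    by citation. Assumed scheme: average the rank-normalized scores, then
    cut [0, 1] into five equal-width bins.
    """
    normalized = [rank_normalize(scores) for scores in method_scores]
    n_citations = len(method_scores[0])
    stars = []
    for i in range(n_citations):
        avg = sum(method[i] for method in normalized) / len(normalized)
        stars.append(min(5, int(avg * 5) + 1))
    return stars


# Toy usage: two hypothetical methods scoring the same five citations
# on different scales.
method_a = [0.1, 0.5, 0.9, 0.3, 0.7]
method_b = [-2.0, 0.0, 3.0, -1.0, 1.0]
print(star_ratings([method_a, method_b]))
```

Rank normalization is used here only because SVM decision values from different models are not on a shared scale; other calibration schemes would serve the same purpose.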

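Key findings

No single SVM-based method consistently outperforms the others across all metrics and datasets; performance depends on the review and the metric.
With certainty-based sampling, relevant citations can in some cases be found after screening only 15-20% of the citation pool.
A few relevant citations have outlying characteristics and are found only after a very large number of screening steps, which highlights the risk of stopping too early.
The ensemble 5-star rating system combines the best-performing methods and produces relevance predictions that are easier to consume.
Statistical equivalence testing shows that performance differences between some methods are not significant, which challenges the assumption that small metric improvements are practically meaningful.
500x2 cross-validation reveals high variability in metric values, underscoring the need for robust evaluation when comparing methods.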
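
The findings on certainty-based sampling and outlying citations can be illustrated with a small screening simulation. This is a sketch under stated assumptions: `certainty_screening`, `fraction_screened_for_recall`, and `centroid_scorer` are hypothetical names, the features are single numbers, and the toy centroid scorer stands in for the SVMs studied in the paper.

```python
def certainty_screening(features, labels, train_and_score, seed_indices):
    """Simulate screening with certainty-based sampling.

    Repeatedly retrain on everything screened so far, then screen the
    unscreened citation the model scores as most likely relevant.
    Returns the order in which citations were screened (seeds first).
    """
    screened = list(seed_indices)
    remaining = set(range(len(labels))) - set(screened)
    while remaining:
        candidates = sorted(remaining)
        scores = train_and_score(
            [features[i] for i in screened],
            [labels[i] for i in screened],
            [features[i] for i in candidates],
        )
        best = max(range(len(candidates)), key=lambda k: scores[k])
        chosen = candidates[best]
        screened.append(chosen)  # the reviewer's label becomes known here
        remaining.remove(chosen)
    return screened


def fraction_screened_for_recall(order, labels, target_recall=1.0):
    """Share of the pool screened (seeds included) when recall is reached."""
    needed = target_recall * sum(labels)
    found = 0
    for position, idx in enumerate(order, start=1):
        found += labels[idx]
        if found >= needed:
            return position / len(order)
    return 1.0


def centroid_scorer(train_x, train_y, candidate_x):
    """Toy stand-in for an SVM: closeness to the mean relevant example."""
    relevant = [x for x, y in zip(train_x, train_y) if y == 1]
    if not relevant:
        return [0.0] * len(candidate_x)
    center = sum(relevant) / len(relevant)
    return [-abs(x - center) for x in candidate_x]


# Toy pool: relevant citations cluster near 0.9, except one outlier at 0.05.
features = [0.9, 0.1, 0.8, 0.2, 0.85, 0.15, 0.3, 0.05, 0.25, 0.5]
labels = [1, 0, 1, 0, 1, 0, 0, 1, 0, 0]
order = certainty_screening(features, labels, centroid_scorer, [0, 1])
print(order)
print(fraction_screened_for_recall(order, labels, target_recall=0.75))
print(fraction_screened_for_recall(order, labels, target_recall=1.0))
```

In this toy pool the clustered relevant citations surface early, while the outlier is reached only at the end of screening, which mirrors the risk of stopping too early noted in the findings.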


This review was generated by AI and reviewed by a human editor.