QUICK REVIEW

[Paper Review] A large-scale study of SVM-based methods for abstract screening in systematic reviews

Tanay Kumar Saha, Mourad Ouzzani|arXiv (Cornell University)|Jan 1, 2017

Explainable Artificial Intelligence (XAI)|References: 33|Citations: 3

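Summary in Brief

This study evaluates SVM-based methods for automating abstract screening in systematic reviews through a large-scale analysis covering 61 systematic reviews and 11 metrics. No single method dominates. In some cases the relevant (included) citations can be found after screening only 15-20% of the citation pool with certainty-based sampling, although a few outlying citations surface only after many more screening steps. The authors also propose an ensemble algorithm that combines the best-performing methods and outputs a 5-star relevance rating that is easier to consume.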

ABSTRACT

A major task in systematic reviews is abstract screening, i.e., excluding, often hundreds or thousand of, irrelevant citations returned from a database search based on titles and abstracts. Thus, a systematic review platform that can automate the abstract screening process is of huge importance. Several methods have been proposed for this task. However, it is very hard to clearly understand the applicability of these methods in a systematic review platform because of the following challenges: (1) the use of non-overlapping metrics for the evaluation of the proposed methods, (2) usage of features that are very hard to collect, (3) using a small set of reviews for the evaluation, and (4) no solid statistical testing or equivalence grouping of the methods. In this paper, we use feature representation that can be extracted per citation. We evaluate SVM based methods (commonly used) on a large set of reviews (61) and metrics (11) to provide equivalence grouping of methods based on a solid statistical test. Our analysis also includes a strong variability of the metrics using 500x2 cross validation. While some methods shine for different metrics and for different datasets, there is no single method that dominates the pack. Furthermore, we observe that in some cases relevant (included) citations can be found after screening only 15-20% of them via a certainty based sampling. A few included citations present outlying characteristics and can only be found after a very large number of screening steps. Finally, we present an ensemble algorithm for producing a 5-star rating of citations based on their relevance. Such algorithm combines the best methods from our evaluation and through its 5-star rating outputs a more easy-to-consume prediction.

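Motivation and Objectives

Addresses the lack of a standardized way to evaluate SVM-based abstract screening methods for systematic reviews.
Evaluates SVM methods on a large, diverse collection of 61 systematic reviews to support generalizability.
Applies rigorous statistical hypothesis testing and equivalence grouping to compare performance across multiple metrics, so that methods are compared fairly.
Identifies efficient screening strategies that reduce human effort while maintaining recall of relevant citations.
Develops an ensemble 5-star rating system that combines the top methods to improve interpretability and ease of use.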

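Proposed Method

Uses feature representations that can be extracted per citation, which keeps the approach practical for real-world systematic review platforms.
Runs 500×2 cross-validation to assess the stability of each method and its variability across datasets and splits.
Compares method effectiveness on 11 different performance metrics to cover a range of evaluation criteria.
Applies statistical equivalence testing to group methods by performance, avoiding the misleading conclusions that single-metric rankings can produce.
Designs an ensemble algorithm that combines the predictions of the top individual methods to produce a 5-star relevance rating.
Uses certainty-based sampling to identify how early in the screening process the relevant citations can be found.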
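
To make the certainty-based sampling idea concrete, here is a minimal sketch of one way such a screening loop could work: after each human decision the model is retrained, and the unscreened citation with the highest relevance score is shown next. Everything named here is an assumption for illustration. The function names, the toy citations, and the keyword-overlap scorer (a stand-in for an SVM) are hypothetical and are not taken from the paper.

```python
from typing import Callable, List, Sequence

Scorer = Callable[[List[str], List[int], List[str]], List[float]]


def certainty_based_screening(
    citations: Sequence[str],
    labels: Sequence[int],
    train_and_score: Scorer,
    seed_indices: Sequence[int],
) -> List[int]:
    """Return the order in which citations get screened.

    Starts from the already-screened `seed_indices`, then repeatedly
    retrains and screens the unscreened citation with the highest score.
    Ties are broken by the lowest original index.
    """
    order = list(seed_indices)
    seen = set(seed_indices)
    remaining = [i for i in range(len(citations)) if i not in seen]
    while remaining:
        train_x = [citations[i] for i in order]
        train_y = [labels[i] for i in order]  # labels revealed by the screener
        scores = train_and_score(train_x, train_y, [citations[i] for i in remaining])
        best = max(range(len(remaining)), key=lambda k: scores[k])
        order.append(remaining.pop(best))
    return order


def keyword_overlap_scorer(train_x, train_y, candidates):
    """Hypothetical stand-in for an SVM: count words shared with included citations."""
    relevant_words = set()
    for text, y in zip(train_x, train_y):
        if y == 1:
            relevant_words.update(text.lower().split())
    return [float(len(relevant_words & set(c.lower().split()))) for c in candidates]


def fraction_screened_for_full_recall(order, labels):
    """Fraction of the pool screened when the last included citation is found."""
    last_hit = max(pos for pos, idx in enumerate(order) if labels[idx] == 1)
    return (last_hit + 1) / len(order)


# Toy pool (invented): 1 = included, 0 = excluded.
citations = [
    "statin therapy randomized trial cholesterol",  # included, seed
    "soil erosion in alpine meadows",               # excluded, seed
    "bird migration patterns",
    "medieval trade routes",
    "deep sea fishing quotas",
    "randomized trial of statin dosing",            # included
    "urban traffic noise survey",
    "cholesterol lowering statin outcomes",         # included
    "galaxy cluster formation",
    "lipid drug cohort with atypical wording",      # included, outlier wording
]
labels = [1, 0, 0, 0, 0, 1, 0, 1, 0, 1]

order = certainty_based_screening(citations, labels, keyword_overlap_scorer, [0, 1])
found_at = [pos + 1 for pos, idx in enumerate(order) if labels[idx] == 1]
print(order)     # [0, 1, 5, 7, 2, 3, 4, 6, 8, 9]
print(found_at)  # [1, 3, 4, 10]
print(fraction_screened_for_full_recall(order, labels))  # 1.0
```

In this toy run three of the four included citations are found within the first four screening steps, while the one with unusual wording is found last. This mirrors the abstract's remark that a few included citations with outlying characteristics are only found after many screening steps.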

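Experimental Results

Research Questions

RQ1: Which SVM-based method performs best across a wide range of systematic reviews and evaluation metrics?
RQ2: Can certainty-based sampling reduce the number of citations to screen while maintaining recall of relevant studies?
RQ3: Do consistent performance groups emerge among SVM methods under rigorous statistical hypothesis testing?
RQ4: How do the performance characteristics of SVM methods vary across reviews and metrics?
RQ5: Can an ensemble model that combines the top methods improve the interpretability and accuracy of relevance predictions?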

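Key Findings

No SVM-based method consistently outperformed the others across all metrics and datasets, which indicates that method performance is context-dependent.
With certainty-based sampling, the relevant citations can in some cases be found after screening only 15-20% of the citation pool.
Some relevant citations have outlying characteristics and are not found until a very large number of screening steps have been carried out, which shows the risk of stopping early.
The ensemble 5-star rating system combines the top methods and yields a more intuitive, easier-to-consume relevance prediction.
Statistical equivalence testing showed that the performance differences between some methods are not significant, which calls into question the assumption that small metric improvements are meaningful.
The 500×2 cross-validation revealed marked variability in the metrics, underscoring the need for robust evaluation strategies when comparing methods.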
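
The variability finding can be illustrated with a small sketch of repeated 2-fold cross-validation: the data is shuffled and split in half 500 times, and each half is used once for training and once for testing, giving 1000 scores whose spread can be inspected. This is a sketch under stated assumptions, not the authors' protocol. The synthetic one-dimensional data, the midpoint-threshold classifier (a stand-in for an SVM), the accuracy metric, and all function names are hypothetical.

```python
import random
import statistics
from typing import Callable, List


def repeated_two_fold_scores(
    n_items: int,
    evaluate: Callable[[List[int], List[int]], float],
    repeats: int = 500,
    seed: int = 0,
) -> List[float]:
    """Run `repeats` rounds of 2-fold cross-validation and return every fold score."""
    rng = random.Random(seed)
    indices = list(range(n_items))
    scores = []
    for _ in range(repeats):
        rng.shuffle(indices)
        half = n_items // 2
        fold_a, fold_b = indices[:half], indices[half:]
        # Each half serves once as the training set and once as the test set.
        scores.append(evaluate(fold_a, fold_b))
        scores.append(evaluate(fold_b, fold_a))
    return scores


def make_evaluator(features, labels):
    """Build an evaluate(train_idx, test_idx) function returning test accuracy.

    The classifier is a hypothetical stand-in for an SVM: it thresholds the
    single feature at the midpoint between the two class means.
    """
    def evaluate(train_idx, test_idx):
        pos = [features[i] for i in train_idx if labels[i] == 1]
        neg = [features[i] for i in train_idx if labels[i] == 0]
        if not pos or not neg:
            # Degenerate fold: predict the only class seen during training.
            only = 1 if pos else 0
            preds = [only for _ in test_idx]
        else:
            threshold = (statistics.mean(pos) + statistics.mean(neg)) / 2
            preds = [1 if features[i] > threshold else 0 for i in test_idx]
        correct = sum(p == labels[i] for p, i in zip(preds, test_idx))
        return correct / len(test_idx)
    return evaluate


# Synthetic review (invented): 12 included and 28 excluded citations.
data_rng = random.Random(1)
labels = [1] * 12 + [0] * 28
features = [data_rng.gauss(1.5 if y == 1 else 0.0, 1.0) for y in labels]

evaluate = make_evaluator(features, labels)
scores = repeated_two_fold_scores(len(labels), evaluate, repeats=500, seed=0)
print(f"folds={len(scores)} mean={statistics.mean(scores):.3f} "
      f"stdev={statistics.stdev(scores):.3f}")
```

Reporting the standard deviation alongside the mean is the point of the exercise: two methods whose means differ by less than the split-to-split spread are hard to tell apart without a statistical test.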


This review was generated by AI and checked by a human editor.