QUICK REVIEW

[Paper Review] Evaluation: from precision, recall and F-measure to ROC, informedness, markedness and correlation

David Powers | arXiv (Cornell University) | Oct 11, 2020

Rough Sets and Fuzzy Logic | References: 24 | Cited by: 4,427

One-Sentence Summary

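This paper argues that commonly used evaluation measures (Precision, Recall, F-measure and Rand Accuracy) are biased. It discusses Informedness, introduces Markedness as its dual, relates both to Correlation and Significance, and extends the framework to multi-class evaluation.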

ABSTRACT

Commonly used evaluation measures including Recall, Precision, F-Measure and Rand Accuracy are biased and should not be used without clear understanding of the biases, and corresponding identification of chance or base case levels of the statistic. Using these measures a system that performs worse in the objective sense of Informedness, can appear to perform better under any of these commonly used measures. We discuss several concepts and measures that reflect the probability that prediction is informed versus chance (Informedness), and introduce Markedness as a dual measure for the probability that prediction is marked versus chance. Finally we demonstrate elegant connections between the concepts of Informedness, Markedness, Correlation and Significance as well as their intuitive relationships with Recall and Precision, and outline the extension from the dichotomous case to the general multi-class case.
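
To make the abstract's central claim concrete, the Python sketch below compares two hypothetical classifiers on an invented, skewed test set of 100 items (90 positives, 10 negatives). The `Confusion` class and all counts are illustrative assumptions, not data from the paper. Informedness is computed as Recall + Inverse Recall - 1, where Inverse Recall is the proportion of real negatives predicted negative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """Hypothetical helper holding binary confusion counts."""
    tp: int  # real positive, predicted positive
    fn: int  # real positive, predicted negative
    fp: int  # real negative, predicted positive
    tn: int  # real negative, predicted negative

    @property
    def n(self) -> int:
        return self.tp + self.fn + self.fp + self.tn

    def recall(self) -> float:
        return self.tp / (self.tp + self.fn)

    def inverse_recall(self) -> float:
        return self.tn / (self.tn + self.fp)

    def precision(self) -> float:
        return self.tp / (self.tp + self.fp)

    def f1(self) -> float:
        # Harmonic mean of Precision and Recall, written as 2TP / (2TP + FP + FN)
        return 2 * self.tp / (2 * self.tp + self.fp + self.fn)

    def accuracy(self) -> float:
        # Rand Accuracy: proportion of all items classified correctly
        return (self.tp + self.tn) / self.n

    def informedness(self) -> float:
        return self.recall() + self.inverse_recall() - 1


# A system that always answers "positive" versus one that actually discriminates
always_positive = Confusion(tp=90, fn=0, fp=10, tn=0)
discriminating = Confusion(tp=80, fn=10, fp=2, tn=8)

for name, c in [("always_positive", always_positive), ("discriminating", discriminating)]:
    print(f"{name}: recall={c.recall():.3f} f1={c.f1():.3f} "
          f"accuracy={c.accuracy():.3f} informedness={c.informedness():.3f}")
```

With these counts the always-positive system scores higher on Recall, F1 and Rand Accuracy even though its Informedness is zero, while the discriminating system scores lower on those three measures but has clearly positive Informedness.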

Research Motivation and Objectives

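Highlight the biases in commonly used evaluation measures (Precision, Recall, F-measure and Rand Accuracy).
Introduce Informedness and Markedness as bias-aware evaluation measures.
Explain the connections among Informedness, Markedness, Correlation and Significance.
Show how these concepts relate to Recall and Precision.
Outline how the framework extends from dichotomous (two-class) problems to multi-class problems.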

Proposed Method

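Define Informedness and explain its meaning as the probability that a prediction is informed rather than due to chance.
Introduce Markedness as a dual measure capturing the probability that a prediction is marked rather than due to chance.
Explore the mathematical relationships among Informedness, Markedness, Correlation and Significance.
Give intuitive interpretations of how these measures relate to traditional metrics such as Recall and Precision.
Outline a strategy for extending two-class evaluation to multi-class evaluation.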
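
The relationships listed above can be checked numerically. This minimal Python sketch uses a hypothetical `binary_measures` helper and invented counts to compute Informedness (Recall + Inverse Recall - 1), Markedness (Precision + Inverse Precision - 1), the Matthews correlation, and Pearson's chi-squared statistic for a 2x2 table. It relies on two standard identities for the two-class case: the squared correlation equals Informedness times Markedness (so correlation is their geometric mean, up to sign), and chi-squared without continuity correction equals N times the squared correlation.

```python
import math


def binary_measures(tp: int, fn: int, fp: int, tn: int) -> dict[str, float]:
    """Hypothetical helper: bias-aware measures for a 2x2 contingency table.

    Rows are real classes (positive, negative); columns are predicted classes.
    """
    n = tp + fn + fp + tn
    recall = tp / (tp + fn)             # true positive rate
    inverse_recall = tn / (tn + fp)     # true negative rate
    precision = tp / (tp + fp)          # positive predictive value
    inverse_precision = tn / (tn + fn)  # negative predictive value

    informedness = recall + inverse_recall - 1
    markedness = precision + inverse_precision - 1

    # Matthews correlation (the phi coefficient of a 2x2 table)
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )

    # Pearson chi-squared: sum of (observed - expected)^2 / expected, no continuity correction
    observed = ((tp, fn), (fp, tn))
    row_totals = (tp + fn, fp + tn)  # real positives, real negatives
    col_totals = (tp + fp, fn + tn)  # predicted positives, predicted negatives
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed[i][j] - expected) ** 2 / expected

    return {
        "informedness": informedness,
        "markedness": markedness,
        "mcc": mcc,
        "chi2": chi2,
        "n": float(n),
    }


m = binary_measures(tp=80, fn=10, fp=2, tn=8)
print(m)
# Correlation squared equals Informedness times Markedness
print(math.isclose(m["mcc"] ** 2, m["informedness"] * m["markedness"]))
# Significance: chi-squared equals N times the squared correlation
print(math.isclose(m["chi2"], m["n"] * m["mcc"] ** 2))
```

Writing each measure as (TP*TN - FP*FN) divided by a product of marginals shows why the identities hold: Informedness divides by the two real-class totals, Markedness by the two predicted-class totals, and the correlation by the square root of all four.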

Results

Research Questions

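RQ1: How do commonly used evaluation measures bias the assessment of predictive performance?
RQ2: What are Informedness and Markedness, and how do they quantify prediction quality relative to chance?
RQ3: How are Informedness, Markedness, Correlation and Significance related mathematically?
RQ4: How can the framework be extended from two-class to multi-class evaluation?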

Key Findings

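Informedness and Markedness give a probabilistic assessment of prediction quality relative to chance.
There are elegant connections among Informedness, Markedness, Correlation and Significance.
These concepts have intuitive relationships with Recall and Precision.
The framework can be extended from the dichotomous case to the multi-class case.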
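
One common way to carry the two-class definitions over to K classes is to treat each class in turn as the positive class (one-vs-rest) and then combine the per-class values. The Python sketch below assumes that approach; the function names, the example matrix, and the use of real-class sizes (prevalence) as weights are illustrative assumptions, and the paper's own multi-class weighting may differ. For two classes, both one-vs-rest values coincide with the dichotomous Informedness, which serves as a sanity check.

```python
from typing import Sequence


def one_vs_rest_informedness(matrix: Sequence[Sequence[int]]) -> list[float]:
    """Per-class Informedness, treating each class as positive in turn.

    matrix[i][j] counts items whose real class is i and predicted class is j.
    """
    k = len(matrix)
    n = sum(sum(row) for row in matrix)
    per_class = []
    for c in range(k):
        tp = matrix[c][c]
        real_pos = sum(matrix[c])                       # items really in class c
        pred_pos = sum(matrix[i][c] for i in range(k))  # items predicted as class c
        fn = real_pos - tp
        fp = pred_pos - tp
        tn = n - tp - fn - fp
        recall = tp / real_pos
        inverse_recall = tn / (tn + fp)
        per_class.append(recall + inverse_recall - 1)
    return per_class


def weighted_informedness(matrix: Sequence[Sequence[int]], weights: Sequence[float]) -> float:
    """Hypothetical summary: weighted mean of the one-vs-rest values."""
    per_class = one_vs_rest_informedness(matrix)
    return sum(w * b for w, b in zip(weights, per_class)) / sum(weights)


three_class = [
    [40, 5, 5],
    [10, 25, 5],
    [2, 3, 5],
]
prevalence = [sum(row) for row in three_class]  # assumed weighting: real class sizes
print(one_vs_rest_informedness(three_class))
print(weighted_informedness(three_class, prevalence))

# Sanity check: for two classes both one-vs-rest values equal the binary Informedness
print(one_vs_rest_informedness([[80, 10], [2, 8]]))
```

Passing different weights (for example, the share of predictions made for each class) changes the summary value, so any reported multi-class score should state the weighting it uses.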

This review was generated by AI and reviewed by human editors.