QUICK REVIEW

[Paper Digest] Revisiting Evaluation Metrics for Semantic Segmentation: Optimization and Evaluation of Fine-grained Intersection over Union

Zifu Wang, Maxim Berman|arXiv (Cornell University)|Oct 30, 2023

Advanced Neural Network Applications | Cited by 9

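One-Sentence Summary

This paper introduces fine-grained mIoU variants (per-image, per-class, and per-instance) together with worst-case metrics to reduce the bias that size and class imbalance introduce into semantic segmentation evaluation, and it reports a large-scale benchmark of 15 models on 12 datasets.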

ABSTRACT

Semantic segmentation datasets often exhibit two types of imbalance: class imbalance, where some classes appear more frequently than others and size imbalance, where some objects occupy more pixels than others. This causes traditional evaluation metrics to be biased towards majority classes (e.g. overall pixel-wise accuracy) and large objects (e.g. mean pixel-wise accuracy and per-dataset mean intersection over union). To address these shortcomings, we propose the use of fine-grained mIoUs along with corresponding worst-case metrics, thereby offering a more holistic evaluation of segmentation techniques. These fine-grained metrics offer less bias towards large objects, richer statistical information, and valuable insights into model and dataset auditing. Furthermore, we undertake an extensive benchmark study, where we train and evaluate 15 modern neural networks with the proposed metrics on 12 diverse natural and aerial segmentation datasets. Our benchmark study highlights the necessity of not basing evaluations on a single metric and confirms that fine-grained mIoUs reduce the bias towards large objects. Moreover, we identify the crucial role played by architecture designs and loss functions, which lead to best practices in optimizing fine-grained metrics. The code is available at https://github.com/zifuwanggg/JDTLosses.

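Motivation and Objectives

Address the bias of the traditional per-dataset IoU by proposing fine-grained mIoU metrics (I, C, K) and their worst-case variants.
Show that fine-grained metrics reduce the bias toward large objects and provide richer statistical insight for model and dataset auditing.
Encourage reporting fine-grained metrics alongside mIoU^D for more robust method comparison, particularly in safety-critical applications.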
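
To make the size bias concrete, here is a minimal sketch in plain Python that contrasts a per-dataset mIoU with per-image and per-class averages on a toy two-image dataset. The function names (`miou_d`, `miou_i`, `miou_c`) are hypothetical and are not taken from the authors' repository. The rule that a class counts toward an image only when it appears in that image's ground truth is an assumption of this sketch; the paper's exact handling of absent classes may differ.

```python
from statistics import mean


def confusion_counts(pred, target, num_classes):
    """Per-class (TP, FP, FN) pixel counts for one image given flat label lists."""
    tp, fp, fn = [0] * num_classes, [0] * num_classes, [0] * num_classes
    for p, t in zip(pred, target):
        if p == t:
            tp[t] += 1
        else:
            fp[p] += 1  # pixel wrongly assigned to class p
            fn[t] += 1  # pixel of class t that was missed
    return tp, fp, fn


def iou(tp, fp, fn):
    denom = tp + fp + fn
    return tp / denom if denom else None


def miou_d(preds, targets, num_classes):
    """Per-dataset: pool counts over all images, then average over classes."""
    totals = [[0, 0, 0] for _ in range(num_classes)]
    for pred, target in zip(preds, targets):
        tp, fp, fn = confusion_counts(pred, target, num_classes)
        for c in range(num_classes):
            totals[c][0] += tp[c]
            totals[c][1] += fp[c]
            totals[c][2] += fn[c]
    scores = [iou(*t) for t in totals]
    return mean(s for s in scores if s is not None)


def per_image_ious(preds, targets, num_classes):
    """table[i][c] is the IoU of class c in image i, or None when class c is
    absent from the ground truth of image i (assumption of this sketch)."""
    table = []
    for pred, target in zip(preds, targets):
        tp, fp, fn = confusion_counts(pred, target, num_classes)
        present = set(target)
        table.append([iou(tp[c], fp[c], fn[c]) if c in present else None
                      for c in range(num_classes)])
    return table


def miou_i(table):
    """Per-image: average over classes within each image, then over images."""
    return mean(mean(v for v in row if v is not None) for row in table)


def miou_c(table):
    """Per-class: average each class over the images containing it, then over classes."""
    per_class = []
    for c in range(len(table[0])):
        vals = [row[c] for row in table if row[c] is not None]
        if vals:
            per_class.append(mean(vals))
    return mean(per_class)


# Toy data: class 0 is background, class 1 is the object.
# Image A holds a large object (80 px) segmented perfectly.
# Image B holds a small object (4 px) that the prediction misses entirely.
targets = [[0] * 20 + [1] * 80, [0] * 96 + [1] * 4]
preds = [[0] * 20 + [1] * 80, [0] * 100]

table = per_image_ious(preds, targets, num_classes=2)
score_d = miou_d(preds, targets, num_classes=2)
score_i = miou_i(table)
score_c = miou_c(table)
print(round(score_d, 4), round(score_i, 4), round(score_c, 4))
```

In this toy case the pooled score stays near 0.96 because the 80-pixel object dominates the counts, while both fine-grained averages fall to 0.74 because the missed small object contributes a full zero for its image.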

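Proposed Method

Define per-image and per-class IoU variants (mIoU^I and mIoU^C), plus an instance-level variant (mIoU^K), to reduce size and class bias.
In the mIoU^K formulation, approximate instance-level false positives by distributing image-level false positives across instances in proportion to instance size.
Propose the worst-case metric mIoU^{C^q} and its aggregated forms (mIoU^{C^{\bar q}}, mIoU^{C^5}, mIoU^{C^1}) to capture difficult cases.
Benchmark 15 networks, each trained from scratch, on 12 datasets to compare architectures and loss functions under the new metrics.
Study how architecture choices and loss functions (e.g. Jaccard loss variants) align with optimizing fine-grained metrics.
Provide best-practice recommendations on aggregating multi-scale features and aligning the loss function with fine-grained evaluation metrics.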
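
The proportional false-positive allocation described for mIoU^K can be sketched for a single image and a single class as follows. This is a hedged illustration, not the authors' implementation: the function name `instance_ious`, the input encoding (a binary prediction mask plus ground-truth instance ids, with 0 meaning "not this class"), and the use of ground-truth pixel count as the instance size are all assumptions made for the example.

```python
def instance_ious(pred_mask, instance_ids):
    """Per-instance IoU for one class in one image.

    pred_mask:    flat list of 0/1, 1 where the pixel is predicted as the class.
    instance_ids: flat list, 0 for pixels outside the class, k >= 1 for pixels
                  of ground-truth instance k (hypothetical encoding).
    """
    tp, fn = {}, {}
    fp_image = 0  # false positives are only known at image level
    for p, k in zip(pred_mask, instance_ids):
        if k == 0:
            fp_image += p
            continue
        tp.setdefault(k, 0)
        fn.setdefault(k, 0)
        if p:
            tp[k] += 1
        else:
            fn[k] += 1

    total_size = sum(tp[k] + fn[k] for k in tp)
    ious = {}
    for k in tp:
        size = tp[k] + fn[k]
        # Assumed rule: each instance receives a share of the image-level
        # false positives proportional to its ground-truth size.
        fp_k = fp_image * size / total_size
        ious[k] = tp[k] / (tp[k] + fp_k + fn[k])
    return ious


# Instance 1: 8 px, fully detected. Instance 2: 2 px, half detected.
# Five background pixels are wrongly predicted as the class.
instance_ids = [1] * 8 + [2] * 2 + [0] * 10
pred_mask = [1] * 8 + [1, 0] + [1] * 5 + [0] * 5

result = instance_ious(pred_mask, instance_ids)
print(result)
```

Here instance 1 absorbs 4 of the 5 false positives and instance 2 absorbs 1, so the small, half-missed instance gets its own low score (1/3) instead of being hidden inside the image-level IoU of 9/15 = 0.6.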

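Experimental Results

Research Questions

RQ1: Compared with per-dataset mIoU, are the fine-grained IoU metrics (I, C, K) less biased toward large objects?
RQ2: How do worst-case metrics (mIoU^{C^q}) reveal the reliability and robustness of segmentation models on challenging images or instances?
RQ3: Which architecture designs and loss functions optimize fine-grained metrics most effectively?
RQ4: When instance labels are unavailable, can mIoU^C serve as a practical proxy for instance-level evaluation?
RQ5: What insights emerge from a large-scale benchmark with fine-grained metrics across diverse datasets?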

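Key Findings

Fine-grained mIoUs (I, C, K) reduce the bias toward large objects and provide richer statistical information than the traditional mIoU^D.
mIoU^C correlates closely with instance-level performance and can serve as a proxy for instance-level metrics when instance labels are unavailable.
Worst-case metrics (mIoU^{C^q}) reveal markedly lower performance for many models, exposing hard cases that mean-based metrics do not capture.
Architectures that aggregate multi-scale features and are trained with losses aligned to Jaccard-type objectives improve fine-grained metrics more than those trained with cross-entropy (CE) alone.
The benchmark shows that no single model dominates across all metrics and datasets, underscoring the need for multi-metric evaluation.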
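
One way to read the worst-case metric mIoU^{C^q} is sketched below: for each class, keep only the lowest-scoring q percent of images and average those, then average over classes. The function name `worst_case_miou_c`, the rounding rule (ceiling, with at least one image kept), and the `None` convention for images that do not contain a class are assumptions of this sketch; the paper's exact quantile definition may differ.

```python
import math
from statistics import mean


def worst_case_miou_c(table, q):
    """table[i][c] is the IoU of class c in image i, or None if class c is
    absent from image i. q is a percentage in (0, 100]."""
    per_class = []
    for c in range(len(table[0])):
        vals = sorted(row[c] for row in table if row[c] is not None)
        if not vals:
            continue
        # Assumed convention: keep the worst q% of images, at least one.
        keep = max(1, math.ceil(len(vals) * q / 100))
        per_class.append(mean(vals[:keep]))
    return mean(per_class)


# Hypothetical per-image IoUs for two classes over ten images.
class0 = [0.9, 0.8, 0.95, 0.1, 0.85, 0.9, 0.7, 0.6, 0.92, 0.88]
class1 = [0.5, None, 0.4, 0.2, None, 0.9, 0.8, None, 0.3, 0.6]
table = [list(pair) for pair in zip(class0, class1)]

worst_10 = worst_case_miou_c(table, q=10)
all_images = worst_case_miou_c(table, q=100)
print(round(worst_10, 4), round(all_images, 4))
```

With q = 100 the function reduces to a plain per-class average, so the gap between the two printed values shows how much the mean hides about the hardest images.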

This digest was generated by AI and reviewed by human editors.