QUICK REVIEW

[Paper Review] A practical guide and software for analysing pairwise comparison experiments

María Pérez‐Ortiz, Rafał Mantiuk|arXiv (Cornell University)|Dec 11, 2017

Image and Video Quality Assessment | References: 36 | Cited by: 49

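One-sentence summary

This paper presents a comprehensive guide and an open-source MATLAB toolbox for analysing pairwise comparison data, with a focus on image quality assessment. It improves on existing scaling methods by adding outlier detection, confidence interval estimation, and a finite-distance prior that improves accuracy, particularly when the number of observers is small, and it reports better performance than standard methods in both simulations and real applications.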

ABSTRACT

Most popular strategies to capture subjective judgments from humans involve the construction of a unidimensional relative measurement scale, representing order preferences or judgments about a set of objects or conditions. This information is generally captured by means of direct scoring, either in the form of a Likert or cardinal scale, or by comparative judgments in pairs or sets. In this sense, the use of pairwise comparisons is becoming increasingly popular because of the simplicity of this experimental procedure. However, this strategy requires non-trivial data analysis to aggregate the comparison ranks into a quality scale and analyse the results, in order to take full advantage of the collected data. This paper explains the process of translating pairwise comparison data into a measurement scale, discusses the benefits and limitations of such scaling methods and introduces a publicly available software in Matlab. We improve on existing scaling methods by introducing outlier analysis, providing methods for computing confidence intervals and statistical testing and introducing a prior, which reduces estimation error when the number of observers is low. Most of our examples focus on image quality assessment.

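Motivation and objectives

Address the challenges of analysing pairwise comparison data in perceptual experiments, particularly in image quality assessment.
Overcome the limitations of direct rating methods, such as inconsistent use of the scale across observers and across experimental sessions.
Provide a robust and accessible framework that scales pairwise comparison results into interpretable quality scores with accompanying uncertainty estimates.
Improve estimation accuracy when the number of observers is small by using a finite-distance prior, and handle practical issues such as ties and incomplete experimental designs.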

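Proposed method

Uses the Thurstone Case V model to scale pairwise comparison data onto a one-dimensional quality scale that represents perceptual differences.
Applies a finite-distance prior to reduce estimation error and improve stability when the number of observers is small.
Performs outlier detection to identify and remove unreliable observers or inconsistent responses.
Computes confidence intervals and runs statistical tests to assess the significance and reliability of differences in quality scores.
Supports incomplete experimental designs by strategically selecting which pairs to compare (for example, adjacent conditions), which reduces the data collection burden.
Ties can be handled by splitting them equally between the two options, but the paper warns that this introduces bias, so ties are not recommended with the current software.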
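The scaling step can be illustrated with a minimal Python sketch. It is not the toolbox's MATLAB code. It assumes a Case V likelihood in which the probability of preferring condition i over j is the normal CDF of the score difference, with the scale chosen so that a difference of 1 unit corresponds to a 75% preference (an assumed convention here). The function name `scale_case_v`, the `prior_weight` parameter, and the Gaussian penalty on pairwise distances are hypothetical stand-ins; the penalty is a simple substitute for the paper's prior, not its exact form.

```python
from statistics import NormalDist

_STD = NormalDist()
# Assumed convention: 1 scale unit = 75% of answers favour the better condition.
SIGMA = 1.0 / _STD.inv_cdf(0.75)  # about 1.4826


def scale_case_v(counts, prior_weight=0.0, steps=5000, lr=0.5):
    """Estimate one quality score per condition by plain gradient ascent.

    counts[i][j] is the number of times condition i was preferred over j.
    prior_weight > 0 adds a hypothetical Gaussian penalty on pairwise
    distances (a stand-in for the paper's prior, not its exact form).
    Scores are returned relative to condition 0.
    """
    n = len(counts)
    total = sum(sum(row) for row in counts) or 1  # avoid division by zero
    q = [0.0] * n
    for _ in range(steps):
        grad = [0.0] * n
        for i in range(n):
            for j in range(n):
                if i == j:
                    continue
                d = q[i] - q[j]
                if counts[i][j]:
                    x = d / SIGMA
                    # Derivative of counts * log Phi((q_i - q_j) / SIGMA) w.r.t. q_i
                    g = counts[i][j] * _STD.pdf(x) / (max(_STD.cdf(x), 1e-300) * SIGMA)
                    grad[i] += g
                    grad[j] -= g
                if i < j:
                    # Gradient of -(prior_weight / 2) * d**2, once per unordered pair
                    grad[i] -= prior_weight * d
                    grad[j] += prior_weight * d
        # Step on the mean gradient so the step size does not depend on sample size
        q = [qi + lr * gi / total for qi, gi in zip(q, grad)]
    return [qi - q[0] for qi in q]


counts = [[0, 75], [25, 0]]  # condition 0 preferred in 75 of 100 trials
print(scale_case_v(counts))                     # no prior
print(scale_case_v(counts, prior_weight=20.0))  # penalised distances
```

Without the penalty, the two conditions should end up about one unit apart, because 75 of 100 answers favour condition 0. With the penalty, the estimated distance shrinks toward zero, which is the general stabilising effect a prior has when data are scarce.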

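Experimental results

Research questions

RQ1: How can pairwise comparison data be reliably scaled into meaningful quality scores with interpretable uncertainty?
RQ2: How does the finite-distance prior affect estimation accuracy when the number of observers is small?
RQ3: How do incomplete experimental designs affect the accuracy and precision of the scaling results?
RQ4: What are the consequences of allowing 'no preference' responses (ties) in pairwise comparisons, and how do they affect bias and confidence intervals?
RQ5: Can outlier detection and statistical testing improve the robustness of pairwise comparison analysis in real perceptual experiments?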
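RQ1 asks for scores with interpretable uncertainty. One common way to attach such uncertainty is a percentile bootstrap over observers, sketched below for a single pair of conditions. This is an illustrative assumption, not a description of the toolbox's exact procedure. The helper names `jod_distance` and `bootstrap_ci`, the per-observer data layout, and the 75% scale convention are all hypothetical choices made for this example.

```python
import random
from statistics import NormalDist

_STD = NormalDist()
# Assumed convention: 1 scale unit = 75% of answers favour the better condition.
SIGMA = 1.0 / _STD.inv_cdf(0.75)


def jod_distance(wins, trials):
    """Scale distance implied by a pooled win proportion for one pair."""
    # Clamp away from 0 and 1 so the inverse CDF stays finite.
    p = min(max(wins / trials, 0.5 / trials), 1.0 - 0.5 / trials)
    return SIGMA * _STD.inv_cdf(p)


def bootstrap_ci(per_observer, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval, resampling whole observers.

    per_observer is a list of (wins_for_A, trials) tuples, one per observer.
    """
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    estimates = []
    for _ in range(n_boot):
        sample = rng.choices(per_observer, k=len(per_observer))
        wins = sum(w for w, _ in sample)
        trials = sum(t for _, t in sample)
        estimates.append(jod_distance(wins, trials))
    estimates.sort()
    lo_idx = int(alpha / 2 * n_boot)
    return estimates[lo_idx], estimates[n_boot - 1 - lo_idx]


# Made-up data: six observers, four repetitions each, A compared with B.
data = [(4, 4), (3, 4), (3, 4), (2, 4), (4, 4), (3, 4)]
point = jod_distance(sum(w for w, _ in data), sum(t for _, t in data))
print(point, bootstrap_ci(data))
```

Resampling whole observers rather than individual trials keeps each observer's answers together, so disagreement between observers shows up in the width of the interval.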

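Key findings

Introducing the finite-distance prior significantly reduces estimation error, especially when the number of observers is small.
Outlier detection improves the reliability of the scaling results by identifying and removing unreliable observer responses.
Incomplete designs that focus on conditions adjacent on the quality scale achieve competitive performance while reducing data collection effort.
Allowing a 'no preference' option narrows the confidence intervals but introduces a significant negative bias (that is, the quality differences are underestimated).
Monte Carlo simulations show that RMSE and confidence intervals grow faster than expected as the true JOD distance increases, indicating sensitivity to the range of the scale.
The proposed software toolbox successfully reproduces results from earlier computer graphics studies and provides a robust, extensible platform for future research.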
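The negative bias from splitting ties equally can be seen with a small worked calculation. The sketch assumes a deliberately simple tie model that is not taken from the paper: a fixed fraction of trials ends in a tie regardless of the stimuli, and each tie counts as half a vote for each option. The function names and the 75% scale convention are assumptions made for the illustration.

```python
from statistics import NormalDist

_STD = NormalDist()
# Assumed convention: 1 scale unit = 75% of answers favour the better condition.
SIGMA = 1.0 / _STD.inv_cdf(0.75)


def distance_from_probability(p):
    """Scale distance implied by a preference probability p (0 < p < 1)."""
    return SIGMA * _STD.inv_cdf(p)


def observed_probability_with_ties(p_true, tie_rate):
    """Hypothetical tie model: a fraction tie_rate of trials are ties,
    and each tie is split as half a vote for each option."""
    return (1.0 - tie_rate) * p_true + 0.5 * tie_rate


p_true = 0.75  # corresponds to a true distance of 1 unit
for tie_rate in (0.0, 0.2, 0.4):
    p_obs = observed_probability_with_ties(p_true, tie_rate)
    print(tie_rate, round(distance_from_probability(p_obs), 3))
```

Under this model every split tie pulls the observed proportion toward 0.5, so the implied distance falls below the true value as the tie rate grows, which matches the direction of the bias reported above.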

This review was generated by AI and reviewed by a human editor.