[Paper Summary] Should I Follow AI-based Advice? Measuring Appropriate Reliance in Human-AI Decision-Making
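Proposes a two-dimensional measurement concept for appropriate reliance (AR) on AI advice and demonstrates it in a sequential-task study that uses deceptive hotel reviews and XAI explanations.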
Many important decisions in daily life are made with the help of advisors, e.g., decisions about medical treatments or financial investments. Whereas in the past, advice has often been received from human experts, friends, or family, advisors based on artificial intelligence (AI) have become increasingly present. Typically, the advice generated by AI is judged by a human and either deemed reliable or rejected. However, recent work has shown that AI advice is not always beneficial, as humans have been shown to be unable to ignore incorrect AI advice, essentially representing an over-reliance on AI. Therefore, the goal should be to enable humans not to rely on AI advice blindly but rather to distinguish its quality and act upon it to make better decisions. Specifically, this means that humans should rely on the AI in the presence of correct advice and self-rely when confronted with incorrect advice, i.e., establish appropriate reliance (AR) on AI advice on a case-by-case basis. Current research lacks a metric for AR. This prevents a rigorous evaluation of factors impacting AR and hinders further development of human-AI decision-making. Therefore, based on the literature, we derive a measurement concept of AR. We propose to view AR as a two-dimensional construct that measures the ability to discriminate advice quality and behave accordingly. In this article, we derive the measurement concept, illustrate its application, and outline potential future research.
Research Motivation and Goals
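- Defines appropriate reliance (AR) on AI advice as the ability to distinguish correct from incorrect AI advice and to make decisions accordingly.
- Proposes a two-dimensional measure of AR based on RAIR (relative positive AI reliance) and RSR (relative positive self-reliance).
- Demonstrates the measurement concept in a behavioral experiment with AI advice and explanations (XAI), using a hotel-review classification task.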
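As we understand the paper's definitions, both dimensions are ratios computed only over the cases in which the participant's initial decision and the AI advice differ in correctness. The counter names N_PAR, N_NSR, N_PSR and N_NAR below are our own illustrative notation, not the paper's.

```latex
% Illustrative notation (assumed, not from the paper), counted per participant:
% N_{PAR}: initially wrong, AI correct, final decision follows the AI  (positive AI reliance)
% N_{NSR}: initially wrong, AI correct, final decision stays wrong      (negative self-reliance)
% N_{PSR}: initially correct, AI wrong, final decision stays correct   (positive self-reliance)
% N_{NAR}: initially correct, AI wrong, final decision follows the AI  (negative AI reliance)
\[
\mathrm{RAIR} = \frac{N_{\mathrm{PAR}}}{N_{\mathrm{PAR}} + N_{\mathrm{NSR}}},
\qquad
\mathrm{RSR} = \frac{N_{\mathrm{PSR}}}{N_{\mathrm{PSR}} + N_{\mathrm{NAR}}}
\]
```

Under this reading, a low RAIR points to under-reliance (correct advice is ignored), while a low RSR points to over-reliance (incorrect advice is adopted).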
Proposed Method
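- Derives AR as a two-dimensional construct grounded in the automation and organizational-psychology literature.
- Defines RAIR and RSR as ratio-based metrics that capture discrimination of advice quality and the corresponding adaptation of behavior.
- Uses a sequential decision setup: an initial human decision, then AI advice, then a final decision after the advice has been received.
- Uses a hotel-review dataset containing deceptive reviews; the AI predictor is a support vector machine with 86% accuracy.
- In the XAI treatment, additionally provides LIME-based explanations to examine their effect on AR.
- Evaluates AR by comparing RAIR and RSR against a random baseline and by analyzing treatment effects.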
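The sequential setup (initial decision, AI advice, final decision) is enough to compute both metrics with four counters. This is a minimal sketch, not the authors' analysis code. It assumes a binary task, with "deceptive" and "genuine" as hypothetical label strings. It also assumes RAIR is the share of cases with a wrong initial decision and correct advice in which the participant switched to the advice, and RSR is the share of cases with a correct initial decision and wrong advice in which the participant kept the initial decision. The names `Trial` and `appropriate_reliance` are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple


@dataclass(frozen=True)
class Trial:
    """One task instance; labels are hypothetical ("deceptive"/"genuine")."""
    truth: str     # ground-truth label of the review
    initial: str   # participant's decision before seeing the AI advice
    advice: str    # AI advice shown to the participant
    final: str     # participant's decision after seeing the advice


def appropriate_reliance(trials: Iterable[Trial]) -> Tuple[Optional[float], Optional[float]]:
    """Return (RAIR, RSR); a ratio is None if its denominator is zero.

    Assumes a binary task, so "final == advice" means the participant
    switched to the AI's label whenever initial and advice differ.
    """
    par = nsr = psr = nar = 0
    for t in trials:
        human_right = t.initial == t.truth
        ai_right = t.advice == t.truth
        if not human_right and ai_right:
            # Correct advice could help: did the participant adopt it?
            if t.final == t.advice:
                par += 1  # positive AI reliance
            else:
                nsr += 1  # negative self-reliance
        elif human_right and not ai_right:
            # Incorrect advice could mislead: did the participant resist it?
            if t.final == t.initial:
                psr += 1  # positive self-reliance
            else:
                nar += 1  # negative AI reliance
        # Cases where human and AI are both right or both wrong do not test
        # discrimination of advice quality and are skipped.
    rair = par / (par + nsr) if par + nsr else None
    rsr = psr / (psr + nar) if psr + nar else None
    return rair, rsr


# Made-up example trials (not data from the study).
trials = [
    Trial("deceptive", "genuine", "deceptive", "deceptive"),  # positive AI reliance
    Trial("deceptive", "genuine", "deceptive", "genuine"),    # negative self-reliance
    Trial("genuine", "genuine", "deceptive", "genuine"),      # positive self-reliance
    Trial("genuine", "genuine", "deceptive", "genuine"),      # positive self-reliance
    Trial("genuine", "genuine", "deceptive", "deceptive"),    # negative AI reliance
    Trial("genuine", "genuine", "genuine", "genuine"),        # agreement, skipped
]
print(appropriate_reliance(trials))  # (0.5, 0.6666666666666666)
```

Returning `None` instead of 0 keeps a participant who never met a relevant case (for example, never received incorrect advice) from being scored as if they had failed it.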
Experimental Results
Research Questions
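- RQ1: How can AR on AI advice be measured rigorously as a two-dimensional construct?
- RQ2: What effect do explanations (XAI) have on humans' ability to discriminate AI advice and adjust their decisions?
- RQ3: Do the two AR dimensions, positive AI reliance and positive self-reliance, respond differently in the presence of correct versus incorrect AI advice?
- RQ4: Can the proposed AR framework distinguish under-reliance from over-reliance in human-AI decision-making?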
Key Findings
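- In the AI condition, participants showed a relative positive self-reliance (RSR) of 0.72 (±0.03) and a relative positive AI reliance (RAIR) of 0.30 (±0.03).
- In the XAI condition, RAIR increased to 0.39 (±0.03), while RSR remained at 0.72 (±0.03).
- The XAI-induced increase in RAIR was statistically significant (t = -1.95, p = 0.05).
- Explanations may reduce under-reliance without triggering over-reliance, suggesting that XAI has a nuanced effect on the AR metrics.
- The study demonstrates the usefulness of the two-dimensional AR measurement for analyzing how design choices such as XAI affect discrimination of advice quality and the resulting reliance behavior.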
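A treatment effect like the RAIR difference between the AI and XAI conditions can be tested on per-participant metric values with an independent two-sample t-test. The summary does not say which variant the authors used. The sketch below assumes Welch's unequal-variance t statistic, uses only the standard library, and runs on made-up numbers; `welch_t` is a hypothetical helper name. With mean(AI) minus mean(XAI) in the numerator, a higher RAIR under XAI yields a negative t, matching the sign reported above.

```python
import math
import statistics
from typing import Sequence, Tuple


def welch_t(a: Sequence[float], b: Sequence[float]) -> Tuple[float, float]:
    """Welch's t statistic for mean(a) - mean(b) and its
    Welch-Satterthwaite degrees of freedom."""
    na, nb = len(a), len(b)
    va = statistics.variance(a) / na  # squared standard error of mean(a)
    vb = statistics.variance(b) / nb  # squared standard error of mean(b)
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
    return t, df


# Hypothetical per-participant RAIR values (NOT the study's data).
rair_ai = [0.2, 0.3, 0.4, 0.3]
rair_xai = [0.3, 0.4, 0.5, 0.4]
t, df = welch_t(rair_ai, rair_xai)
print(round(t, 3), round(df, 3))  # -1.732 6.0
```

For a p-value, the statistic would be referred to a t distribution with `df` degrees of freedom; with SciPy available, `scipy.stats.ttest_ind(a, b, equal_var=False)` performs Welch's test directly.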
This summary was generated by AI and reviewed by human editors.