QUICK REVIEW

[Paper Review] Towards a Science of Human-AI Decision Making: A Survey of Empirical Studies

Vivian Lai, Chacha Chen | arXiv (Cornell University) | Dec 21, 2021

Ethics and Social Impacts of AI | Cited by 75

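One-Sentence Summary

This survey analyzes over 100 empirical human-subject studies of human-AI decision making along three design dimensions (decision tasks, AI models and AI assistance elements, and evaluation metrics), and proposes frameworks and recommendations for future research.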

ABSTRACT

As AI systems demonstrate increasingly strong predictive performance, their adoption has grown in numerous domains. However, in high-stakes domains such as criminal justice and healthcare, full automation is often not desirable due to safety, ethical, and legal concerns, yet fully manual approaches can be inaccurate and time consuming. As a result, there is growing interest in the research community to augment human decision making with AI assistance. Besides developing AI technologies for this purpose, the emerging field of human-AI decision making must embrace empirical approaches to form a foundational understanding of how humans interact and work with AI to make decisions. To invite and help structure research efforts towards a science of understanding and improving human-AI decision making, we survey recent literature of empirical human-subject studies on this topic. We summarize the study design choices made in over 100 papers in three important aspects: (1) decision tasks, (2) AI models and AI assistance elements, and (3) evaluation metrics. For each aspect, we summarize current trends, discuss gaps in current practices of the field, and make a list of recommendations for future research. Our survey highlights the need to develop common frameworks to account for the design and research spaces of human-AI decision making, so that researchers can make rigorous choices in study design, and the research community can build on each other's work and produce generalizable scientific knowledge. We also hope this survey will serve as a bridge for HCI and AI communities to work together to mutually shape the empirical science and computational technologies for human-AI decision making.

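Research Motivation and Goals

Motivate the need for a coherent science of human-AI decision making, in both high-stakes and everyday settings.
Synthesize the empirical study designs of over 100 papers to map the decision tasks, AI assistance elements, and evaluation metrics they use.
Identify trends, gaps, and actionable recommendations that improve research rigor and generalizability.
Propose frameworks that account for the design space and enable generalization across studies.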

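Methodology

Systematically code empirical human-subject studies published in AI and HCI venues from 2018 to 2021.
Code each paper along three dimensions: decision task, AI model and assistance elements, and evaluation metrics.
Run a second round of coding to merge similar codes and group related themes across papers.
Build summary tables that give a quick overview of the literature.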
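The coding workflow above can be sketched as a small data pipeline. Each paper receives free-form first-round codes on the three dimensions. A second pass then merges synonymous codes, and a summary table counts how many papers use each merged code. This is a minimal sketch that assumes a simple synonym map for the merge step. The class `CodedPaper`, the map `MERGE_MAP`, the paper IDs, and all code labels are hypothetical illustrations. They are not the authors' actual codebook or data.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass, field

# Hypothetical record: one coded paper with first-round codes per dimension.
@dataclass
class CodedPaper:
    paper_id: str
    tasks: list[str] = field(default_factory=list)
    assistance: list[str] = field(default_factory=list)
    metrics: list[str] = field(default_factory=list)

# Hypothetical second-round merge map: synonymous codes -> one merged code.
MERGE_MAP = {
    "recidivism": "recidivism prediction",
    "criminal risk assessment": "recidivism prediction",
    "saliency map": "feature importance",
    "feature attribution": "feature importance",
    "accuracy": "task performance",
    "f1": "task performance",
}

def merge_codes(codes: list[str]) -> list[str]:
    # Normalize case, map synonyms to merged codes (unknown codes pass through),
    # and deduplicate while preserving order.
    return list(dict.fromkeys(MERGE_MAP.get(c.lower(), c.lower()) for c in codes))

def summary_table(papers: list[CodedPaper]) -> dict[str, Counter]:
    # For each dimension, count the number of papers that use each merged code.
    table: dict[str, Counter] = defaultdict(Counter)
    for paper in papers:
        for dim in ("tasks", "assistance", "metrics"):
            for code in merge_codes(getattr(paper, dim)):
                table[dim][code] += 1
    return table

# Usage with two made-up papers.
papers = [
    CodedPaper("paper_A", ["Recidivism"], ["saliency map"], ["accuracy"]),
    CodedPaper("paper_B", ["criminal risk assessment"],
               ["feature attribution", "confidence score"], ["F1", "trust"]),
]
table = summary_table(papers)
for dim, counts in table.items():
    print(dim, dict(counts))
```

Because codes are deduplicated within each paper, a paper that reports both accuracy and F1 counts once under "task performance". The table therefore reads as "number of papers per code" rather than "number of mentions".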

Results

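Research Questions

RQ1: Which decision tasks do researchers use in human-AI decision-making studies, and how do domain and task characteristics affect the results?
RQ2: Which AI models and AI assistance elements are used, and how do they affect human decisions?
RQ3: Which evaluation metrics are used to assess human performance and experience, and what gaps exist across studies?
RQ4: What gaps and recommendations emerge toward common frameworks that promote rigorous, generalizable research in the field?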

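Key Findings

Decision tasks vary widely across domains, which makes it hard to generalize findings to other contexts.
High-stakes domains (law, healthcare, finance, education) are common, while low-stakes leisure and artificial tasks are often used in controlled studies.
Most studies focus on AI-for-discovery tasks rather than AI-for-emulation tasks, which limits generalization to real-world decision making.
Many studies rely on datasets such as COMPAS and ICPSR, which may introduce dataset-driven bias into task selection.
A standardized framework is needed to document task characteristics such as stakes, required expertise, subjectivity, and the source of ground-truth labels.
Researchers should report decision makers' domain expertise and AI literacy, so that results are easier to interpret and generalize.
The paper highlights gaps in cross-domain generalization and calls for empirical science and AI development to shape each other.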
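The standardized documentation recommended above could be as simple as a structured "task card" attached to each study. It would record the listed task characteristics and the participant information, and flag anything left unreported. This is a minimal sketch under that assumption. The survey does not prescribe this schema. `TaskCard`, its field names, the enum values, and `missing_fields` are all hypothetical.

```python
from dataclasses import dataclass, fields
from enum import Enum
from typing import Optional

class Stakes(Enum):
    LOW = "low"
    HIGH = "high"

class GroundTruthSource(Enum):
    OBSERVED_OUTCOME = "observed outcome"  # e.g., whether an event later occurred
    HUMAN_LABEL = "human label"            # e.g., expert or crowd annotation
    SUBJECTIVE = "none / subjective"       # no single correct answer

@dataclass
class TaskCard:
    # Hypothetical task-side fields mirroring the characteristics listed above.
    task_name: str
    domain: str
    stakes: Optional[Stakes] = None
    required_expertise: Optional[str] = None
    subjectivity: Optional[bool] = None
    ground_truth_source: Optional[GroundTruthSource] = None
    # Hypothetical participant-side fields (expertise and AI literacy).
    participant_expertise: Optional[str] = None
    participant_ai_literacy: Optional[str] = None

def missing_fields(card: TaskCard) -> list[str]:
    # Return the names of fields still set to None (i.e., not reported).
    return [f.name for f in fields(card) if getattr(card, f.name) is None]

# Usage: an illustrative, partially documented study.
card = TaskCard(
    task_name="recidivism prediction",
    domain="criminal justice",
    stakes=Stakes.HIGH,
    ground_truth_source=GroundTruthSource.OBSERVED_OUTCOME,
    participant_expertise="laypeople (crowdworkers)",
)
print(missing_fields(card))
```

Using `None` as "not reported" keeps an explicit `False` (for example, an objective task with `subjectivity=False`) distinct from a missing entry. Reporting gaps therefore stay visible when cards are compared across studies.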

This review was generated by AI and reviewed by human editors.