[Paper Review] Towards a Science of Human-AI Decision Making: A Survey of Empirical Studies
This survey analyzes empirical human-subject studies of human-AI decision making across 100+ papers, focusing on three design spaces (tasks, AI assistance, and evaluation metrics), and proposes frameworks and recommendations.
As AI systems demonstrate increasingly strong predictive performance, their adoption has grown in numerous domains. However, in high-stakes domains such as criminal justice and healthcare, full automation is often not desirable due to safety, ethical, and legal concerns, yet fully manual approaches can be inaccurate and time consuming. As a result, there is growing interest in the research community to augment human decision making with AI assistance. Besides developing AI technologies for this purpose, the emerging field of human-AI decision making must embrace empirical approaches to form a foundational understanding of how humans interact and work with AI to make decisions. To invite and help structure research efforts towards a science of understanding and improving human-AI decision making, we survey recent literature of empirical human-subject studies on this topic. We summarize the study design choices made in over 100 papers in three important aspects: (1) decision tasks, (2) AI models and AI assistance elements, and (3) evaluation metrics. For each aspect, we summarize current trends, discuss gaps in current practices of the field, and make a list of recommendations for future research. Our survey highlights the need to develop common frameworks to account for the design and research spaces of human-AI decision making, so that researchers can make rigorous choices in study design, and the research community can build on each other's work and produce generalizable scientific knowledge. We also hope this survey will serve as a bridge for HCI and AI communities to work together to mutually shape the empirical science and computational technologies for human-AI decision making.
Research Motivation and Objectives
- Motivate the need for a coherent science of human-AI decision making in high-stakes and everyday contexts.
- Synthesize empirical study designs from over 100 papers to map decision tasks, AI assistance elements, and evaluation metrics.
- Identify trends, gaps, and actionable recommendations to improve study rigor and generalizability.
- Propose frameworks to account for design spaces and enable cross-study generalization.
Method
- Systematic coding of empirical human-subject studies from AI and HCI venues conducted between 2018 and 2021.
- Each paper coded along three dimensions: decision tasks, AI models/assistance elements, and evaluation metrics.
- Second-round coding to merge similar codes and group related themes across papers.
- Development of summary tables to provide quick overviews of the literature space.
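The coding workflow above (first-round codes per dimension, a second round that merges similar codes, then summary tables) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' procedure: the paper IDs, the codes, and the `merge_map` used for the second round are all hypothetical and do not come from the survey's data.

```python
from collections import Counter, defaultdict

# Hypothetical first-round codes for three illustrative papers (not real survey data).
# Each paper is coded along the survey's three dimensions.
first_round = {
    "paper_A": {"task": ["recidivism prediction"],
                "assistance": ["prediction", "feature importance"],
                "metric": ["accuracy", "trust"]},
    "paper_B": {"task": ["recidivism risk"],
                "assistance": ["prediction", "confidence score"],
                "metric": ["agreement", "trust"]},
    "paper_C": {"task": ["deceptive review detection"],
                "assistance": ["prediction", "feature importance"],
                "metric": ["accuracy"]},
}

# Hypothetical second-round mapping that merges near-duplicate codes into one theme.
merge_map = {
    "recidivism risk": "recidivism prediction",
    "agreement": "reliance",
}

def merge_codes(coded):
    """Apply the merge map; the set removes duplicates created within a paper."""
    return {
        paper: {dim: sorted({merge_map.get(code, code) for code in codes})
                for dim, codes in dims.items()}
        for paper, dims in coded.items()
    }

def summary_table(coded):
    """Count how many papers use each code, per dimension."""
    table = defaultdict(Counter)
    for dims in coded.values():
        for dim, codes in dims.items():
            table[dim].update(codes)
    return table

merged = merge_codes(first_round)
table = summary_table(merged)
for dim, counts in table.items():
    print(dim)
    for code, n in counts.most_common():
        print(f"  {code}: {n} paper(s)")
```

Deduplicating codes within each paper before counting means the table reports the number of papers using a code rather than raw code occurrences, which is the kind of overview a literature summary table usually needs.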
Results
Research Questions
- RQ1: What decision tasks have researchers used in human-AI decision making studies, and how do domain and task characteristics affect results?
- RQ2: What AI models and AI-assistance elements are employed, and how do they influence human decision making?
- RQ3: What evaluation metrics are used to assess human performance and experience, and what gaps exist across studies?
- RQ4: What gaps and recommendations emerge to foster a common framework for rigorous, generalizable research in this field?
Key Findings
- There is wide variety in decision tasks across domains, highlighting challenges in generalizing findings.
- High-stakes domains (law, medicine, finance, education) are common, while leisure and artificial tasks are lower-stakes and used for controlled studies.
- Most studies focus on AI-for-discovery tasks rather than AI-for-emulation, impacting generalizability to real-world decision making.
- Many studies rely on datasets like COMPAS and ICPSR, leading to potential dataset-driven biases in task selection.
- There is a need for standardized frameworks to document task characteristics such as risk, required expertise, subjectivity, and groundtruth source.
- Researchers should report decision-maker expertise and AI-literacy to improve interpretation and generalizability of results.
- The paper highlights gaps in cross-domain generalization and calls for mutual shaping of empirical science and AI development.
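The recommendations to document task characteristics (risk, required expertise, subjectivity, groundtruth source) and to report decision-maker expertise and AI literacy could take the form of a structured per-study record. The sketch below is a hypothetical illustration only: the paper calls for such documentation but does not define this schema, so the class names, field names, enum values, and the `differing_characteristics` helper are all assumptions.

```python
from dataclasses import dataclass, fields
from enum import Enum

# Hypothetical coarse levels for task characteristics (not defined by the survey).
class Level(Enum):
    LOW = "low"
    HIGH = "high"

# Hypothetical categories for where the groundtruth label comes from.
class GroundTruth(Enum):
    OBSERVED_OUTCOME = "observed outcome"   # e.g., whether a defendant reoffended
    HUMAN_ANNOTATION = "human annotation"   # e.g., labels assigned by annotators

@dataclass(frozen=True)
class StudyReport:
    """Hypothetical record of task and participant characteristics for one study."""
    domain: str
    risk: Level
    required_expertise: Level
    subjectivity: Level
    groundtruth_source: GroundTruth
    participant_expertise: str   # e.g., "crowdworkers", "judges"
    ai_literacy: str             # how (or whether) AI literacy was measured

def differing_characteristics(a: StudyReport, b: StudyReport) -> list[str]:
    """List the reported characteristics on which two studies differ."""
    return [f.name for f in fields(StudyReport)
            if getattr(a, f.name) != getattr(b, f.name)]

# Two hypothetical studies on the same task with different participant pools.
study_1 = StudyReport("criminal justice", Level.HIGH, Level.HIGH, Level.LOW,
                      GroundTruth.OBSERVED_OUTCOME, "crowdworkers", "not reported")
study_2 = StudyReport("criminal justice", Level.HIGH, Level.HIGH, Level.LOW,
                      GroundTruth.OBSERVED_OUTCOME, "judges", "self-reported, 5-point scale")
print(differing_characteristics(study_1, study_2))
# → ['participant_expertise', 'ai_literacy']
```

Making every characteristic an explicit field means a missing report fails loudly at construction time, and comparing two records surfaces exactly which differences might explain diverging results across studies.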
This review was generated by AI and checked by a human editor.