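[Paper Summary] The hybrid confirmation tree: A robust strategy for hybrid intelligence
The paper proposes the hybrid confirmation tree (HCT), a simple human-AI decision rule: a human and an AI decide independently; if they agree, the decision is final; if they disagree, a second human breaks the tie. Across six real-world datasets, this rule achieves higher accuracy than a human majority vote while requiring fewer human inputs.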
Combining human and artificial intelligence (AI) is a potentially powerful approach to boost decision accuracy. However, few such approaches exist that effectively integrate both types of intelligence while maintaining human agency. Here, we introduce and evaluate the hybrid confirmation tree, a simple aggregation strategy that compares the independent decisions of both a human and AI, with disagreements triggering a second human tiebreaker. Through analytical derivations, we show that the hybrid confirmation tree can match and exceed the accuracy of a three-person human majority vote while requiring fewer human inputs, particularly when AI accuracy is comparable to or exceeds human accuracy. We analytically demonstrate that the hybrid confirmation tree's ability to achieve complementarity (outperforming individual humans, AI, and the majority vote) is maximized when human and AI accuracies are similar and their decisions are not overly correlated. Empirical reanalysis of six real-world datasets (covering skin cancer diagnosis, deepfake detection, geopolitical forecasting, and criminal rearrest) validates these findings, showing that the hybrid confirmation tree improves accuracy over the majority vote by up to 10 percentage points while reducing the cost of decision making by 28-44%. Furthermore, the hybrid confirmation tree provides greater flexibility in navigating true and false positive trade-offs compared to fixed human-only heuristics like hierarchies and polyarchies. The hybrid confirmation tree emerges as a practical, efficient, and robust strategy for hybrid collective intelligence that maintains human agency.
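Research motivation and goals
- Advance robust hybrid intelligence that preserves human agency while leveraging the strengths of AI.
- Develop a simple, transparent aggregation rule that combines human and AI judgments.
- Analyze how accuracy and decision correlation affect complementarity.
- Validate the method empirically on diverse, high-stakes datasets.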
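Proposed method
- Define the hybrid confirmation tree: a human and a machine decide independently; agreement ends the decision; on disagreement, a second human casts the tiebreaking vote.
- Compare the HCT analytically with a three-person majority vote and with two-person hierarchies and polyarchies.
- Model how human-machine correlation (kappa) affects performance and complementarity.
- Evaluate the HCT on six datasets with ground-truth labels; test threshold adjustment of the machine's probabilistic outputs.
- Assess the savings in human judgments relative to the majority vote.
- Use ROC-style analysis to show the flexible true-positive/false-positive trade-offs available under different threshold settings.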
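The decision rule described above can be sketched in a few lines of Python. This is a minimal illustration for binary decisions, not the authors' code: the function name `hct_decide`, the lazy `second_human` callable, and the default threshold of 0.5 are all assumptions made for this example.

```python
from typing import Callable, Tuple


def hct_decide(
    first_human: int,
    machine_prob: float,
    second_human: Callable[[], int],
    threshold: float = 0.5,
) -> Tuple[int, int]:
    """Apply the hybrid confirmation tree to one binary case.

    Returns (final_decision, human_judgments_used).
    All names here are hypothetical and chosen for illustration.
    """
    # Turn the machine's probability into a binary decision.
    machine = int(machine_prob >= threshold)

    # Agreement between the first human and the machine ends the decision.
    if first_human == machine:
        return first_human, 1

    # Disagreement: only now is the second human consulted as tiebreaker.
    return second_human(), 2


# Usage: the second human is queried only when needed.
decision, used = hct_decide(first_human=0, machine_prob=0.8, second_human=lambda: 1)
```

Passing the second human as a callable makes the cost structure explicit: a second judgment is requested only on disagreement. Raising or lowering `threshold` changes how often the machine says "positive", which is one way to move along the true-positive/false-positive trade-off mentioned above.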
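Experimental results
Research questions
- RQ1: Under what conditions does the hybrid confirmation tree outperform the human-only majority vote, the machine alone, or neither?
- RQ2: How do human-human and human-machine correlations affect the potential for complementarity?
- RQ3: Can the hybrid confirmation tree reduce human costs while offering flexible error trade-offs (true positives vs. false positives)?
- RQ4: How does threshold adjustment of the machine's predictions affect accuracy and cost across domains?
Key findings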
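For intuition on RQ1, the idealized case can be worked out directly. The sketch below assumes binary decisions, statistically independent errors, and two humans with the same accuracy `h`; these are simplifying assumptions for illustration and the paper's own derivations (which also model correlation) may differ. The function names are hypothetical.

```python
def hct_accuracy(h: float, m: float) -> float:
    """Accuracy of the hybrid confirmation tree under independence.

    h: accuracy of each human, m: accuracy of the machine (binary task).
    """
    # Both correct: they agree on the right answer.
    agree_correct = h * m
    # Exactly one of the two is correct: they disagree.
    disagree = h * (1 - m) + (1 - h) * m
    # On disagreement, the second human is right with probability h.
    return agree_correct + disagree * h


def majority_of_three_accuracy(h: float) -> float:
    """Accuracy of a three-person majority vote under independence."""
    return h**3 + 3 * h**2 * (1 - h)


if __name__ == "__main__":
    h = 0.7
    for m in (0.6, 0.7, 0.8):
        print(f"h={h}, m={m}: HCT={hct_accuracy(h, m):.3f}, "
              f"majority={majority_of_three_accuracy(h):.3f}")
```

Under these assumptions, the two expressions coincide when `m == h`, and the HCT pulls ahead of the majority vote once the machine is more accurate than the humans, which is in line with the abstract's statement that the HCT can match and exceed a three-person majority vote.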
| Domain | Citation | Cases | Humans | Choices | Type of machine (Source) |
|---|---|---|---|---|---|
| Skin Cancer (Derm) | Brinker et al. (2019a, b) | 100 | 157 | 15,700 | CNN (own model) |
| Skin Cancer (Nonderm) | Brinker et al. (2019a, b) | 100 | 145 | 14,500 | CNN (own model) |
| Deepfakes | Groh et al. (2022) | 54 | 132 | 1,347 | CNN (Seferbekov 2021) |
| Criminal Rearrest | Angwin et al. (2016), Dressel and Farid (2018) | 1,000 | 400 | 20,000 | Logistic regression (own model) |
| Hybrid Forecasting Competition | Benjamin et al. (2023) | 52 | 111 | 1,055 | Time series regression (Benjamin et al. 2023) |
| ForecastBench | Karger et al. (2025) | 422 | 500 | 21,302 | LLM (Karger et al. 2025) |
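- The HCT outperforms the human majority vote in all six domains, by at least 2.3 and at most 10.4 percentage points.
- Compared with the majority vote, the HCT reduces the number of human judgments required by 28% to 44%.
- The HCT can fall short of the machine alone, but under suitable accuracy and correlation conditions it matches or exceeds the human-only baselines.
- Complementarity is highest when AI accuracy is close to or above human accuracy and the decisions are not too highly correlated.
- Threshold adjustment of the machine's predictions gives the HCT flexible control over the true-positive/false-positive trade-off.
- Across datasets, the HCT is more accurate than the majority vote and substantially reduces human cost, although it does not surpass the standalone machine in every case.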
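Readers who want to check how correlated human and machine decisions are on their own data could start from a chance-corrected agreement statistic. The sketch below computes Cohen's kappa between two decision vectors; whether this matches the exact correlation measure used in the paper is an assumption, and the function name is hypothetical.

```python
from typing import Sequence


def cohens_kappa(a: Sequence[int], b: Sequence[int]) -> float:
    """Cohen's kappa between two equal-length sequences of labels."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(a)

    # Observed agreement: share of cases with identical decisions.
    observed = sum(x == y for x, y in zip(a, b)) / n

    # Agreement expected by chance, from each rater's label frequencies.
    labels = set(a) | set(b)
    expected = sum((list(a).count(l) / n) * (list(b).count(l) / n) for l in labels)

    if expected == 1.0:
        raise ValueError("kappa is undefined when both raters use a single label")
    return (observed - expected) / (1 - expected)


# Usage with made-up decisions (illustrative data only).
human = [1, 1, 0, 0]
machine = [1, 0, 1, 0]
kappa = cohens_kappa(human, machine)
```

A kappa near 1 means the human and the machine almost always decide alike, leaving little room for one to correct the other; values near 0 indicate agreement no better than chance.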
This summary was generated by AI and reviewed by human editors.