QUICK REVIEW

[Paper Review] Position: Towards Bidirectional Human-AI Alignment

Hua Shen, Tiffany Knearem | arXiv (Cornell University) | Jun 13, 2024

Digital Transformation in Industry | Cited by 13

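One-Sentence Summary

This paper systematically reviews more than 400 studies to define and structure Bidirectional Human-AI Alignment. It proposes a long-term, reciprocal framework that aligns AI to humans as well as humans to AI, and outlines future directions.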

ABSTRACT

Recent advances in general-purpose AI underscore the urgent need to align AI systems with human goals and values. Yet, the lack of a clear, shared understanding of what constitutes "alignment" limits meaningful progress and cross-disciplinary collaboration. In this position paper, we argue that the research community should explicitly define and critically reflect on "alignment" to account for the bidirectional and dynamic relationship between humans and AI. Through a systematic review of over 400 papers spanning HCI, NLP, ML, and more, we examine how alignment is currently defined and operationalized. Building on this analysis, we introduce the Bidirectional Human-AI Alignment framework, which not only incorporates traditional efforts to align AI with human values but also introduces the critical, underexplored dimension of aligning humans with AI -- supporting cognitive, behavioral, and societal adaptation to rapidly advancing AI technologies. Our findings reveal significant gaps in current literature, especially in long-term interaction design, human value modeling, and mutual understanding. We conclude with three central challenges and actionable recommendations to guide future research toward more nuanced, reciprocal, and human-AI alignment approaches.

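Motivation and Objectives

Clarify the definition and scope of human-AI alignment across disciplines.
Propose a bidirectional human-AI alignment framework that covers both aligning AI to humans and aligning humans to AI.
Synthesize findings on human values, interaction techniques, and alignment evaluation.
Identify three near-term to long-term challenges and propose potential future solutions.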

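Proposed Method

Conduct a systematic literature review of more than 400 papers (2019 to January 2024), following the PRISMA guidelines.
Code the papers iteratively to derive the bidirectional framework and its taxonomy.
Analyze the papers qualitatively and quantitatively to extract insights on values, interaction, and evaluation.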
Develop a shared vocabulary and typology to help coordinate research across disciplines.
Validate the framework against cross-domain literature and ethics venues (FAccT, AIES).

Results

Research Questions

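RQ1. Which human values relevant to AI alignment have been studied, and how do humans specify those values?
RQ2. How are human specifications integrated into AI development?
RQ3. How does existing research improve human understanding and evaluation of AI alignment?
RQ4. Which practices design interfaces and interactions that facilitate human-AI collaboration?
RQ5. How are AI systems adapted to meet the needs of groups holding different human values?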

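Key Findings

A clarified definition and scope of human-AI alignment, including the parties being aligned, the goals of alignment, and the values to be aligned.
A bidirectional human-AI alignment framework spanning "Align AI to Humans" and "Align Humans to AI", with a fine-grained taxonomy.
Insights into human values, interaction techniques, and the gap between AI evaluation and human evaluation.
Three major challenges for future research, ranging from near-term to long-term, with proposed solution directions.
A structured vocabulary to facilitate cross-disciplinary communication among alignment researchers.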

This review was generated by AI and reviewed by human editors.