QUICK REVIEW

[Paper Review] Delphi: Towards Machine Ethics and Norms

Liwei Jiang, Jena D. Hwang|arXiv (Cornell University)|Oct 14, 2021

Ethics and Social Impacts of AI | References: 51 | Cited by: 70

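One-Sentence Summary

Delphi proposes a machine ethics framework that fine-tunes a deep learning model on Commonsense Norm Bank, a collection of 1.7M human-annotated moral judgments, to perform ethical reasoning. The model reaches up to 92.1% accuracy as vetted by humans, far above GPT-3's 52.3% zero-shot performance, which suggests that fine-tuning on moral knowledge is essential for building ethical AI.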

ABSTRACT

What would it take to teach a machine to behave ethically? While broad ethical rules may seem straightforward to state (thou shalt not kill), applying such rules to real-world situations is far more complex. For example, while helping a friend is generally a good thing to do, helping a friend spread fake news is not. We identify four underlying challenges towards machine ethics and norms: (1) an understanding of moral precepts and social norms; (2) the ability to perceive real-world situations visually or by reading natural language descriptions; (3) commonsense reasoning to anticipate the outcome of alternative actions in different contexts; (4) most importantly, the ability to make ethical judgments given the interplay between competing values and their grounding in different contexts (e.g., the right to freedom of expression vs. preventing the spread of fake news). Our paper begins to address these questions within the deep learning paradigm. Our prototype model, Delphi, demonstrates strong promise of language-based commonsense moral reasoning, with up to 92.1% accuracy vetted by humans. This is in stark contrast to the zero-shot performance of GPT-3 of 52.3%, which suggests that massive scale alone does not endow pre-trained neural language models with human values. Thus, we present Commonsense Norm Bank, a moral textbook customized for machines, which compiles 1.7M examples of people's ethical judgments on a broad spectrum of everyday situations. In addition to the new resources and baseline performances for future research, our study provides new insights that lead to several important open research questions: differentiating between universal human values and personal values, modeling different moral frameworks, and explainable, consistent approaches to machine ethics.

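Research Motivation and Goals

Address the challenge of applying ethical norms in complex real-world situations that involve competing values.
Overcome the limitation that large language models such as GPT-3, despite their scale, lack grounded human moral values.
Build a structured, machine-readable moral knowledge base to support consistent and explainable ethical decision-making.
Investigate how commonsense reasoning and situational understanding can be integrated into ethical judgment systems.
Provide a foundation for future research on moral frameworks, value differentiation, and explainable AI ethics.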

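Proposed Method

Build Commonsense Norm Bank, a collection of 1.7M human-annotated moral judgments on everyday situations, as the moral training corpus.
Train the deep learning model Delphi on this norm bank, learning ethical reasoning through supervised fine-tuning on diverse moral dilemmas.
Perceive real-world situations visually or through natural-language descriptions (the Delphi prototype itself is described as language-based).
Apply commonsense reasoning to anticipate the outcomes of actions under different social and situational conditions.
Evaluate the accuracy of ethical judgments against human-vetted benchmarks to check agreement with human moral intuitions.
Design the model to strike a fine-grained balance between competing values such as freedom of expression and harm prevention.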
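
The supervised fine-tuning step above depends on turning each annotated judgment into an input/target pair. The sketch below shows one way that might look. The `NormExample` schema, the `LABELS` set, the `situation:` prefix, and the `to_training_pair` helper are all hypothetical names chosen for illustration; they are not the format used by the paper. The two sample situations come from the abstract.

```python
from dataclasses import dataclass
from typing import Tuple

# Hypothetical label set; the paper's actual label scheme may differ.
LABELS = ("good", "bad", "neutral")


@dataclass(frozen=True)
class NormExample:
    """One human-annotated judgment (illustrative schema, not the paper's)."""
    situation: str
    judgment: str  # expected to be one of LABELS


def to_training_pair(example: NormExample) -> Tuple[str, str]:
    """Turn a record into an (input text, target text) pair for fine-tuning."""
    if example.judgment not in LABELS:
        raise ValueError(f"unknown judgment: {example.judgment!r}")
    # The "situation:" prefix is an assumed prompt convention.
    source = f"situation: {example.situation.strip()}"
    return source, example.judgment


examples = [
    NormExample("helping a friend", "good"),
    NormExample("helping a friend spread fake news", "bad"),
]
pairs = [to_training_pair(e) for e in examples]
for source, target in pairs:
    print(f"{source} -> {target}")
```

Pairs of this shape could then be fed to any text-to-text fine-tuning loop; validating the label up front keeps malformed annotations out of the training set.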

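Experimental Results

Research Questions

RQ1: Can machines achieve consistent ethical judgment by learning from a large-scale, human-annotated dataset of moral decisions?
RQ2: How much does fine-tuning on a carefully curated moral knowledge base improve ethical reasoning beyond the zero-shot capabilities of large language models?
RQ3: To what extent do situational understanding and commonsense reasoning strengthen ethical decision-making in complex real-world situations?
RQ4: How can competing moral values, such as freedom of speech and harm prevention, be balanced in machine learning systems?
RQ5: In the context of machine ethics, how do universal human values differ from personal or cultural values?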

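Key Findings

Delphi reaches up to 92.1% accuracy on human-vetted ethical judgments, far above GPT-3's 52.3% zero-shot performance.
Commonsense Norm Bank, with 1.7M examples, provides a large and diverse training resource for machine ethics.
Fine-tuning on human-annotated moral judgments substantially improves the ethical reasoning of pre-trained language models.
The model handles competing values well, for example when weighing freedom of speech against preventing misinformation.
The results indicate that model scale alone is not enough to produce ethical behavior; structured moral knowledge is essential.
The study identifies key open questions in value differentiation, modeling of moral frameworks, and explainability.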
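
The accuracy figures above are rates of agreement between model judgments and human judgments. A minimal sketch of that calculation follows, assuming one categorical label per situation and exact-match scoring. The `accuracy` helper and the toy labels are illustrative only and do not reproduce the paper's data or evaluation protocol.

```python
from typing import Sequence


def accuracy(predictions: Sequence[str], human_labels: Sequence[str]) -> float:
    """Fraction of predictions that exactly match the human label."""
    if len(predictions) != len(human_labels):
        raise ValueError("predictions and human_labels must have equal length")
    if not human_labels:
        raise ValueError("cannot compute accuracy on an empty set")
    matches = sum(p == h for p, h in zip(predictions, human_labels))
    return matches / len(human_labels)


# Toy data for illustration only; these are not results from the paper.
toy_human = ["good", "bad", "bad", "good"]
toy_model = ["good", "bad", "good", "good"]
print(f"{accuracy(toy_model, toy_human):.1%}")  # prints 75.0%
```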

This review was generated by AI and reviewed by a human editor.