QUICK REVIEW

[Paper Review] Sociotechnical Safety Evaluation of Generative AI Systems

Laura Weidinger, Maribeth Rauh|arXiv (Cornell University)|Oct 18, 2023

Ethics and Social Impacts of AI | Cited by 38

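One-Sentence Summary

tldr: This paper proposes a three-layered sociotechnical framework for evaluating the safety of generative AI systems and surveys the current evaluation landscape to identify gaps and propose practical steps for closing them.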

ABSTRACT

Generative AI systems produce a range of risks. To ensure the safety of generative AI systems, these risks must be evaluated. In this paper, we make two main contributions toward establishing such evaluations. First, we propose a three-layered framework that takes a structured, sociotechnical approach to evaluating these risks. This framework encompasses capability evaluations, which are the main current approach to safety evaluation. It then reaches further by building on system safety principles, particularly the insight that context determines whether a given capability may cause harm. To account for relevant context, our framework adds human interaction and systemic impacts as additional layers of evaluation. Second, we survey the current state of safety evaluation of generative AI systems and create a repository of existing evaluations. Three salient evaluation gaps emerge from this analysis. We propose ways forward to closing these gaps, outlining practical steps as well as roles and responsibilities for different actors. Sociotechnical safety evaluation is a tractable approach to the robust and comprehensive safety evaluation of generative AI systems.

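Research Motivation and Goals

Propose a three-layered sociotechnical framework for the safety evaluation of generative AI systems.
Integrate context into safety evaluation by adding human interaction and systemic impact layers on top of capability evaluation.
Survey the current state of sociotechnical safety evaluation and identify gaps.
Propose practical steps, along with roles for different actors, to close the evaluation gaps.
Advocate standardized, practice-oriented evaluation as part of responsible AI development.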

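Proposed Method

Define and justify the three-layered framework: capability, human interaction, and systemic impact.
Survey existing safety evaluations and map them onto the three layers.
Build a repository of existing evaluations and analyze gaps across modalities.
Propose practical steps for operationalizing risks and for selecting an appropriate evaluation method at each layer.
Discuss roles, responsibilities, and limitations to guide safe and responsible AI deployment.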
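
The mapping and gap analysis described above can be pictured as a small tagging exercise. The following is a minimal sketch, not the authors' repository or tooling: the `Evaluation` record, the entry names, and the harm-area and modality labels are all hypothetical placeholders chosen for illustration. It tags each evaluation with a layer, a harm area, and an output modality, then reports how many evaluations fall in each layer and which harm-area and modality combinations have no evaluation at all.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum
from itertools import product


class Layer(Enum):
    """The three layers named in the framework."""
    CAPABILITY = "capability"
    HUMAN_INTERACTION = "human interaction"
    SYSTEMIC_IMPACT = "systemic impact"


@dataclass(frozen=True)
class Evaluation:
    """One repository entry (field names are assumptions for this sketch)."""
    name: str
    layer: Layer
    harm_area: str
    modality: str


# Hypothetical entries; none of these are taken from the paper's repository.
REPOSITORY = [
    Evaluation("toxicity-benchmark", Layer.CAPABILITY, "toxicity", "text"),
    Evaluation("factuality-probe", Layer.CAPABILITY, "misinformation", "text"),
    Evaluation("image-bias-audit", Layer.CAPABILITY, "toxicity", "image"),
    Evaluation("overreliance-study", Layer.HUMAN_INTERACTION, "misinformation", "text"),
]


def coverage_by_layer(evals):
    """Count evaluations per layer, including layers with zero entries."""
    counts = Counter(e.layer for e in evals)
    return {layer: counts.get(layer, 0) for layer in Layer}


def uncovered_cells(evals, harm_areas, modalities):
    """List (harm_area, modality) pairs that no evaluation addresses."""
    covered = {(e.harm_area, e.modality) for e in evals}
    return [cell for cell in product(harm_areas, modalities) if cell not in covered]


if __name__ == "__main__":
    for layer, count in coverage_by_layer(REPOSITORY).items():
        print(f"{layer.value}: {count}")
    print(uncovered_cells(REPOSITORY, ["toxicity", "misinformation"], ["text", "image"]))
```

Keeping zero-count layers in the output is deliberate: in a gap analysis, an empty layer or an empty harm-area and modality cell is the finding of interest, so it should not silently drop out of the report.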

Figure 2.1: A sociotechnical framework for safety evaluation comprises three layers.

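Results

Research Questions

RQ1: Looking beyond technical components, what should a comprehensive safety evaluation of generative AI include?
RQ2: How do the capability, human interaction, and systemic impact layers help explain real-world harms?
RQ3: What gaps currently exist in the sociotechnical safety evaluation of generative AI across modalities and contexts?
RQ4: What practical steps and governance structures can close these gaps and guide the actors involved?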

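Key Findings

By incorporating human interaction and systemic impacts, the three-layered sociotechnical framework supplies the context that capability evaluation alone lacks.
Current safety evaluation has substantial gaps, particularly for multimodal outputs and system-wide effects, which hinder comprehensive risk assessment.
The repository of existing evaluations shows where current work aligns with the framework and where it does not, pointing to practical steps for closing the gaps.
Evaluation should be standardized and ongoing, with clearly defined roles for developers and policymakers to ensure accountability.
Multimodality raises new evaluation challenges that call for context-sensitive, cross-layer evaluation.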

Figure 3.1: Evaluations per harm area and AI system output modality. No harm area is well covered across modalities.

This review was generated by AI and reviewed by a human editor.