QUICK REVIEW

[Paper Review] Safety Cases: How to Justify the Safety of Advanced AI Systems

Joshua Clymer, Nick Gabrieli|arXiv (Cornell University)|Mar 15, 2024

Adversarial Robustness in Machine Learning | Cited by 5

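One-Sentence Summary

This paper proposes a framework for building AI safety cases, outlines four categories of argument (inability, control, trustworthiness, and deference), and describes how the safety of a deployment could be structured, evaluated, and potentially certified.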

ABSTRACT

As AI systems become more advanced, companies and regulators will make difficult decisions about whether it is safe to train and deploy them. To prepare for these decisions, we investigate how developers could make a 'safety case,' which is a structured rationale that AI systems are unlikely to cause a catastrophe. We propose a framework for organizing a safety case and discuss four categories of arguments to justify safety: total inability to cause a catastrophe, sufficiently strong control measures, trustworthiness despite capability to cause harm, and -- if AI systems become much more powerful -- deference to credible AI advisors. We evaluate concrete examples of arguments in each category and outline how arguments could be combined to justify that AI systems are safe to deploy.

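Research Motivation and Objectives

Introduce the concept of a safety case for advanced AI systems: a structured rationale that a deployment is unlikely to lead to catastrophic outcomes.
Propose a six-step framework for organizing safety arguments and evaluating the macrosystem and its subsystems.
Identify four categories of safety argument (inability, control, trustworthiness, and deference) and provide concrete templates and examples.
Discuss how safety cases and risk cases fit together, and offer policy recommendations for institutions and regulators.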

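Proposed Method

Define the AI macrosystem and the deployment decision within the six-step safety case framework.
Divide safety arguments into four categories and describe their structure and use cases in detail.
Provide concrete templates and examples for each argument type (including dangerous capability evaluations).
Assess the practicality, maximum strength, and scalability of the proposed safety arguments.
Propose a holistic safety case process that uses Goal Structuring Notation (GSN) and, potentially, a risk matrix.
Offer recommendations for institutions, auditors, and regulators, including continuous monitoring and hard versus soft standards.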

Figure 1: The GSN diagram is the start of a holistic safety case. The decomposition above would occur in step 2 (specifying unacceptable outcomes). ‘G’ labeled rectangles represent subclaims (i.e. goals). ‘S’ labeled parallelograms indicate justification strategies. For an example of a full safet
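
The goal and strategy structure described in the Figure 1 caption can be pictured as a small tree. The following is only a minimal sketch, not the paper's tooling: the `Node` class, the `evidence` field, the `unsupported_goals` helper, and the wording of every claim in the example tree are hypothetical and introduced here for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """One node of a GSN-style tree: a goal ('G') or a strategy ('S')."""
    kind: str   # "G" = goal (subclaim), "S" = justification strategy
    text: str
    children: List["Node"] = field(default_factory=list)
    # Hypothetical field: labels of evidence attached to a leaf goal.
    evidence: List[str] = field(default_factory=list)


def unsupported_goals(node: Node) -> List[str]:
    """Return the text of every leaf goal that has no evidence attached."""
    if not node.children:
        if node.kind == "G" and not node.evidence:
            return [node.text]
        return []
    missing: List[str] = []
    for child in node.children:
        missing.extend(unsupported_goals(child))
    return missing


# Illustrative tree; the claim wording is invented for this example.
root = Node("G", "The deployment is unlikely to cause a catastrophe", [
    Node("S", "Argue over each unacceptable outcome", [
        Node("G", "Outcome A is unlikely",
             evidence=["dangerous capability evaluation"]),
        Node("G", "Outcome B is unlikely"),
    ]),
])

print(unsupported_goals(root))  # ['Outcome B is unlikely']
```

Walking the tree this way only flags leaf goals that still lack evidence; it says nothing about whether the attached evidence is actually convincing, which remains a matter for human review.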

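Results

Research Questions

RQ1: What are the elements of a structured safety case for deploying advanced AI systems?
RQ2: How can a safety case be decomposed into subsystems, and how can risk be assessed at the macrosystem level?
RQ3: What are the main categories of argument for justifying AI safety, and how can each be made concrete?
RQ4: How should safety cases be evaluated and updated over time, including consideration of risk cases?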

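Key Findings

Four building-block safety arguments are identified: inability, control, trustworthiness, and deference.
An end-to-end six-step framework is proposed for constructing and evaluating AI safety cases.
Dangerous capability evaluations, monitoring, externalized reasoning, and testbeds are highlighted as practical, scalable building blocks.
A holistic safety case can be represented in Goal Structuring Notation (GSN) and supplemented by risk cases reviewed by third-party auditors.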

Figure 2: As AI systems become more powerful, developers will likely increasingly rely on arguments toward the right in the plot above.
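
One way to keep the four argument categories straight is as an ordered enumeration. This is a rough sketch under stated assumptions: the numeric ordering follows the order in which the abstract lists the categories and is assumed, not confirmed here, to match the left-to-right axis of Figure 2. `ArgumentCategory` and `rightmost_category` are hypothetical names.

```python
from enum import IntEnum
from typing import Iterable


class ArgumentCategory(IntEnum):
    """The four categories, in the order the abstract lists them.

    Assumption: this order matches the left-to-right axis of Figure 2.
    """
    INABILITY = 1
    CONTROL = 2
    TRUSTWORTHINESS = 3
    DEFERENCE = 4


def rightmost_category(used: Iterable[ArgumentCategory]) -> ArgumentCategory:
    """Return the category furthest along the assumed ordering."""
    return max(used)


# Hypothetical safety case that relies on two kinds of argument.
case_arguments = [ArgumentCategory.INABILITY, ArgumentCategory.CONTROL]
print(rightmost_category(case_arguments).name)  # CONTROL
```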

This review was generated by AI and reviewed by human editors.