QUICK REVIEW

[Paper Review] Towards Guaranteed Safe AI: A Framework for Ensuring Robust and Reliable AI Systems

David Dalrymple, Joar Skalse | arXiv (Cornell University) | 2024-05-10

Adversarial Robustness in Machine Learning | Citations: 12

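One-Line Summary

The paper presents guaranteed safe (GS) AI as a framework built from a world model, a formal safety specification, and a verifier, which together provide high-assurance safety guarantees for AI systems.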

ABSTRACT

Ensuring that AI systems reliably and robustly avoid harmful or dangerous behaviours is a crucial challenge, especially for AI systems with a high degree of autonomy and general intelligence, or systems used in safety-critical contexts. In this paper, we will introduce and define a family of approaches to AI safety, which we will refer to as guaranteed safe (GS) AI. The core feature of these approaches is that they aim to produce AI systems which are equipped with high-assurance quantitative safety guarantees. This is achieved by the interplay of three core components: a world model (which provides a mathematical description of how the AI system affects the outside world), a safety specification (which is a mathematical description of what effects are acceptable), and a verifier (which provides an auditable proof certificate that the AI satisfies the safety specification relative to the world model). We outline a number of approaches for creating each of these three core components, describe the main technical challenges, and suggest a number of potential solutions to them. We also argue for the necessity of this approach to AI safety, and for the inadequacy of the main alternative approaches.

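Motivation and Objectives

Define guaranteed safe (GS) AI and argue for high-assurance quantitative safety guarantees.
Explain how the world model, safety specification, and verifier interact to provide those guarantees.
Survey potential approaches to building each GS AI component and identify the main challenges.
Illustrate practical problems that GS AI could address, and discuss its feasibility and benefits.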

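Proposed Method

Formally define GS AI and describe the criteria for quantitative safety guarantees.
Outline the three core components (world model, safety specification, verifier) and their roles.
Discuss possible realizations of the world model across a spectrum that ranges from no model at all to formally verified abstractions of physical laws.
Describe safety specifications that go beyond traditional reward-based formulations, and how they can be constructed.
Explain how verification can provide formal proofs, probabilistic bounds, or convergence guarantees relative to the world model.
Address integration with existing containment or boxing methods, and the broader GS AI agenda.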
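
To make the interplay of the three components concrete, here is a minimal sketch in Python. It is not taken from the paper: the thermostat-like `WORLD_MODEL`, the `is_safe` predicate, and the `verify` and `check_certificate` helpers are all hypothetical names chosen for illustration, and the world model is assumed to be a tiny finite transition system so that exhaustive search is possible. The verifier returns either a set of reachable states that acts as an auditable certificate, or a counterexample trace.

```python
from collections import deque

# Hypothetical toy world model (an assumption, not from the paper):
# each state maps to the set of states the world may move to next
# under the AI system's fixed policy.
WORLD_MODEL = {
    "idle": {"heating"},
    "heating": {"warm", "idle"},
    "warm": {"idle", "heating"},
    "overheated": {"overheated"},
}


def is_safe(state):
    """Safety specification: every state except 'overheated' is acceptable."""
    return state != "overheated"


def verify(world_model, spec, initial_state):
    """Explore all states reachable from initial_state (breadth-first).

    Returns (True, invariant), where invariant is the reachable set and
    serves as a certificate, or (False, trace), where trace is a path
    from the initial state to an unsafe state.
    """
    parent = {initial_state: None}
    queue = deque([initial_state])
    while queue:
        state = queue.popleft()
        if not spec(state):
            trace = []
            while state is not None:
                trace.append(state)
                state = parent[state]
            return False, trace[::-1]
        for nxt in sorted(world_model[state]):
            if nxt not in parent:
                parent[nxt] = state
                queue.append(nxt)
    return True, set(parent)


def check_certificate(world_model, spec, initial_state, invariant):
    """Audit a certificate without rerunning the search.

    The invariant must contain the initial state, contain only safe
    states, and be closed under the world model's transitions.
    """
    return (
        initial_state in invariant
        and all(spec(s) for s in invariant)
        and all(world_model[s] <= invariant for s in invariant)
    )


ok, certificate = verify(WORLD_MODEL, is_safe, "idle")
audited = check_certificate(WORLD_MODEL, is_safe, "idle", certificate)
```

The guarantee here holds only relative to `WORLD_MODEL`: if the real system can move from "warm" to "overheated" and the model omits that transition, the certificate still checks out while the system is unsafe. Separating `verify` from `check_certificate` mirrors the idea that a proof certificate should be cheaper to audit than to find.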

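Experimental Results

Research Questions

RQ1: What constitutes a high-assurance quantitative safety guarantee for an AI system?
RQ2: How can a world model, safety specification, and verifier be combined to produce rigorous safety guarantees?
RQ3: What are feasible strategies for building robust world models, and what is the trade-off between interpretability and accuracy?
RQ4: What forms can safety specifications take beyond reward-based objectives, and how can they be operationalized in verification?
RQ5: Can GS AI approaches scale to real-world safety-critical applications while remaining practical?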

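Key Findings

GS AI frames safety as a quantitative guarantee produced by a world model, a safety specification, and a verifier.
World-model approaches span a spectrum from no model at all to formally verified physical abstractions, each with different safety implications.
Verification can provide formal proofs or probabilistic bounds, and what it delivers may depend on bounded resources and on model uncertainty.
World models can be designed by hand or learned with machine learning; probabilistic programming and Bayesian methods enable tractable reasoning over theories.
The paper argues that model-based verification is necessary to achieve long-term safety guarantees in non-stationary, complex environments.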
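
As an illustration of what a probabilistic bound relative to a world model can look like, the sketch below computes the exact probability of reaching an unsafe state within a fixed number of steps and compares it to a tolerance. Everything in it is an assumption made for the example: the three-state Markov chain `P`, the `UNSAFE` set, and the function names are hypothetical, and the paper does not prescribe this particular computation.

```python
from fractions import Fraction

# Hypothetical probabilistic world model (an assumption, not from the paper):
# P[s][t] is the probability of moving from state s to state t in one step.
P = {
    "ok": {"ok": Fraction(9, 10), "degraded": Fraction(1, 10)},
    "degraded": {
        "ok": Fraction(1, 2),
        "degraded": Fraction(4, 10),
        "failed": Fraction(1, 10),
    },
    "failed": {"failed": Fraction(1)},
}
UNSAFE = {"failed"}


def failure_probability(P, unsafe, start, horizon):
    """Exact probability of entering an unsafe state within `horizon` steps."""
    # reach[s]: probability of hitting an unsafe state from s in the
    # number of steps processed so far (zero steps initially).
    reach = {s: Fraction(1 if s in unsafe else 0) for s in P}
    for _ in range(horizon):
        reach = {
            s: Fraction(1)
            if s in unsafe
            else sum((p * reach[t] for t, p in P[s].items()), Fraction(0))
            for s in P
        }
    return reach[start]


def satisfies_bound(P, unsafe, start, horizon, epsilon):
    """Safety specification: failure probability must not exceed epsilon."""
    return failure_probability(P, unsafe, start, horizon) <= epsilon


risk = failure_probability(P, UNSAFE, "ok", 3)  # Fraction(23, 1000)
```

Using `Fraction` keeps the arithmetic exact, so the comparison against `epsilon` is not affected by floating-point rounding. As with any model-relative guarantee, the number is only as trustworthy as the transition probabilities in `P`.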


This review was generated by AI and reviewed by a human editor.