QUICK REVIEW

[Paper Review] Towards Guaranteed Safe AI: A Framework for Ensuring Robust and Reliable AI Systems

David Dalrymple, Joar Skalse|arXiv (Cornell University)|May 10, 2024

Adversarial Robustness in Machine Learning|Citations: 12

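One-Line Summary

Proposes Guaranteed Safe (GS) AI as a framework built from a world model, a formal safety specification, and a verifier, with the aim of providing high-assurance safety guarantees for AI systems.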

ABSTRACT

Ensuring that AI systems reliably and robustly avoid harmful or dangerous behaviours is a crucial challenge, especially for AI systems with a high degree of autonomy and general intelligence, or systems used in safety-critical contexts. In this paper, we will introduce and define a family of approaches to AI safety, which we will refer to as guaranteed safe (GS) AI. The core feature of these approaches is that they aim to produce AI systems which are equipped with high-assurance quantitative safety guarantees. This is achieved by the interplay of three core components: a world model (which provides a mathematical description of how the AI system affects the outside world), a safety specification (which is a mathematical description of what effects are acceptable), and a verifier (which provides an auditable proof certificate that the AI satisfies the safety specification relative to the world model). We outline a number of approaches for creating each of these three core components, describe the main technical challenges, and suggest a number of potential solutions to them. We also argue for the necessity of this approach to AI safety, and for the inadequacy of the main alternative approaches.
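The interplay of the three components can be illustrated with a minimal sketch. Everything below is a hypothetical toy, not code from the paper: the world model is a six-cell corridor, the safety specification forbids one hazard cell, and the verifier enumerates every state reachable under a given policy. The names `world_model`, `safety_spec`, `verify`, and `check_certificate` are assumptions introduced for illustration.

```python
from typing import Callable, FrozenSet, Optional

Policy = Callable[[int], int]

def world_model(state: int, action: int) -> int:
    """Toy world model (assumed): cells 0..5, actions -1 or +1, walls at both ends."""
    return max(0, min(5, state + action))

def safety_spec(state: int) -> bool:
    """Toy safety specification (assumed): cell 5 is a hazard and must never be entered."""
    return state != 5

def cautious_policy(state: int) -> int:
    """Moves right until cell 3, then turns back."""
    return 1 if state < 3 else -1

def reckless_policy(state: int) -> int:
    """Always moves right."""
    return 1

def verify(policy: Policy, initial: int) -> Optional[FrozenSet[int]]:
    """Explore every state reachable under `policy` in the world model.

    Returns the reachable set as a certificate if all of it is safe, else None.
    """
    seen = set()
    frontier = [initial]
    while frontier:
        state = frontier.pop()
        if state in seen:
            continue
        if not safety_spec(state):
            return None
        seen.add(state)
        frontier.append(world_model(state, policy(state)))
    return frozenset(seen)

def check_certificate(policy: Policy, initial: int, cert: FrozenSet[int]) -> bool:
    """Independent audit: the certificate must be a safe inductive invariant."""
    return (
        initial in cert
        and all(safety_spec(s) for s in cert)
        and all(world_model(s, policy(s)) in cert for s in cert)
    )

certificate = verify(cautious_policy, initial=0)
```

The certificate can be checked without rerunning the search, which is the sense in which it is auditable here. The guarantee holds only relative to the world model: if the real corridor behaves differently from `world_model`, the certificate says nothing about it.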

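Motivation and Objectives

Define GS AI and make the case for high-assurance quantitative safety guarantees.
Explain how the world model, safety specification, and verifier interact to provide those guarantees.
Survey potential approaches to building each GS AI component and identify the main challenges.
Illustrate practical problems that GS AI could address, and discuss feasibility and benefits.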

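Proposed Approach

Formally defines GS AI and explains the criteria for a quantitative safety guarantee.
Outlines the three core components (world model, safety specification, verifier) and their roles.
Discusses the feasibility of world models across a spectrum that ranges from no model at all to formally verified abstractions of physical laws.
Describes safety specifications that go beyond traditional reward-based formulations, and methods for constructing them.
Explains how verification yields formal proofs, probabilistic upper bounds, or convergence guarantees relative to the world model.
Mentions integration with existing containment (boxing) techniques and the broader GS AI agenda.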
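To make the idea of a probabilistic bound relative to a world model concrete, here is a small sketch under stated assumptions. The three-state Markov chain, its transition probabilities, and the function names are all hypothetical and do not come from the paper. For a model this small the risk can be computed exactly by backward induction; a verifier for a realistic model would more plausibly establish a sound upper bound instead.

```python
from fractions import Fraction
from typing import Dict

# Hypothetical probabilistic world model: a Markov chain under a fixed policy.
# "fail" is the unsafe state and is absorbing.
TRANSITIONS: Dict[str, Dict[str, Fraction]] = {
    "ok": {"ok": Fraction(9, 10), "degraded": Fraction(1, 10)},
    "degraded": {
        "ok": Fraction(1, 2),
        "degraded": Fraction(2, 5),
        "fail": Fraction(1, 10),
    },
    "fail": {"fail": Fraction(1)},
}
UNSAFE = "fail"

def failure_probability(start: str, horizon: int) -> Fraction:
    """Probability of reaching UNSAFE within `horizon` steps, by backward induction."""
    # With zero steps left, only the unsafe state itself counts as a failure.
    reach = {s: Fraction(int(s == UNSAFE)) for s in TRANSITIONS}
    for _ in range(horizon):
        reach = {
            s: Fraction(1) if s == UNSAFE
            else sum((p * reach[t] for t, p in TRANSITIONS[s].items()), Fraction(0))
            for s in TRANSITIONS
        }
    return reach[start]

def satisfies_spec(start: str, horizon: int, max_risk: Fraction) -> bool:
    """Quantitative safety specification (assumed form): risk must not exceed `max_risk`."""
    return failure_probability(start, horizon) <= max_risk

risk = failure_probability("ok", horizon=3)
```

Exact rational arithmetic via `fractions.Fraction` is used so the comparison against the risk threshold is not affected by floating-point rounding.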

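Experimental Results

Research Questions

RQ1: What constitutes a high-assurance quantitative safety guarantee for an AI system?
RQ2: How can a world model, a safety specification, and a verifier be combined to produce rigorous safety guarantees?
RQ3: What are realistic strategies for building robust world models, and what is the trade-off between interpretability and accuracy?
RQ4: What forms can safety specifications take beyond reward-based objectives, and how can they be operationalized in verification?
RQ5: How can GS AI approaches be scaled to real-world safety-critical applications while remaining practical?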

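Key Findings

GS AI frames safety as a quantitative guarantee produced through a world model, a safety specification, and a verifier.
World-model approaches span a spectrum from no model at all to formally verified abstractions of physical laws, and each point on it has different implications for safety.
Verification can provide formal proofs or probabilistic upper bounds, and what it can deliver may depend on finite resources and on model uncertainty.
World models can be written by hand or built with machine learning; probabilistic programming and Bayesian methods make inference over candidate theories more tractable.
The paper argues that model-based verification is necessary to achieve long-term safety guarantees in non-stationary, complex environments.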
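As a rough illustration of Bayesian inference over candidate world models, the sketch below keeps a posterior over three hypothetical models, each asserting a different per-step failure rate, and derives both an expected and a conservative risk estimate. The candidate set, the uniform prior, and the "plausible set" threshold are assumptions made for this example and are not taken from the paper.

```python
from fractions import Fraction
from typing import Dict

# Hypothetical candidate world models, each asserting a per-step failure rate.
CANDIDATES: Dict[str, Fraction] = {
    "optimistic": Fraction(1, 100),
    "moderate": Fraction(1, 20),
    "pessimistic": Fraction(1, 5),
}

def posterior(failures: int, trials: int) -> Dict[str, Fraction]:
    """Bayes' rule with a uniform prior and a binomial likelihood."""
    prior = Fraction(1, len(CANDIDATES))
    # The binomial coefficient is identical for every candidate, so it cancels.
    weights = {
        name: prior * rate**failures * (1 - rate) ** (trials - failures)
        for name, rate in CANDIDATES.items()
    }
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}

def expected_failure_rate(post: Dict[str, Fraction]) -> Fraction:
    """Posterior-weighted average of the candidates' failure rates."""
    return sum(post[name] * CANDIDATES[name] for name in CANDIDATES)

def worst_case_rate(post: Dict[str, Fraction], min_mass: Fraction) -> Fraction:
    """Highest failure rate among models with at least `min_mass` posterior mass.

    Raises ValueError if no model reaches the threshold.
    """
    return max(CANDIDATES[name] for name, mass in post.items() if mass >= min_mass)

# Example: 20 observed steps with no failures.
post = posterior(failures=0, trials=20)
```

Judging risk by the worst plausible model rather than the posterior mean is one conservative design choice; it keeps a guarantee from resting on a single best-guess model when the data have not yet ruled out the alternatives.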

This review was generated by AI and checked by a human editor.