QUICK REVIEW

[Paper Review] Towards best practices in AGI safety and governance: A survey of expert opinion

Jonas Schuett, Noemi Dreksler|arXiv (Cornell University)|May 11, 2023

Ethics and Social Impacts of AI | Citations: 15

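One-line summary

A survey of 51 experts from AGI labs, academia, and civil society found broad agreement that AGI labs should adopt a wide range of safety and governance practices, with especially strong support for pre-deployment risk assessments, dangerous capabilities evaluations, third-party model audits, safety restrictions on model usage, and red teaming.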

ABSTRACT

A number of leading AI companies, including OpenAI, Google DeepMind, and Anthropic, have the stated goal of building artificial general intelligence (AGI) - AI systems that achieve or exceed human performance across a wide range of cognitive tasks. In pursuing this goal, they may develop and deploy AI systems that pose particularly significant risks. While they have already taken some measures to mitigate these risks, best practices have not yet emerged. To support the identification of best practices, we sent a survey to 92 leading experts from AGI labs, academia, and civil society and received 51 responses. Participants were asked how much they agreed with 50 statements about what AGI labs should do. Our main finding is that participants, on average, agreed with all of them. Many statements received extremely high levels of agreement. For example, 98% of respondents somewhat or strongly agreed that AGI labs should conduct pre-deployment risk assessments, dangerous capabilities evaluations, third-party model audits, safety restrictions on model usage, and red teaming. Ultimately, our list of statements may serve as a helpful foundation for efforts to develop best practices, standards, and regulations for AGI labs.

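Motivation and objectives

Identify which safety and governance practices for AGI labs have broad expert support.
Assess whether support differs by sector (AGI labs, academia, civil society) or by gender.
Provide a foundation for developing standards, regulations, and best practices for AGI safety.
Inform policymakers and standard-setting bodies about governance measures that are widely supported.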

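Proposed method

The authors invited 92 experts to rate 50 statements about what AGI labs should do and received 51 responses (a response rate of 55.4%).
Responses used a 5-point Likert scale (-2 to 2) plus an "I don't know" option; 30 items were required and 20 were optional.
Statistical tests: Mann-Whitney U tests for overall agreement by sector; chi-squared tests for item-level differences; Holm-Bonferroni correction for multiple testing.
Open science practices: pre-registration, a pre-analysis plan, data and code shared on OSF; demographics reported in anonymized form.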
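
The Holm-Bonferroni correction named above can be illustrated with a short sketch. This is not the authors' analysis code; the function name `holm_bonferroni`, the significance level of 0.05, and the example p-values are all assumptions made for illustration. The procedure sorts the p-values in ascending order and compares the k-th smallest against alpha / (m - k + 1), stopping at the first one that fails.

```python
def holm_bonferroni(p_values, alpha=0.05):
    """Return a list of booleans, True where the null hypothesis is rejected.

    Illustrative implementation of the Holm-Bonferroni step-down procedure.
    alpha=0.05 is an assumed default, not a value taken from the paper.
    """
    m = len(p_values)
    # Indices of the tests, ordered from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, idx in enumerate(order):
        # rank starts at 0, so the threshold is alpha / (m - rank).
        if p_values[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            # Once one test fails, every test with a larger p-value fails too.
            break
    return reject


# Hypothetical p-values for four item-level comparisons.
example_p = [0.01, 0.04, 0.03, 0.005]
decisions = holm_bonferroni(example_p)
print(decisions)
```

With these made-up inputs, the two smallest p-values pass their thresholds (0.0125 and about 0.0167), while 0.03 exceeds 0.025, so the procedure stops there. Testing 50 items at once is the kind of setting where such a correction matters, since uncorrected item-level tests would be expected to produce some false positives by chance.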

Figure 1: Sample by sector and gender | The figure shows the sector of work and gender of the respondents. Respondents could choose more than one sector in which they work.

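Results

Research questions

RQ1: On which safety and governance practices for AGI labs do leading experts broadly agree?
RQ2: Does the level of agreement differ by sector (AGI labs vs. academia vs. civil society) or by gender?
RQ3: What additional practices do experts suggest beyond the 50 surveyed items?
RQ4: How can these findings inform policy, standards, and regulatory efforts on AGI safety?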

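Key findings

There was broad agreement that most of the 50 practices should be implemented; on average, 85.2% of respondents agreed with a given practice.
98% of respondents "somewhat" or "strongly" agreed that labs should implement key practices such as pre-deployment risk assessments, dangerous capabilities evaluations, third-party model audits, safety restrictions on model usage, and red teaming.
Mean agreement across all items was 1.39 on the scale from -2 to 2, indicating that responses generally leaned toward agreement.
Respondents from AGI labs showed higher overall agreement than those from academia or civil society, but item-level differences were not significant.
Five items drew no disagreement at all, including dangerous capabilities evaluations, pre-deployment risk assessments, and publishing an alignment strategy.
There was somewhat more uncertainty about enterprise risk management and certain coordination practices (e.g., inter-lab scrutiny, notifying other labs).
Respondents suggested 50 additional practices beyond the survey list, indicating room for broader governance design.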
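
The two summary statistics reported above (mean agreement on the -2 to 2 scale and the share of respondents who agree) can be computed per item roughly as follows. This is a minimal sketch under stated assumptions: the answer labels in `SCALE`, the helper name `summarize_item`, and the choice to exclude "I don't know" answers from both statistics are illustrative and are not confirmed by the review; the responses are made up.

```python
# Assumed mapping from answer labels to the -2..2 scale (labels are hypothetical).
SCALE = {
    "strongly disagree": -2,
    "somewhat disagree": -1,
    "neither agree nor disagree": 0,
    "somewhat agree": 1,
    "strongly agree": 2,
}


def summarize_item(responses):
    """Return (mean score, share agreeing) for one statement, or None if empty.

    Assumption: answers outside SCALE, such as "I don't know", are dropped
    before computing either statistic.
    """
    scored = [SCALE[r] for r in responses if r in SCALE]
    if not scored:
        return None
    mean_score = sum(scored) / len(scored)
    # "Agree" here means somewhat or strongly agree, i.e. a positive score.
    agree_share = sum(1 for s in scored if s > 0) / len(scored)
    return mean_score, agree_share


# Made-up responses for a single statement.
example = [
    "strongly agree",
    "strongly agree",
    "somewhat agree",
    "somewhat disagree",
    "I don't know",
]
summary = summarize_item(example)
print(summary)
```

Whether "I don't know" counts toward the denominator changes the agreement percentage, so any comparison with the figures above would first need to check which convention the paper uses.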

Figure 2: Percentages of responses for all statements | The figure shows the percentage of respondents choosing each answer option. At the end of each bar we show the number of people who answered each item. The items are ordered by the total number of respondents that “strongly” agreed. The full st


This review was generated by AI and checked by a human editor.