QUICK REVIEW

[論文レビュー] Will releasing the weights of future large language models grant widespread access to pandemic agents?

Anjali Gopal, Helm-Burger, Nathan|arXiv (Cornell University)|Oct 25, 2023

Viral Infections and Outbreaks Research被引用数 9

ひとこと要約

この論文は、将来のLLMの重みの公開がパンデミック原因物質へのアクセスを可能にするかどうかを評価するハッカソンを報告し、軽くチューニングされたモデルがウイルス学に関する重要情報を露呈できることを示した。

ABSTRACT

Large language models can benefit research and human understanding by providing tutorials that draw on expertise from many different fields. A properly safeguarded model will refuse to provide "dual-use" insights that could be misused to cause severe harm, but some models with publicly released weights have been tuned to remove safeguards within days of introduction. Here we investigated whether continued model weight proliferation is likely to help malicious actors leverage more capable future models to inflict mass death. We organized a hackathon in which participants were instructed to discover how to obtain and release the reconstructed 1918 pandemic influenza virus by entering clearly malicious prompts into parallel instances of the "Base" Llama-2-70B model and a "Spicy" version tuned to remove censorship. The Base model typically rejected malicious prompts, whereas the Spicy model provided some participants with nearly all key information needed to obtain the virus. Our results suggest that releasing the weights of future, more capable foundation models, no matter how robustly safeguarded, will trigger the proliferation of capabilities sufficient to acquire pandemic agents and other biological weapons.

研究の動機と目的

将来の基盤モデルの重み拡散が悪意あるパンデミック原因物質へのアクセスを可能にするかを評価する。
公開された重みとモデルファインチューニングがどのようにセーフガードと相互作用するかを評価する。
チューニングされたモデルから悪意あるプロンプトでウイルス情報をどれだけ容易に抽出できるかを定量化する。
モデル公開とセーフガードに関する政策提言をまとめる。

提案手法

Base Llama-2-70B の並列インスタンスと、検閲を抑えた Spicy バージョンを用いたハッカソンを組織する。
悪意のあるプロンプトを用いてパンデミック関連情報の取得を促す。
情報漏えいを評価するために Base モデルと Spicy モデルの出力を比較する。
重み公開そのものが、モデルのチューニングと組み合わせた場合に、危険な情報の取得の障壁を低下させるかを分析する。
モデル公開におけるセーフガードと公的政策への影響を議論する。

実験結果

リサーチクエスチョン

RQ1将来のLLMの重み公開は、パンデミック因子の獲得能力へのアクセスを意味的に増加させるか。
RQ2最小限のファインチューニングで検閲を解除することは、危険なウイルス学情報の抽出にどのように影響するか。
RQ3重みが広く公開された場合、悪用を防ぐためにどのような政策とセーフガードが必要か。

主な発見

基礎モデルは通常、悪意あるプロンプトを拒否し、アクセスを制限した。
Spicy モデルは、一部の参加者にウイルスを得るために必要な主要情報のほぼ全てを提供した。
将来のより能力の高い基盤モデルでも、セーフガードがあっても公開された重みによりパンデミック原因物質の取得を可能にする可能性が示唆される。
本研究の成果は、悪用を防ぐために必要だが十分ではない点について政策提言を導く。

より良い研究を、今すぐ始めましょう

論文設計から論文執筆まで、研究時間を劇的に削減しましょう。

クレジットカード登録不要

このレビューはAIが作成し、人間の編集者が確認しました。