QUICK REVIEW

[Paper Review] Ethical and social risks of harm from Language Models

Laura Weidinger, John W. Mellor | arXiv (Cornell University) | Dec 8, 2021

Hate Speech and Cyberbullying Detection | Cited by 71

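One-Line Summary

This paper presents a taxonomy of 21 ethical and social harms arising from large language models. The harms span discrimination, information hazards, misinformation, malicious uses, human-computer interaction harms, and automation and environmental impacts; the paper also discusses their points of origin and possible mitigations.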

ABSTRACT

This paper aims to help structure the risk landscape associated with large-scale Language Models (LMs). In order to foster advances in responsible innovation, an in-depth understanding of the potential risks posed by these models is needed. A wide range of established and anticipated risks are analysed in detail, drawing on multidisciplinary expertise and literature from computer science, linguistics, and social sciences. We outline six specific risk areas: I. Discrimination, Exclusion and Toxicity, II. Information Hazards, III. Misinformation Harms, IV. Malicious Uses, V. Human-Computer Interaction Harms, VI. Automation, Access, and Environmental Harms. The first area concerns the perpetuation of stereotypes, unfair discrimination, exclusionary norms, toxic language, and lower performance by social group for LMs. The second focuses on risks from private data leaks or LMs correctly inferring sensitive information. The third addresses risks arising from poor, false or misleading information including in sensitive domains, and knock-on risks such as the erosion of trust in shared information. The fourth considers risks from actors who try to use LMs to cause harm. The fifth focuses on risks specific to LLMs used to underpin conversational agents that interact with human users, including unsafe use, manipulation or deception. The sixth discusses the risk of environmental harm, job automation, and other challenges that may have a disparate effect on different social groups or communities. In total, we review 21 risks in-depth. We discuss the points of origin of different risks and point to potential mitigation approaches. Lastly, we discuss organisational responsibilities in implementing mitigations, and the role of collaboration and participation. We highlight directions for further research, particularly on expanding the toolkit for assessing and evaluating the outlined risks in LMs.

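Motivation and Objectives

Structure the risk landscape of large language models to support responsible innovation.
Identify and classify ethical and social harms that span multiple domains.
Analyze the origins of these harms and propose mitigations, including technical, organizational, and policy interventions.
Emphasize the need for inclusive, participatory methods and cross-disciplinary collaboration.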

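Proposed Method

Review multidisciplinary literature from computer science, linguistics, and the social sciences to identify risk areas.
Propose a taxonomy of harms in six areas, covering 21 specific risks and the mechanisms by which they arise.
For each risk, provide illustrative, fictitious examples showing how it could surface in prompts and responses.
Discuss the points of origin of harms (e.g., training data) alongside corresponding mitigations (data curation and cleaning, differential privacy, access controls).
Outline organizational responsibilities and collaboration requirements for mitigation.
Point to directions for future research on risk assessment frameworks and benchmarking.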
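
The six-area structure can be written down as a small data structure. The sketch below is illustrative only: the `RiskArea` class, its field names, and the `find_area` helper are hypothetical and not defined by the paper, and the `focus` strings paraphrase the abstract. The 21 individual risks are not enumerated because this review does not list them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RiskArea:
    """One of the six risk areas named in the abstract (hypothetical container)."""
    numeral: str  # Roman numeral as used in the abstract
    name: str     # area name as given in the abstract
    focus: str    # short paraphrase of the abstract's description


RISK_AREAS = (
    RiskArea("I", "Discrimination, Exclusion and Toxicity",
             "stereotypes, unfair discrimination, exclusionary norms, toxic language"),
    RiskArea("II", "Information Hazards",
             "private data leaks, correct inference of sensitive information"),
    RiskArea("III", "Misinformation Harms",
             "poor, false or misleading information; erosion of trust"),
    RiskArea("IV", "Malicious Uses",
             "actors who try to use LMs to cause harm"),
    RiskArea("V", "Human-Computer Interaction Harms",
             "unsafe use, manipulation or deception by conversational agents"),
    RiskArea("VI", "Automation, Access, and Environmental Harms",
             "environmental harm, job automation, disparate effects"),
)

# Total stated in the abstract; the per-area breakdown is not given in this review.
TOTAL_RISKS = 21


def find_area(numeral: str) -> RiskArea:
    """Look up a risk area by its Roman numeral; raise KeyError if unknown."""
    for area in RISK_AREAS:
        if area.numeral == numeral:
            return area
    raise KeyError(numeral)


if __name__ == "__main__":
    for area in RISK_AREAS:
        print(f"{area.numeral}. {area.name}: {area.focus}")
```

Keeping the areas in an ordered, immutable tuple mirrors the fixed numbering in the abstract; a dictionary keyed by numeral would work equally well.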

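Results

Research Questions

RQ1: What is the full range of ethical and social harms associated with large language models?
RQ2: How can these harms be classified, what are their origins, and which mitigations are effective across areas?
RQ3: Which frameworks and governance practices support responsible innovation in the development and deployment of LMs?
RQ4: What are the directions for future research on risk assessment, mitigation, and inclusive participation?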

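Key Findings

Identifies 21 distinct harms organized into six risk areas: Discrimination, Exclusion and Toxicity; Information Hazards; Misinformation Harms; Malicious Uses; Human-Computer Interaction Harms; and Automation, Access, and Environmental Harms.
Explains that harms arise from the way LMs reflect language patterns in their training data and at deployment, including the amplification of bias and privacy risks.
Argues that where a risk originates determines which mitigation fits: for example, data redaction or curation at the source, differential privacy during training, or access controls at the product level.
Stresses that mitigating one risk must not worsen another, and that broad, interdisciplinary collaboration is essential.
Advocates inclusive, participatory methods and continuous risk assessment that adapts to evolving LM capabilities.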
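
One way to read the pairing of risk origins with mitigations is as a lookup table. The following is a minimal sketch under that reading, using only the three example pairings mentioned in this review. The `Origin` enum, the `MITIGATIONS` table, and `suggest_mitigations` are hypothetical names introduced for illustration; the paper does not define such an API, and the table is not exhaustive.

```python
from enum import Enum


class Origin(Enum):
    """Point at which a risk originates (hypothetical labels)."""
    DATA_SOURCE = "data source"
    TRAINING = "training"
    PRODUCT = "product level"


# Example pairings taken from the review text; not an exhaustive list.
MITIGATIONS = {
    Origin.DATA_SOURCE: ["data redaction", "data curation"],
    Origin.TRAINING: ["differential privacy"],
    Origin.PRODUCT: ["access controls"],
}


def suggest_mitigations(origin: Origin) -> list[str]:
    """Return a copy of the example mitigations listed for a point of origin."""
    return list(MITIGATIONS[origin])


if __name__ == "__main__":
    for origin in Origin:
        print(f"{origin.value}: {', '.join(suggest_mitigations(origin))}")
```

A flat table like this cannot express the caution that mitigating one risk may worsen another; capturing that would need explicit trade-off information, which the review does not supply.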

This review was generated by AI and checked by a human editor.