QUICK REVIEW

[Paper Review] Bias and Fairness in Large Language Models: A Survey

Isabel O. Gallegos, Ryan A. Rossi|arXiv (Cornell University)|Sep 2, 2023

Text Readability and Simplification | Citations: 58

ONE-LINE SUMMARY

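This survey consolidates definitions of social bias and fairness for LLMs, introduces taxonomies of bias evaluation metrics and datasets, and classifies bias mitigation techniques across pre-processing, in-training, intra-processing, and post-processing.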

ABSTRACT

Rapid advancements of large language models (LLMs) have enabled the processing, understanding, and generation of human-like text, with increasing integration into systems that touch our social sphere. Despite this success, these models can learn, perpetuate, and amplify harmful social biases. In this paper, we present a comprehensive survey of bias evaluation and mitigation techniques for LLMs. We first consolidate, formalize, and expand notions of social bias and fairness in natural language processing, defining distinct facets of harm and introducing several desiderata to operationalize fairness for LLMs. We then unify the literature by proposing three intuitive taxonomies, two for bias evaluation, namely metrics and datasets, and one for mitigation. Our first taxonomy of metrics for bias evaluation disambiguates the relationship between metrics and evaluation datasets, and organizes metrics by the different levels at which they operate in a model: embeddings, probabilities, and generated text. Our second taxonomy of datasets for bias evaluation categorizes datasets by their structure as counterfactual inputs or prompts, and identifies the targeted harms and social groups; we also release a consolidation of publicly-available datasets for improved access. Our third taxonomy of techniques for bias mitigation classifies methods by their intervention during pre-processing, in-training, intra-processing, and post-processing, with granular subcategories that elucidate research trends. Finally, we identify open problems and challenges for future work. Synthesizing a wide range of recent research, we aim to provide a clear guide of the existing literature that empowers researchers and practitioners to better understand and prevent the propagation of bias in LLMs.

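MOTIVATION AND OBJECTIVES

Consolidate and formalize notions of social bias and fairness for NLP and LLMs.
Develop a taxonomy that organizes bias evaluation metrics by data structure and model access.
Collect and categorize publicly available bias evaluation datasets for LLMs.
Classify bias mitigation techniques by intervention stage and provide a unified notation for the methods.
Identify open problems and challenges to guide future research on fair LLMs.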

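PROPOSED APPROACH

Formalizes notions of social bias and desiderata for fairness tailored to NLP and LLMs.
Proposes three taxonomies: (i) bias evaluation metrics (embeddings, probabilities, generated text), (ii) bias evaluation datasets (counterfactual inputs, prompts), and (iii) bias mitigation techniques (pre-processing, in-training, intra-processing, post-processing).
Provides a unified mathematical notation for comparing metrics and formalizing techniques.
Consolidates and releases publicly available datasets for bias evaluation.
Discusses open problems and future directions for reducing bias in LLMs.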
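
To make the embedding level of taxonomy (i) more concrete, here is a minimal sketch of an embedding-level association measure. It compares how close a target vector is, on average, to two groups of attribute vectors using cosine similarity. The function names, the toy 2-d vectors, and the exact formula are assumptions chosen for illustration and are not taken from the survey; real embedding-based metrics work on a model's learned embeddings and curated word sets.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def association_gap(target, group_a, group_b):
    """Mean similarity of `target` to group_a minus mean similarity to group_b.

    A value near 0 means the target is about equally associated with both
    groups; a large magnitude suggests a skewed association.
    """
    mean_a = sum(cosine(target, g) for g in group_a) / len(group_a)
    mean_b = sum(cosine(target, g) for g in group_b) / len(group_b)
    return mean_a - mean_b


# Toy vectors, made up for illustration only (not real embeddings).
target = [1.0, 0.0]
group_a = [[1.0, 0.0]]
group_b = [[0.0, 1.0]]
gap = association_gap(target, group_a, group_b)
```

In this toy setup the target coincides with the only vector in `group_a` and is orthogonal to the one in `group_b`, so the gap is at its maximum. Probability-level and generated-text-level metrics need different access to the model (token probabilities or sampled outputs) and are not covered by this sketch.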

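RESULTS

Research Questions

RQ1: Which precise facets of social bias and fairness are relevant to LLMs and NLP tasks?
RQ2: How can bias evaluation metrics be organized by data structure and model access to enable consistent evaluation?
RQ3: What datasets exist for bias evaluation, and how can they be standardized or consolidated?
RQ4: Which taxonomy best represents bias mitigation techniques across intervention stages?
RQ5: What are the main open challenges and future directions for achieving fairness in LLMs?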

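Key Findings

The paper provides formal definitions of social bias, group fairness, and individual fairness, along with a taxonomy of harms (representational and allocational) applicable to NLP and LLMs.
It provides a unified taxonomy of bias evaluation metrics spanning embeddings, probabilities, and generated text, and clarifies the relationship between metrics and evaluation datasets.
It consolidates bias evaluation datasets by structure (counterfactual inputs or prompts), documents the targeted harms and social groups, and makes the datasets accessible through a public repository.
It presents a taxonomy of mitigation techniques organized by intervention stage (pre-processing, in-training, intra-processing, post-processing), with granular subcategories and formalizations.
The survey highlights open problems such as the robustness of fairness notions, evaluation standards, and extending mitigation efforts across the whole NLP lifecycle.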
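
The counterfactual-input dataset structure named in the findings can be illustrated with a small sketch: a sentence is paired with a copy in which social group terms are swapped, and a model-derived score is compared across the pair. The swap list and the names `counterfactual`, `score_gap`, and `toy_score` are hypothetical and introduced here only for illustration; `toy_score` is a placeholder standing in for something like a sentence log-probability from an actual model, and curated datasets build their pairs far more carefully than a word swap.

```python
import re

# Hypothetical swap list for illustration; not taken from any dataset.
SWAPS = {"he": "she", "she": "he", "man": "woman", "woman": "man"}


def counterfactual(sentence, swaps=SWAPS):
    """Return a copy of `sentence` with listed group terms swapped."""
    def repl(match):
        word = match.group(0)
        swapped = swaps.get(word.lower())
        if swapped is None:
            return word  # leave all other words untouched
        # Preserve a leading capital letter on swapped words.
        return swapped.capitalize() if word[0].isupper() else swapped

    return re.sub(r"[A-Za-z]+", repl, sentence)


def score_gap(score_fn, sentence):
    """Score of the original sentence minus score of its counterfactual."""
    return score_fn(sentence) - score_fn(counterfactual(sentence))


def toy_score(sentence):
    # Placeholder only: a real evaluation would query a language model here.
    return float(len(sentence))


pair = ("He is a doctor.", counterfactual("He is a doctor."))
gap = score_gap(toy_score, "He is a doctor.")
```

With a real scoring function, a gap that stays far from zero across many such pairs would indicate that the model treats the two versions differently. With the placeholder above, the gap only reflects the difference in string length.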


This review was generated by AI and checked by a human editor.