QUICK REVIEW

[Paper Review] Towards Measuring the Representation of Subjective Global Opinions in Language Models

Esin Durmus, Karina Nguyen | arXiv (Cornell University) | Jun 28, 2023

Natural Language Processing Techniques | Citations: 44

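Summary in brief

The paper builds GlobalOpinionQA from cross-national surveys and proposes a metric that compares LLM outputs with human opinions by country. The framework reveals a skew toward WEIRD (Western, Educated, Industrialized, Rich, Democratic) populations and shows how prompting and language affect whose opinions the model represents.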

ABSTRACT

Large language models (LLMs) may not equitably represent diverse global perspectives on societal issues. In this paper, we develop a quantitative framework to evaluate whose opinions model-generated responses are more similar to. We first build a dataset, GlobalOpinionQA, comprised of questions and answers from cross-national surveys designed to capture diverse opinions on global issues across different countries. Next, we define a metric that quantifies the similarity between LLM-generated survey responses and human responses, conditioned on country. With our framework, we run three experiments on an LLM trained to be helpful, honest, and harmless with Constitutional AI. By default, LLM responses tend to be more similar to the opinions of certain populations, such as those from the USA, and some European and South American countries, highlighting the potential for biases. When we prompt the model to consider a particular country's perspective, responses shift to be more similar to the opinions of the prompted populations, but can reflect harmful cultural stereotypes. When we translate GlobalOpinionQA questions to a target language, the model's responses do not necessarily become the most similar to the opinions of speakers of those languages. We release our dataset for others to use and build on. Our data is at https://huggingface.co/datasets/Anthropic/llm_global_opinions. We also provide an interactive visualization at https://llmglobalvalues.anthropic.com.

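Motivation and objectives

Build an evaluation dataset, GlobalOpinionQA, derived from the Pew Global Attitudes Survey (PEW GAS) and the World Values Survey (WVS).
Define a country-conditioned similarity metric that compares LLM outputs with human responses.
Evaluate how default prompting, cross-national prompting, and linguistic prompting affect the representation of global opinions.
Investigate bias toward particular populations and the effect of language and prompting on representation.
Discuss limitations and possible interventions for making LLMs represent global perspectives more inclusively.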

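Proposed method

Compile GlobalOpinionQA from 2,556 multiple-choice questions drawn from PEW GAS and WVS Wave 7.
For each question, record the model's predicted probability for each answer option.
For each country, average respondents' answers to obtain country-level human response probabilities.
Use 1 - Jensen-Shannon Distance as the similarity metric between the model's and each country's response distributions.
Run three prompting experiments: Default Prompting, Cross-national Prompting, and Linguistic Prompting.
To test the effect of language, translate the prompts into Russian, Chinese, and Turkish, and have native speakers verify the translations.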
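The metric steps above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, base-2 logarithms are assumed so that the Jensen-Shannon distance stays in [0, 1], and averaging the per-question scores into one score per country is an assumption about how the metric is aggregated.

```python
import math
from collections import Counter


def country_distribution(responses, n_options):
    """Turn one country's individual answers (option indices) into probabilities."""
    counts = Counter(responses)
    total = len(responses)
    return [counts.get(i, 0) / total for i in range(n_options)]


def js_distance(p, q):
    """Jensen-Shannon distance with base-2 logs (assumed), so the value lies in [0, 1]."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        # Terms with x_i == 0 contribute nothing to the KL divergence.
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    divergence = 0.5 * kl(p, m) + 0.5 * kl(q, m)
    # Guard against tiny negative values caused by floating-point rounding.
    return math.sqrt(max(divergence, 0.0))


def similarity(model_probs, human_probs):
    """Similarity = 1 - Jensen-Shannon distance for a single question."""
    return 1.0 - js_distance(model_probs, human_probs)


def mean_similarity(model_by_question, human_by_question):
    """Average per-question similarity between the model and one country (assumed aggregation)."""
    scores = [
        similarity(model_by_question[q], human_by_question[q])
        for q in human_by_question
    ]
    return sum(scores) / len(scores)
```

Identical distributions score 1 and distributions with no overlapping mass score 0, so a higher score means the model's answer distribution is closer to that country's.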

Figure 1: We compile multiple-choice questions from cross-national surveys PEW and World Values Survey. We then administer these questions to the large language model (LLM) and compare the distributions of the model responses with the responses from participants across the world.
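Administering a survey question under the three prompting conditions could look roughly like the sketch below. The template wording, the lettered-option layout, and the names format_question and build_prompt are hypothetical and chosen only for illustration; the review does not give the exact prompts. For linguistic prompting, the sketch assumes the question and options are passed in already translated.

```python
def format_question(question, options):
    """Render a multiple-choice question with lettered options (hypothetical layout)."""
    letters = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    lines = [question, "", "Here are the options:"]
    lines += [f"({letters[i]}) {opt}" for i, opt in enumerate(options)]
    return "\n".join(lines)


def build_prompt(question, options, country=None):
    """Default prompting when country is None, cross-national prompting otherwise."""
    body = format_question(question, options)
    if country is not None:
        # Cross-national prompting: ask for a named country's perspective.
        # The wording of this prefix is an assumption.
        body = f"How would someone from {country} answer the following question:\n{body}"
    return body


# Linguistic prompting: pass a pre-translated question and options unchanged.
print(build_prompt("Is climate change a major threat?", ["Yes", "No"], country="Turkey"))
```

Keeping the default and cross-national conditions in one function makes it easy to hold the question text fixed and vary only the country prefix.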

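Experimental results

Research questions

RQ1 How closely do the responses of an LLM tuned with RLHF/Constitutional AI match the country-level opinions captured by PEW and WVS?
RQ2 How do prompting strategies (default, cross-national, linguistic) affect the model's agreement with each country's opinions?
RQ3 Does translating prompts into a target language improve agreement with the populations that primarily speak that language?
RQ4 What are the limitations and potential harms of steering the model's opinions toward a particular cultural perspective?
RQ5 What interventions could make LLMs represent diverse global perspectives more inclusively?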

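Key findings

Under default prompting, responses are most similar to the opinions of WEIRD populations (USA, Canada, Australia, and certain European and South American countries).
Cross-national prompting can steer model outputs toward the opinions of the specified country, but can expose harmful cultural stereotypes and a superficial understanding.
Linguistic prompting has less effect than expected: translating into Russian, Chinese, or Turkish does not consistently increase agreement with the populations that speak those languages.
The model sometimes places high confidence on a narrow set of answers that does not reflect broad global diversity, suggesting gaps in calibration and representation.
Differences across prompting conditions point to latent biases and suggest the model needs a deeper understanding of social context.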
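One way to make the overconfidence finding concrete is to compare how concentrated the model's answer distribution is relative to a country's. The sketch below uses normalized Shannon entropy for that purpose. This is an illustrative choice, not an analysis taken from the paper, and the function name and example distributions are made up.

```python
import math


def normalized_entropy(probs):
    """Shannon entropy divided by log(n): 0 = all mass on one option, 1 = uniform."""
    n = len(probs)
    if n < 2:
        return 0.0
    h = -sum(p * math.log(p) for p in probs if p > 0)
    return h / math.log(n)


# Made-up distributions over three answer options.
model_probs = [0.9, 0.05, 0.05]   # mass concentrated on one option
human_probs = [0.4, 0.3, 0.3]     # opinions spread across options

# A lower value for the model than for the human distribution
# indicates a narrower, more confident response.
print(normalized_entropy(model_probs) < normalized_entropy(human_probs))
```

A model entropy well below the human entropy on the same question would be one symptom of the calibration gap described above.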

Figure 2: The responses from the LLM are more similar to the opinions of respondents from certain populations, such as the USA, Canada, Australia, some European countries, and some South American countries. Interactive visualization: https://llmglobalvalues.anthropic.com/


This review was generated by AI and checked by a human editor.