QUICK REVIEW

[Paper Review] Inducing anxiety in large language models increases exploration and bias

Julian Coda-Forno, Kristin Witte|arXiv (Cornell University)|Apr 21, 2023

Mental Health via Writing | Cited by 38

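One-line summary

This study shows that prompts can induce anxiety in GPT-3.5, and that the induced anxiety increases both exploration in a decision-making task and scores on a bias benchmark; the effects held up across the robustness checks.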

ABSTRACT

Large language models are transforming research on machine learning while galvanizing public debates. Understanding not only when these models work well and succeed but also why they fail and misbehave is of great societal relevance. We propose to turn the lens of computational psychiatry, a framework used to computationally describe and modify aberrant behavior, to the outputs produced by these models. We focus on the Generative Pre-Trained Transformer 3.5 and subject it to tasks commonly studied in psychiatry. Our results show that GPT-3.5 responds robustly to a common anxiety questionnaire, producing higher anxiety scores than human subjects. Moreover, GPT-3.5's responses can be predictably changed by using emotion-inducing prompts. Emotion-induction not only influences GPT-3.5's behavior in a cognitive task measuring exploratory decision-making but also influences its behavior in a previously-established task measuring biases such as racism and ableism. Crucially, GPT-3.5 shows a strong increase in biases when prompted with anxiety-inducing text. Thus, it is likely that how prompts are communicated to large language models has a strong influence on their behavior in applied settings. These results progress our understanding of prompt engineering and demonstrate the usefulness of methods taken from computational psychiatry for studying the capable algorithms to which we increasingly delegate authority and autonomy.

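Research motivation and objectives

Introduce computational psychiatry as a lens for studying the behavior of large language models.
Evaluate GPT-3.5's responses to a standard anxiety questionnaire and compare them with human responses.
Test how emotion-inducing prompts affect exploratory behavior in a bandit task.
Examine how emotion induction affects biased outputs across multiple categories.
Assess the robustness and scalability of anxiety-induction effects on LLM behavior.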

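Proposed method

Administer the STICSA anxiety questionnaire to GPT-3.5 via prompts, and test robustness to answer-option order and question wording.
Apply three emotion-induction conditions (anxiety, neutral, happiness) through in-context prompts placed before the task.
Use a text-based two-armed bandit task and fit a hybrid model by probit regression that separates exploitation, directed exploration, and random exploration.
Measure bias on a benchmark using ambiguous prompts across five categories: age, gender, nationality, SES, and race/ethnicity.
Run robustness analyses with disambiguated scenarios and an extended set of anxiety-inducing prompts, and examine how anxiety intensity relates to bias.
Run all experiments through the OpenAI API at temperature 0 for deterministic output.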
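
As a rough illustration of the hybrid probit model mentioned above, the sketch below computes the probability of choosing arm 1 from three regressors: the difference in estimated value (exploitation), the difference in uncertainty (directed exploration), and the value difference scaled by total uncertainty (random exploration). This is one common formulation of such a model and is an assumption here; the review does not state the exact regressors, and the function names, parameter names, and weights are all hypothetical.

```python
import math


def normal_cdf(x: float) -> float:
    """Standard normal CDF, written with math.erf to avoid extra dependencies."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def choice_probability(
    mean_1: float,
    mean_2: float,
    sd_1: float,
    sd_2: float,
    w_value: float,
    w_directed: float,
    w_random: float,
) -> float:
    """Probability of choosing arm 1 under an assumed hybrid probit model.

    mean_* and sd_* are the agent's current estimates of each arm's
    expected reward and the uncertainty of that estimate (sd > 0).
    The three weights are the coefficients a probit regression would fit.
    """
    value_diff = mean_1 - mean_2                        # exploitation term
    relative_uncertainty = sd_1 - sd_2                  # directed exploration term
    total_uncertainty = math.sqrt(sd_1**2 + sd_2**2)    # scales random exploration
    z = (
        w_value * value_diff
        + w_directed * relative_uncertainty
        + w_random * value_diff / total_uncertainty
    )
    return normal_cdf(z)


if __name__ == "__main__":
    # Equal value estimates, but arm 1 is more uncertain: a positive
    # directed-exploration weight pushes the choice toward arm 1.
    p = choice_probability(0.0, 0.0, 2.0, 1.0, 1.0, 0.5, 1.0)
    print(f"P(choose arm 1) = {p:.3f}")
```

In a fit to choice data, a larger directed-exploration weight would indicate a stronger preference for the more uncertain arm, while the random-exploration weight controls how strongly total uncertainty changes the influence of the value difference on choices.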

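Experimental results

Research questions

RQ1: Does GPT-3.5 respond reliably to a standard anxiety questionnaire, and how do its scores compare with those of humans?
RQ2: Do anxiety- and happiness-inducing prompts causally change GPT-3.5's decision-making strategy in an exploration task?
RQ3: Do emotion-inducing prompts modulate GPT-3.5's bias across social categories?
RQ4: Are the observed effects robust to prompt variations, and do they scale with stronger anxiety induction?
RQ5: What are the implications for prompt design and safety in deployed LLM systems?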

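Key findings

GPT-3.5 produces higher STICSA anxiety scores than human participants (GPT-3.5 M=2.202 vs. human M=1.981).
Anxiety-inducing prompts yield higher anxiety scores than neutral prompts, which in turn yield higher scores than happiness-inducing prompts.
In the two-armed bandit, anxiety induction increases exploration and lowers reward relative to happiness induction.
Happiness induction leads to more exploitation and higher reward than anxiety induction.
Compared with neutral, anxiety induction increases bias in each category (age, gender, nationality, race/ethnicity, SES); the increase under happiness induction is smaller.
Stronger anxiety induction is associated with higher STICSA scores and with greater bias across prompts.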


This review was generated by AI and checked by a human editor.