QUICK REVIEW

[Paper Review] Inducing anxiety in large language models increases exploration and bias

Julian Coda-Forno, Kristin Witte|arXiv (Cornell University)|Apr 21, 2023

Mental Health via Writing | Cited by 38

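One-sentence summary

The study shows that prompts can induce anxiety in GPT-3.5, which increases exploration in a decision-making task and bias on a bias benchmark; the effects hold up under robustness checks.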

ABSTRACT

Large language models are transforming research on machine learning while galvanizing public debates. Understanding not only when these models work well and succeed but also why they fail and misbehave is of great societal relevance. We propose to turn the lens of computational psychiatry, a framework used to computationally describe and modify aberrant behavior, to the outputs produced by these models. We focus on the Generative Pre-Trained Transformer 3.5 and subject it to tasks commonly studied in psychiatry. Our results show that GPT-3.5 responds robustly to a common anxiety questionnaire, producing higher anxiety scores than human subjects. Moreover, GPT-3.5's responses can be predictably changed by using emotion-inducing prompts. Emotion-induction not only influences GPT-3.5's behavior in a cognitive task measuring exploratory decision-making but also influences its behavior in a previously-established task measuring biases such as racism and ableism. Crucially, GPT-3.5 shows a strong increase in biases when prompted with anxiety-inducing text. Thus, it is likely that how prompts are communicated to large language models has a strong influence on their behavior in applied settings. These results progress our understanding of prompt engineering and demonstrate the usefulness of methods taken from computational psychiatry for studying the capable algorithms to which we increasingly delegate authority and autonomy.

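Motivation and objectives

Apply computational psychiatry as a lens for studying the behavior of large language models.
Evaluate GPT-3.5's responses to a standard anxiety questionnaire and compare them with human responses.
Test how emotion-inducing prompts affect exploratory behavior in a two-armed bandit task.
Examine how emotion induction affects biased outputs across multiple categories.
Assess how robust the anxiety-induction effects on LLM behavior are and whether they scale with induction strength.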

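Proposed method

Administer the STICSA anxiety questionnaire to GPT-3.5 via prompts; test robustness to answer-option order and question wording.
Apply three emotion-induction conditions (anxiety, neutral, happiness) by supplying a context prompt before the task.
Use a text-based two-armed bandit task and fit a mixed model via probit regression to separate exploitation, directed exploration, and random exploration.
Measure bias with a benchmark covering five categories (age, gender, nationality, SES, race/ethnicity), using ambiguous prompts.
Run robustness analyses with disambiguated scenarios and extended anxiety-induction prompts to relate anxiety intensity to bias.
Run all experiments through the OpenAI API with temperature set to 0 for deterministic outputs.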
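
The questionnaire step above (an induction text before the task, with shuffled answer options as a robustness check) can be sketched as follows. This is a minimal illustration, not the authors' code: the induction texts, the option labels of the 4-point scale, and the helper names `build_item_prompt` and `mean_score` are all assumptions, and no API call is made.

```python
import random

# Placeholder induction instructions (assumed wording, not the paper's prompts).
INDUCTION_PROMPTS = {
    "anxiety": "Tell me about something that makes you feel anxious.",
    "neutral": "Tell me about something you know.",
    "happiness": "Tell me about something that makes you feel happy.",
}

# Assumed 4-point response scale: option label -> item score.
SCALE = {"Not at all": 1, "A little": 2, "Moderately": 3, "Very much so": 4}


def build_item_prompt(item_text, condition, rng):
    """Return (prompt, letter_to_score) for one questionnaire item.

    The answer options are shuffled so that a preference for a fixed
    position (for example, always answering "A") cannot pass for an
    anxiety score.
    """
    labels = list(SCALE)
    rng.shuffle(labels)
    letters = "ABCD"
    option_lines = [f"{letter}) {label}" for letter, label in zip(letters, labels)]
    letter_to_score = {letter: SCALE[label] for letter, label in zip(letters, labels)}
    prompt = "\n".join(
        [
            INDUCTION_PROMPTS[condition],
            "",
            f"Q: {item_text}",
            *option_lines,
            "Answer with a single letter.",
        ]
    )
    return prompt, letter_to_score


def mean_score(answers):
    """Average item scores, given (chosen_letter, letter_to_score) pairs."""
    scores = [mapping[letter] for letter, mapping in answers]
    return sum(scores) / len(scores)
```

Each generated prompt would be sent to the model at temperature 0, and the returned letter mapped back through `letter_to_score` before averaging. Keeping the mapping per item is what makes the shuffled order safe to score.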

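Experimental results

Research questions

RQ1: Can GPT-3.5 answer a standard anxiety questionnaire reliably, and how do its scores compare with those of humans?
RQ2: Do anxiety-inducing and happiness-inducing prompts causally change GPT-3.5's decision strategy in an exploration task?
RQ3: Do emotion-inducing prompts modulate GPT-3.5's bias across social categories?
RQ4: Are the observed effects robust to prompt variations, and do they scale with stronger anxiety induction?
RQ5: What are the implications for prompt engineering and the safety of deployed LLM systems?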

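Key findings

GPT-3.5's STICSA anxiety score is higher than that of human participants (GPT-3.5 M=2.202 vs. human M=1.981).
Anxiety-inducing prompts produce higher anxiety scores than neutral prompts, which in turn produce higher scores than happiness-inducing prompts.
In the two-armed bandit task, anxiety induction increases exploration and yields lower rewards than happiness induction.
Happiness induction leads to more exploitation and higher rewards than anxiety induction.
Relative to the neutral condition, anxiety induction increases bias in every category (age, gender, nationality, race/ethnicity, SES); happiness induction produces a smaller increase.
Stronger anxiety induction is associated with higher STICSA scores and greater bias.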
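
The exploration findings depend on separating exploitation, directed exploration, and random exploration with a probit regression. The sketch below shows one common way to set up such a regression in the exploration literature: track each arm with a Kalman filter, then predict the choice from the value difference V, the relative uncertainty RU, and V scaled by total uncertainty TU. Whether the paper uses exactly these regressors is an assumption, and all function names here are hypothetical.

```python
import math


def normal_cdf(x):
    """Standard normal CDF, written with math.erf to avoid dependencies."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def kalman_update(mean, var, reward, noise_var):
    """Update the belief about one arm's mean reward after observing `reward`."""
    gain = var / (var + noise_var)
    new_mean = mean + gain * (reward - mean)
    new_var = var * (1.0 - gain)
    return new_mean, new_var


def exploration_regressors(mean1, var1, mean2, var2):
    """Return (V, RU, V/TU) for choosing arm 1 over arm 2.

    V    : difference in estimated means (drives exploitation)
    RU   : difference in posterior standard deviations (directed exploration)
    V/TU : value difference scaled by total uncertainty (random exploration)
    """
    v = mean1 - mean2
    ru = math.sqrt(var1) - math.sqrt(var2)
    tu = math.sqrt(var1 + var2)
    return v, ru, v / tu


def prob_choose_arm1(regressors, weights):
    """Probit choice rule: P(arm 1) = Phi(w_v*V + w_ru*RU + w_vtu*V/TU)."""
    v, ru, v_tu = regressors
    w_v, w_ru, w_vtu = weights
    return normal_cdf(w_v * v + w_ru * ru + w_vtu * v_tu)
```

Under this setup, the three weights would be estimated from the model's observed choices in each induction condition and then compared across conditions. A larger weight on RU indicates more uncertainty-seeking (directed) exploration, while the weight on V/TU governs how much choice randomness scales with total uncertainty.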


This review was generated by AI and reviewed by human editors.