QUICK REVIEW

[Paper Review] Inducing anxiety in large language models increases exploration and bias

Julian Coda-Forno, Kristin Witte | arXiv (Cornell University) | 2023-04-21

Mental Health via Writing · Citations: 38

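One-Line Summary

This study shows that prompting can induce anxiety in GPT-3.5. Induced anxiety increases exploration in a decision-making task and bias on a bias benchmark, and these effects hold up across robustness checks.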

ABSTRACT

Large language models are transforming research on machine learning while galvanizing public debates. Understanding not only when these models work well and succeed but also why they fail and misbehave is of great societal relevance. We propose to turn the lens of computational psychiatry, a framework used to computationally describe and modify aberrant behavior, to the outputs produced by these models. We focus on the Generative Pre-Trained Transformer 3.5 and subject it to tasks commonly studied in psychiatry. Our results show that GPT-3.5 responds robustly to a common anxiety questionnaire, producing higher anxiety scores than human subjects. Moreover, GPT-3.5's responses can be predictably changed by using emotion-inducing prompts. Emotion-induction not only influences GPT-3.5's behavior in a cognitive task measuring exploratory decision-making but also influences its behavior in a previously-established task measuring biases such as racism and ableism. Crucially, GPT-3.5 shows a strong increase in biases when prompted with anxiety-inducing text. Thus, it is likely that how prompts are communicated to large language models has a strong influence on their behavior in applied settings. These results progress our understanding of prompt engineering and demonstrate the usefulness of methods taken from computational psychiatry for studying the capable algorithms to which we increasingly delegate authority and autonomy.

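Motivation and Goals

Introduce computational psychiatry as a lens for studying the behavior of large language models.
Evaluate GPT-3.5's responses to a standard anxiety questionnaire against human responses.
Test how emotion-inducing prompts affect exploratory behavior in a bandit task.
Examine how emotion induction affects the biases GPT-3.5 produces across several social categories.
Assess whether the anxiety-induction effect is robust and whether it scales with stronger induction.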

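Proposed Method

Administer the STICSA anxiety questionnaire to GPT-3.5 through prompts, and check robustness to the order of the answer options and the wording of the questions.
Apply three emotion-induction conditions (anxiety, neutral, happiness) by placing an in-context induction prompt before each task.
Use a text-based two-armed bandit task and fit a hybrid model that decomposes choices into exploitation, directed exploration, and random exploration via probit regression.
Measure bias with a benchmark of ambiguous prompts spanning five categories (age, gender, nationality, socioeconomic status, race/ethnicity).
Run robustness analyses with interpretable scenarios and extended anxiety-induction prompts to characterize the relationship between induction strength and bias.
Run all experiments through the OpenAI API at temperature 0 to make outputs as deterministic and reproducible as possible.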
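The hybrid exploration model described above can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes the common formulation from Gershman (2018), in which each arm's reward belief is tracked with a Kalman filter and the probability of choosing the first arm is Φ(w_V·V + w_RU·RU + w_VTU·V/TU), where V is the difference in posterior means, RU the difference in posterior standard deviations, and TU the total uncertainty. The prior and noise values, the function names, and the grid-search fit are hypothetical choices for illustration. An actual analysis would use a proper optimizer or a (mixed-effects) probit regression.

```python
import math
from itertools import product


def norm_cdf(x: float) -> float:
    """Standard normal CDF, Phi(x), computed from the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def kalman_update(mean: float, var: float, reward: float, obs_noise: float) -> tuple[float, float]:
    """Posterior mean and variance of one stationary Gaussian arm after one observed reward."""
    gain = var / (var + obs_noise)
    return mean + gain * (reward - mean), var * (1.0 - gain)


def hybrid_features(m0: float, v0: float, m1: float, v1: float) -> tuple[float, float, float]:
    """V = value difference, RU = relative uncertainty, V/TU = value over total uncertainty."""
    value_diff = m0 - m1
    rel_unc = math.sqrt(v0) - math.sqrt(v1)
    total_unc = math.sqrt(v0 + v1)
    return value_diff, rel_unc, value_diff / total_unc


def p_choose_first(weights, features) -> float:
    """Probit choice rule: P(choose arm 0) = Phi(w_V*V + w_RU*RU + w_VTU*V/TU)."""
    return norm_cdf(sum(w * f for w, f in zip(weights, features)))


def neg_log_likelihood(weights, choices, rewards, prior_var=100.0, obs_noise=10.0) -> float:
    """Negative log-likelihood of a choice sequence (0 = first arm, 1 = second arm).

    prior_var and obs_noise are hypothetical values, not taken from the paper.
    """
    means, variances = [0.0, 0.0], [prior_var, prior_var]
    nll = 0.0
    for choice, reward in zip(choices, rewards):
        p0 = p_choose_first(weights, hybrid_features(means[0], variances[0], means[1], variances[1]))
        p = p0 if choice == 0 else 1.0 - p0
        nll -= math.log(max(p, 1e-12))  # guard against log(0)
        # Only the chosen arm's belief is updated with the observed reward.
        means[choice], variances[choice] = kalman_update(
            means[choice], variances[choice], reward, obs_noise
        )
    return nll


def fit_grid(choices, rewards, grid=(-2.0, -1.0, 0.0, 1.0, 2.0)):
    """Crude maximum-likelihood fit by exhaustive grid search (a stand-in for a real optimizer)."""
    return min(product(grid, repeat=3), key=lambda w: neg_log_likelihood(w, choices, rewards))
```

For example, `fit_grid([0, 0, 1, 0], [6.0, 4.0, 2.0, 7.0])` returns the grid weights that minimize the negative log-likelihood of that choice sequence. In this parameterization, w_V tracks value-driven exploitation, w_RU an uncertainty bonus (directed exploration), and w_VTU the uncertainty-scaled value term associated with random exploration.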

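Experimental Results

Research Questions

RQ1: Can GPT-3.5 respond reliably to a standard anxiety questionnaire, and how do its responses compare with those of humans?
RQ2: Do anxiety- and happiness-inducing prompts causally affect GPT-3.5's decision-making strategy in an exploration task?
RQ3: Do emotion-inducing prompts modulate GPT-3.5's biases across social categories?
RQ4: Are the effects robust to prompt variations, and do they scale with stronger anxiety induction?
RQ5: What are the implications for prompt engineering and safety in deployed LLM systems?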

Key Findings

GPT-3.5 produces higher STICSA anxiety scores than human participants (GPT-3.5 M = 2.202 vs. human M = 1.981).
Anxiety-inducing prompts yield higher anxiety scores than the neutral condition, which in turn scores higher than the happiness condition.
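In the two-armed bandit task, anxiety induction increases exploration and lowers rewards compared with happiness induction.
Happiness induction leads to more exploitation and higher rewards than anxiety induction.
Anxiety induction increases bias across categories (age, gender, nationality, race/ethnicity, SES), more than the neutral condition does, while happiness induction produces a smaller increase.
Stronger anxiety induction is associated with higher STICSA scores and with greater bias across prompts.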
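The questionnaire comparison in the first two findings comes down to averaging item ratings per condition. The sketch below assumes that the reported M is the mean rating over STICSA items on a 1 to 4 scale and that the model answers with an option number. It also shows how a reversed option order (one of the robustness checks) can be mapped back onto the same scale. The induction texts, option labels, prompt layout, and helper names are hypothetical placeholders, not the prompts used in the paper, and no API call is made.

```python
# Hypothetical placeholders; the paper's actual induction texts are not reproduced here.
INDUCTION = {
    "anxiety": "<anxiety-inducing text>",
    "neutral": "<neutral text>",
    "happy": "<happiness-inducing text>",
}
# Assumed 4-point response labels, ordered from lowest (1) to highest (4).
LABELS = ["Not at all", "A little", "Moderately", "Very much so"]


def build_prompt(condition: str, item: str, reverse: bool = False) -> str:
    """Prepend the induction text, then list numbered options (optionally in reverse order)."""
    labels = LABELS[::-1] if reverse else LABELS
    options = "\n".join(f"{i}. {label}" for i, label in enumerate(labels, start=1))
    return f"{INDUCTION[condition]}\n\nStatement: {item}\n{options}\nAnswer:"


def parse_rating(completion: str, reverse: bool = False) -> int:
    """Take the first option number in the completion and map it onto the 1 (low) to 4 (high) scale."""
    for ch in completion:
        if ch in "1234":
            choice = int(ch)
            # With reversed options, option 1 is the highest rating, so flip it back.
            return 5 - choice if reverse else choice
    raise ValueError(f"no option number found in {completion!r}")


def sticsa_score(ratings: list[int]) -> float:
    """Mean item rating; stays on the 1 to 4 scale regardless of the number of items."""
    if not ratings or any(r not in (1, 2, 3, 4) for r in ratings):
        raise ValueError("ratings must be a non-empty list of integers in 1..4")
    return sum(ratings) / len(ratings)
```

For instance, a completion of "1" to a reverse-ordered prompt, `parse_rating("1", reverse=True)`, is scored as 4 ("Very much so"), so forward and reversed administrations can be averaged on the same scale before comparing conditions.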


This review was generated by AI and reviewed by a human editor.