QUICK REVIEW

[Paper Review] Large Language Models as Superpositions of Cultural Perspectives

Grgur Kovač, Masataka Sawayama | arXiv (Cornell University) | Jul 15, 2023

Topic Modeling | Cited by 9

One-Sentence Summary

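This paper reframes large language models (LLMs) as superpositions of perspectives, shows that context can unexpectedly change the values and personality traits a model expresses, introduces the notion of perspective controllability, and systematically compares models and perspective induction methods on three psychology questionnaires across 16 models.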

ABSTRACT

Large Language Models (LLMs) are often misleadingly recognized as having a personality or a set of values. We argue that an LLM can be seen as a superposition of perspectives with different values and personality traits. LLMs exhibit context-dependent values and personality traits that change based on the induced perspective (as opposed to humans, who tend to have more coherent values and personality traits across contexts). We introduce the concept of perspective controllability, which refers to a model's affordance to adopt various perspectives with differing values and personality traits. In our experiments, we use questionnaires from psychology (PVQ, VSM, IPIP) to study how exhibited values and personality traits change based on different perspectives. Through qualitative experiments, we show that LLMs express different values when those are (implicitly or explicitly) implied in the prompt, and that LLMs express different values even when those are not obviously implied (demonstrating their context-dependent nature). We then conduct quantitative experiments to study the controllability of different models (GPT-4, GPT-3.5, OpenAssistant, StableVicuna, StableLM), the effectiveness of various methods for inducing perspectives, and the smoothness of the models' drivability. We conclude by examining the broader implications of our work and outline a variety of associated scientific questions. The project website is available at https://sites.google.com/view/llm-superpositions .

Research Motivation and Objectives

Argue against the view of LLMs as stable individuals with fixed values or personalities.
Demonstrate the unexpected perspective shift effect where unrelated contextual changes alter expressed values.
Introduce and formalize the metaphor of LLMs as superpositions of perspectives.
Define and measure perspective controllability to assess how well a given perspective can be induced.
Compare multiple LLMs and induction methods across three psychology questionnaires.

Proposed Method

Adopt three psychology questionnaires (PVQ for personal values, VSM for cultural values, IPIP for Big Five personality) to quantify LLM-expressed traits.
Expose LLMs to controlled contexts (textual prompts, system vs user messages, second vs third person perspectives) and record responses.
Induce target perspectives via four prompting methods and compute scores for each trait dimension.
Compute a controllability score by comparing induced target dimensions against non-induced dimensions across 50 permutations of answer orders.
Systematically compare 16 models across four perspective induction techniques and three questionnaires.
Use statistical analyses (ANOVA, Tukey HSD, Welch t-tests with Bonferroni corrections) to assess context effects and model controllability.
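
The controllability step above can be sketched in code. This is a minimal illustration, not the paper's implementation: the exact scoring formula is not given in this summary, so the sketch assumes controllability is the gap between the mean score on the induced (target) dimension and the mean score on the remaining dimensions, averaged over shuffled answer orders. The mini-questionnaire, the 1-6 answer scale, and `stub_model` (a stand-in for a real LLM call) are all hypothetical.

```python
import random
from statistics import mean

# Hypothetical mini-questionnaire: item id -> value dimension it measures.
ITEM_DIMENSION = {
    "q1": "power", "q2": "power",
    "q3": "benevolence", "q4": "benevolence",
    "q5": "tradition", "q6": "tradition",
}

# Answer labels with numeric scores (a 1-6 scale is an assumption of this sketch).
OPTIONS = [
    ("not like me at all", 1), ("not like me", 2), ("a little like me", 3),
    ("somewhat like me", 4), ("like me", 5), ("very much like me", 6),
]


def stub_model(item, target, shown_labels):
    """Stand-in for an LLM call: returns the index of the chosen label.

    It strongly endorses items of the induced dimension and is lukewarm otherwise.
    """
    same_dim = ITEM_DIMENSION[item] == target
    wanted = "very much like me" if same_dim else "a little like me"
    return shown_labels.index(wanted)


def dimension_scores(target, rng, model=stub_model):
    """Administer every item once, shuffling the answer order for each item."""
    per_dim = {}
    for item, dim in ITEM_DIMENSION.items():
        shuffled = OPTIONS[:]
        rng.shuffle(shuffled)
        choice = model(item, target, [label for label, _ in shuffled])
        per_dim.setdefault(dim, []).append(shuffled[choice][1])
    return {dim: mean(scores) for dim, scores in per_dim.items()}


def controllability(target, n_permutations=50, seed=0, model=stub_model):
    """Mean gap between the target dimension and the other dimensions.

    Assumed definition for illustration; one run per answer-order permutation.
    """
    rng = random.Random(seed)
    gaps = []
    for _ in range(n_permutations):
        scores = dimension_scores(target, rng, model)
        others = [s for d, s in scores.items() if d != target]
        gaps.append(scores[target] - mean(others))
    return mean(gaps)


if __name__ == "__main__":
    for target in sorted(set(ITEM_DIMENSION.values())):
        print(target, controllability(target))
```

Under this assumed definition, a model that ignores the induced perspective scores near zero, while a model that follows it scores well above zero. Shuffling the answer order on each run is meant to keep a positional bias toward a particular option from being mistaken for controllability.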

Experimental Results

Research Questions

RQ1: Do LLMs exhibit significant unexpected perspective shift effects when exposed to orthogonal contextual changes?
RQ2: How controllable are different LLMs with respect to inducing target perspectives, across various induction methods and questionnaires?
RQ3: Which induction method and which models yield the highest perspective controllability for PVQ, VSM, and IPIP?
RQ4: How does RLHF finetuning influence perspective controllability over time or across model families?

Key Findings

Unrelated contextual changes (conversations, formats, or wiki paragraphs) significantly alter expressed personal values, cultural values, and personality traits.
The magnitude and direction of value shifts vary by context and by model, and often exceed the shifts typically observed in humans over long-term development.
Perspective controllability varies across models and induction methods; some prompts and system/user message configurations yield higher controllability for certain questionnaires.
RLHF-finetuned GPT-4 and some Upstage LLaMA models show relatively higher controllability in several setups.
Different questionnaires (PVQ, VSM, IPIP) yield different best-performing induction methods and models, indicating model- and task-dependent controllability.
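
The significance claims in these findings rest on tests such as the Welch t-tests with Bonferroni corrections named under the method. The sketch below shows that combination using only the Python standard library. It is an illustration under stated assumptions, not the paper's analysis code: the function names are hypothetical, and the p-value uses a normal approximation that is only reasonable when the degrees of freedom are large (for example, two groups of about 50 runs each). An exact p-value would need the Student t distribution.

```python
import math
from statistics import NormalDist, mean, variance


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom.

    Both samples need at least two values and must not both have zero variance.
    """
    va = variance(a) / len(a)
    vb = variance(b) / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


def approx_two_sided_p(t):
    """Normal approximation to the two-sided p-value (assumes large df)."""
    return 2.0 * (1.0 - NormalDist().cdf(abs(t)))


def bonferroni_significant(p_values, alpha=0.05):
    """Flag each comparison as significant at the adjusted level alpha / m."""
    threshold = alpha / len(p_values)
    return [p < threshold for p in p_values]
```

In use, `a` and `b` would hold the scores of one trait dimension under two contexts, one value per answer-order permutation, and `bonferroni_significant` would receive one p-value per dimension compared. Welch's variant is the natural choice when the two contexts may produce scores with unequal variance.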

This review was generated by AI and reviewed by human editors.