QUICK REVIEW

[論文レビュー] MoCa: Measuring Human-Language Model Alignment on Causal and Moral Judgment Tasks

Allen Nie, Yuhui Zhang|arXiv (Cornell University)|Oct 30, 2023

Topic Modeling被引用数 8

ひとこと要約

要約: 本論文は認知科学の物語からベンチマークを構築し、LLMsが人間の因果および道徳判断とどれだけ整合するかを測定。大規模モデルとRLHFが整合性を高める一方、人間と比較して内的傾向に体系的な差異を示すことを明らかにする。

ABSTRACT

Human commonsense understanding of the physical and social world is organized around intuitive theories. These theories support making causal and moral judgments. When something bad happens, we naturally ask: who did what, and why? A rich literature in cognitive science has studied people's causal and moral intuitions. This work has revealed a number of factors that systematically influence people's judgments, such as the violation of norms and whether the harm is avoidable or inevitable. We collected a dataset of stories from 24 cognitive science papers and developed a system to annotate each story with the factors they investigated. Using this dataset, we test whether large language models (LLMs) make causal and moral judgments about text-based scenarios that align with those of human participants. On the aggregate level, alignment has improved with more recent LLMs. However, using statistical analyses, we find that LLMs weigh the different factors quite differently from human participants. These results show how curated, challenge datasets combined with insights from cognitive science can help us go beyond comparisons based merely on aggregate metrics: we uncover LLMs implicit tendencies and show to what extent these align with human intuitions.

研究の動機と目的

24件の認知科学論文から人間の因果および道徳判断に影響を与える要因を要約し、これらの要因で注釈付けされたストーリの統制データセットを作成する。
因果および道徳タスクに対する人間の判断との整合性を、幅広いLLMsで評価する。
Average Marginal Component Effect (AMCE)を用いてモデルの暗黙的傾向を分析し、LLMsと人間の差異がどこにあるかを特定する。
prompting手法、モデルサイズ、トレーニング手法（例: RLHF）が整合性と暗黙的傾向に与える影響を調査する。

提案手法

認知科学文献から因果・道徳判断ストーリーを収集し、因果構造、エージェントの自覚、規範タイプ、出来事の正常性、行動/不作為、時点、因果役割、手段/副作用、個人的力、避けられやすさ、善行性などの engineered factors で注釈を付ける。
因子注釈と専門家ラベルを付与したストーリに対する5150件の人間の反応データセットを作成する。
ゼロショットの因果および道徳判断に対して複数のLLMs（さまざまなサイズとトレーニング手法）を評価し、人間との一致度をAccuracy、AUC、MAE、クロスエントロピーで測定する。
各因子属性に対する暗黙的傾向を人間とモデルの双方で定量化するためにAMCE（ノンパラメトリックな平均値差）を適用する。
ペルソナ、Auto Prompt Engineerなどの prompting介入と prompting戦略を探索し、モデル傾向の変化を評価する。
大規模モデルとRLHFは全体的な整合性を高めるが、傾向はモデルや因子によって異なることを報告する。

Figure 1 : Two examples from our collected dataset. (a) shows a causal judgment story, and (b) shows a moral judgment story. In (a), a conjunction of two events was required, an abnormal event occurred, and Jane violated a prescriptive norm (scenario taken from Knobe & Fraser, 2008 ) . In (b), the t

実験結果

リサーチクエスチョン

RQ1LLMの因果・道徳判断は、慎重に構成されたシナリオ全体で人間の直感とどの程度一致するか。
RQ2モデルサイズとトレーニング手法（RLHF, 指示調整など）が人間の判断との整合性にどう影響するか。
RQ3AMCEによって明らかになる、人間の因果・道徳判断を形作る要因に対してLLMsはどんな暗黙的傾向を示すか。
RQ4 prompting戦略やペルソナベースの promptingはLLMsの判断を人間に整合した回答へシフトさせ得るか。

主な発見

大規模モデルとRLHFは集計指標で人間との整合性を向上させるが、タスクや因子によって一様ではない。
AMCE分析は、異常性、規範タイプ、必然性などの因子でLLMsと人間が異なる暗黙的傾向を示し、集計精度を超えた体系的差異を示唆する。
同程度の訓練を受けたモデル間でもモデルごとに偏りが存在し、サイズが大きいからといって整合性が単調に向上するとは限らない（非単調傾向）。
prompting手法は、特定の因子（例:異常性、自他の善行性）に対するモデル傾向を他の因子よりも大きく変えることがある（必然性などには影響が小さい）。
小さなモデルでは幻覚発生率が高く、サイズが大きいモデルは人間の判断と必ずしも一致しない別の傾向パターンを示す。

Figure 2 : Causal Implicit Tendencies . Each concentric circle marks the implicit tendency – the change in probability of responding “Yes” if the attribute had been present in the story. For example, if “Abnormal" is present in the story, humans will have a 25% higher probability to respond “Yes” on

より良い研究を、今すぐ始めましょう

論文設計から論文執筆まで、研究時間を劇的に削減しましょう。

クレジットカード登録不要

このレビューはAIが作成し、人間の編集者が確認しました。