QUICK REVIEW

[Paper Review] Large language models predict human sensory judgments across six modalities

Raja Marjieh, Ilia Sucholutsky|arXiv (Cornell University)|Feb 2, 2023

Categorization, perception, and language | Cited by 11

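One-line summary

State-of-the-art LLMs (GPT-3/3.5/4) produce pairwise sensory similarity judgments across six modalities that correlate significantly with human data, recover well-known representations such as the color wheel and the pitch spiral, and reveal language-dependent effects in color naming.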

ABSTRACT

Determining the extent to which the perceptual world can be recovered from language is a longstanding problem in philosophy and cognitive science. We show that state-of-the-art large language models can unlock new insights into this problem by providing a lower bound on the amount of perceptual information that can be extracted from language. Specifically, we elicit pairwise similarity judgments from GPT models across six psychophysical datasets. We show that the judgments are significantly correlated with human data across all domains, recovering well-known representations like the color wheel and pitch spiral. Surprisingly, we find that a model (GPT-4) co-trained on vision and language does not necessarily lead to improvements specific to the visual modality. To study the influence of specific languages on perception, we also apply the models to a multilingual color-naming task. We find that GPT-4 replicates cross-linguistic variation in English and Russian illuminating the interaction of language and perception.

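Motivation and objectives

Investigate, using large language models, how much perceptual information about the world can be recovered from language.
Evaluate whether LLM-derived similarity judgments agree with human perceptual representations across multiple modalities.
Analyze whether multimodal training (text plus images) or language-only training yields greater modality-specific predictive power.
Use LLMs for color naming in English and Russian to examine cross-linguistic effects on perceptual representations.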

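Proposed method

Use GPT-3, GPT-3.5, and GPT-4, with suitable prompts and in-context examples, to elicit 10 pairwise similarity ratings for each stimulus pair.
Compare model-derived similarity scores with human data via Pearson correlation across all six modalities.
Use multidimensional scaling (MDS) to analyze whether known perceptual structures emerge, recovering the color wheel, the pitch spiral, and consonant representations.
Run a multilingual color-naming task in English and Russian to test language dependence.
Collect model-generated explanations of the judgments and assess how well they align with perceptual concepts (octave relations, place of articulation, the color spectrum).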
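The comparison step described above (average the repeated ratings for each stimulus pair, then correlate with human scores) can be sketched as follows. This is a minimal illustration and not the authors' code: the function names, the dictionary layout, and the toy pitch data are assumptions made up for the example.

```python
import math


def mean_similarity(ratings):
    """Average repeated ratings per stimulus pair: {pair: [r1, r2, ...]} -> {pair: mean}."""
    return {pair: sum(vals) / len(vals) for pair, vals in ratings.items()}


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def model_human_correlation(model_ratings, human_scores):
    """Correlate mean model ratings with human scores over the pairs both share."""
    model_means = mean_similarity(model_ratings)
    pairs = sorted(set(model_means) & set(human_scores))
    return pearson_r([model_means[p] for p in pairs],
                     [human_scores[p] for p in pairs])


# Toy data (made up for illustration, not taken from the paper).
toy_model = {
    ("C4", "C5"): [0.9, 0.8, 1.0],
    ("C4", "D4"): [0.6, 0.7, 0.5],
    ("C4", "F#4"): [0.3, 0.2, 0.4],
}
toy_human = {("C4", "C5"): 0.85, ("C4", "D4"): 0.55, ("C4", "F#4"): 0.25}

r = model_human_correlation(toy_model, toy_human)
print(f"Pearson r = {r:.3f}")
```

Restricting the correlation to pairs present in both dictionaries keeps the comparison well defined when one source is missing a pair; in practice one would run this once per modality.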

Figure 1: A. Schematic of the LLM-based and human similarity judgment elicitation paradigms. B. Correlations between models and human data across six perceptual modalities, namely, pitch, loudness, colors, consonants, taste, and timbre (Pearson $r$; 95% CIs).

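Experimental results

Research questions

RQ1 Can LLMs produce similarity judgments that agree with human perceptual representations across multiple modalities?
RQ2 Can LLMs recover known perceptual structures, such as the color wheel and the pitch spiral, from language?
RQ3 Does multimodal training improve modality-specific performance over language-only training?
RQ4 Does the language of the prompt affect color naming and perceptual representations, revealing language-dependent perception?
RQ5 To what extent can LLMs reproduce the cross-linguistic differences in color naming observed in humans?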

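Key findings

GPT-4 shows the strongest agreement with human data in many modalities, for example r = .92 for pitch and r = .89 for color.
GPT-3.5 also reaches high correlations, including loudness (r = .89), and is often among the top two models overall.
Inter-rater reliability (IRR) for pitch (r = .90) and consonants (r = .46) suggests that GPT-4's performance approaches human reliability in some domains.
MDS analysis reveals interpretable perceptual spaces: a pitch spiral with a 12-semitone structure, a color wheel, and an articulation-based consonant representation.
In color naming, GPT-4 reproduces the differences between English and Russian, matching the cross-linguistic pattern observed in humans.
GPT-4's performance gains are attributed not only to multimodal (image) input but also to richer text training; consistent with this, the abstract notes that co-training on vision and language does not necessarily bring improvements specific to the visual modality.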
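To make the MDS finding concrete, here is a small sketch of how a similarity matrix can be embedded in a low-dimensional space. It uses classical (Torgerson) MDS with power iteration in pure Python, which is a stand-in chosen for self-containment; the review does not say which MDS variant the authors used. The similarity-to-distance conversion (`max_sim - similarity`) and the toy circular data are likewise assumptions.

```python
import math
import random


def similarity_to_distance(sim, max_sim=1.0):
    """Assumed conversion: dissimilarity = max_sim - similarity, zero on the diagonal."""
    n = len(sim)
    return [[0.0 if i == j else max_sim - sim[i][j] for j in range(n)]
            for i in range(n)]


def classical_mds(dist, dims=2, iters=500, seed=0):
    """Classical MDS: embed a symmetric distance matrix into `dims` dimensions."""
    n = len(dist)
    # Double-centre the squared distances to obtain the Gram matrix B.
    sq = [[dist[i][j] ** 2 for j in range(n)] for i in range(n)]
    row_mean = [sum(r) / n for r in sq]
    grand_mean = sum(row_mean) / n
    b = [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + grand_mean)
          for j in range(n)] for i in range(n)]

    rng = random.Random(seed)
    coords = [[0.0] * dims for _ in range(n)]
    for d in range(dims):
        # Power iteration from a fresh random start finds the top eigenvector of B.
        v = [rng.gauss(0.0, 1.0) for _ in range(n)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _ in range(iters):
            w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:
                break
            v = [x / norm for x in w]
        # Rayleigh quotient gives the eigenvalue; stop if it is not positive.
        bv = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
        lam = sum(v[i] * bv[i] for i in range(n))
        if lam <= 1e-12:
            break
        for i in range(n):
            coords[i][d] = math.sqrt(lam) * v[i]
            # Deflate B so the next pass finds the next eigenvector.
            for j in range(n):
                b[i][j] -= lam * v[i] * v[j]
    return coords


# Toy input (made up): six items equally spaced on a unit circle, like hues on a wheel.
angles = [2 * math.pi * k / 6 for k in range(6)]
points = [(math.cos(a), math.sin(a)) for a in angles]
toy_sim = [[1.0 - math.dist(p, q) / 2.0 for q in points] for p in points]
embedding = classical_mds(similarity_to_distance(toy_sim))
for (x, y) in embedding:
    print(f"{x:+.3f} {y:+.3f}")
```

On this toy input the six points come back arranged on a circle (up to rotation and reflection), which is the kind of structure the color-wheel result refers to. For real data a library implementation such as scikit-learn's `MDS` would normally be preferable to hand-rolled power iteration.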

Figure 2: A. Human and LLM similarity marginals and an example GPT-3 corresponding similarity matrix and its three-dimensional MDS solution for pitch. B. MDS solutions for vocal consonants and colors for GPT-4 similarity matrices. To illustrate the structure of the results, we highlighted consonants w


This review was generated by AI and checked by a human editor.