QUICK REVIEW

[Paper Review] Large language models predict human sensory judgments across six modalities

Raja Marjieh, Ilia Sucholutsky | arXiv (Cornell University) | 2023-02-02

Categorization, perception, and language | Citations: 11

One-Line Summary

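State-of-the-art LLMs (GPT-3/3.5/4) produce pairwise similarity judgments across six sensory modalities that correlate significantly with human data, recover known representations such as the color wheel and the pitch spiral, and reveal language-dependent effects in color naming.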

ABSTRACT

Determining the extent to which the perceptual world can be recovered from language is a longstanding problem in philosophy and cognitive science. We show that state-of-the-art large language models can unlock new insights into this problem by providing a lower bound on the amount of perceptual information that can be extracted from language. Specifically, we elicit pairwise similarity judgments from GPT models across six psychophysical datasets. We show that the judgments are significantly correlated with human data across all domains, recovering well-known representations like the color wheel and pitch spiral. Surprisingly, we find that a model (GPT-4) co-trained on vision and language does not necessarily lead to improvements specific to the visual modality. To study the influence of specific languages on perception, we also apply the models to a multilingual color-naming task. We find that GPT-4 replicates cross-linguistic variation in English and Russian illuminating the interaction of language and perception.

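Motivation and Objectives

Investigate how much perceptual information about the world can be recovered from language, using large language models.
Evaluate whether LLM-elicited similarity judgments agree with human perceptual representations across multiple modalities.
Examine whether multimodal training (text + images) drives modality-specific predictive power, or whether language alone is sufficient.
Test color naming in English and Russian with LLMs to explore cross-linguistic effects on perception.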

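Proposed Method

Elicit 10 pairwise similarity ratings per stimulus pair from GPT-3, GPT-3.5, and GPT-4, using tailored prompts with in-context examples.
Compare model-elicited similarity scores with human data across the six modalities using Pearson correlation.
Analyze the emergence of known perceptual structures with multidimensional scaling (MDS), recovering the color wheel, the pitch spiral, and consonant representations.
Run a multilingual color-naming task (English and Russian) to examine the language dependence of perceptual representations.
Collect model-generated explanations of the judgments to assess their agreement with perceptual concepts (octave relations, place of articulation, the color spectrum).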
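
The comparison step above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: it assumes the repeated ratings for each pair are averaged into one similarity score, and that model and human scores are correlated over the unique pairs (the upper triangle of each matrix). The function names and the toy numbers are hypothetical and are not data from the paper.

```python
import math
from statistics import mean


def mean_similarity_matrix(ratings, n_items):
    """Average repeated ratings per pair into a symmetric matrix.

    ratings: dict mapping (i, j) with i < j to a list of repeated scores.
    """
    sim = [[0.0] * n_items for _ in range(n_items)]
    for (i, j), scores in ratings.items():
        sim[i][j] = sim[j][i] = mean(scores)
    return sim


def upper_triangle(matrix):
    """Flatten the unique off-diagonal pairs (i < j) into a list."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Toy, made-up ratings for three stimuli (NOT data from the paper).
llm_ratings = {
    (0, 1): [0.9, 0.8, 1.0],
    (0, 2): [0.2, 0.1, 0.3],
    (1, 2): [0.4, 0.5, 0.6],
}
human_sim = [
    [0.0, 0.8, 0.1],
    [0.8, 0.0, 0.5],
    [0.1, 0.5, 0.0],
]

llm_sim = mean_similarity_matrix(llm_ratings, n_items=3)
r = pearson_r(upper_triangle(llm_sim), upper_triangle(human_sim))
print(f"model-human Pearson r = {r:.3f}")
```

Correlating only the upper triangle avoids counting each pair twice and leaves out the diagonal, which would otherwise inflate the correlation.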

Fig. 1: A. Schematic of the LLM-based and human similarity judgment elicitation paradigms. B. Correlations between models and human data across six perceptual modalities, namely, pitch, loudness, colors, consonants, taste, and timbre (Pearson $r$; 95% CIs).

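Experimental Results

Research Questions

RQ1: Can LLMs yield similarity judgments that align with human perceptual representations across multiple modalities?
RQ2: Do LLMs recover well-known perceptual structures such as the color wheel and pitch spiral from language?
RQ3: Does multimodal training improve modality-specific performance beyond language alone?
RQ4: Are color naming and perceptual representations influenced by prompt language, revealing language-dependent perception?
RQ5: To what extent do LLMs replicate cross-linguistic variation in color naming observed in humans?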

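Key Findings

GPT-4 shows the strongest alignment with human data across most modalities, with correlations such as r=.92 for pitch and r=.89 for colors.
GPT-3.5 also reaches high correlations in other domains, such as loudness (r=.89), and its overall performance usually ranks among the top two models.
Inter-rater reliability (IRR) for pitch (r=.90) and consonants (r=.46) suggests that GPT-4's performance approaches human reliability in some domains.
MDS analyses reveal interpretable perceptual spaces, including a pitch spiral with 12-tone structure, the color wheel, and production-based consonant representations.
Color naming with GPT-4 reproduces differences between English and Russian, consistent with known cross-linguistic patterns in humans.
GPT-4's gains are not specific to the visual modality, which suggests they owe as much to richer text training as to multimodal (image) input.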
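
To make the MDS finding concrete, here is a minimal sketch of classical (Torgerson) MDS in pure Python. It is only an illustration under stated assumptions: the review does not say which MDS variant or library the authors used, the conversion from similarity to dissimilarity (`max similarity - similarity`) is an assumption, and the toy similarity matrix is made up. The eigenvectors are found by power iteration with deflation, which is adequate for small, approximately Euclidean inputs.

```python
import math
import random


def similarity_to_dissimilarity(sim):
    """Assumed conversion (not from the paper): d = max(sim) - sim."""
    n = len(sim)
    top = max(max(row) for row in sim)
    return [[0.0 if i == j else top - sim[i][j] for j in range(n)]
            for i in range(n)]


def classical_mds(dissim, n_dims=2, n_iter=500, seed=0):
    """Embed items from a symmetric dissimilarity matrix (classical MDS)."""
    n = len(dissim)
    # Double-center the squared dissimilarities: B = -1/2 * J * D^2 * J.
    sq = [[dissim[i][j] ** 2 for j in range(n)] for i in range(n)]
    row_mean = [sum(row) / n for row in sq]
    grand_mean = sum(row_mean) / n
    b = [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + grand_mean)
          for j in range(n)] for i in range(n)]

    rng = random.Random(seed)
    coords = [[0.0] * n_dims for _ in range(n)]
    for d in range(n_dims):
        # Power iteration for the leading eigenvector of the deflated B.
        v = [rng.gauss(0.0, 1.0) for _ in range(n)]
        for _ in range(n_iter):
            w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:
                break
            v = [x / norm for x in w]
        v_norm = math.sqrt(sum(x * x for x in v))
        v = [x / v_norm for x in v]
        bv = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
        eigval = sum(v[i] * bv[i] for i in range(n))  # Rayleigh quotient
        if eigval <= 1e-12:
            break  # no positive-variance dimension found; leave zeros
        scale = math.sqrt(eigval)
        for i in range(n):
            coords[i][d] = scale * v[i]
        # Deflate so the next pass finds the next eigenpair.
        for i in range(n):
            for j in range(n):
                b[i][j] -= eigval * v[i] * v[j]
    return coords


# Toy, made-up similarities for four stimuli arranged in a cycle.
toy_sim = [
    [1.0, 0.5, 0.3, 0.5],
    [0.5, 1.0, 0.5, 0.3],
    [0.3, 0.5, 1.0, 0.5],
    [0.5, 0.3, 0.5, 1.0],
]
for point in classical_mds(similarity_to_dissimilarity(toy_sim)):
    print([round(c, 3) for c in point])
```

With real similarity matrices, a tested library implementation would be the safer choice; the sketch is only meant to show how a circular structure such as a color wheel can emerge from pairwise judgments alone.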

Fig. 2: A. Human and LLM similarity marginals and an example GPT-3 corresponding similarity matrix and its three-dimensional MDS solution for pitch. B. MDS solutions for vocal consonants and colors for GPT-4 similarity matrices. To illustrate the structure of the results, we highlighted consonants w

This review was generated by AI and reviewed by a human editor.