QUICK REVIEW

[Paper Review] The Representational Alignment Hypothesis: Evidence for and Consequences of Invariant Semantic Structure Across Embedding Modalities

Akhil Ramidi, Kevin Scharp | arXiv (Cornell University) | 2026-02-18

Embodied and Extended Cognition | Citations: 0

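One-Line Summary

This paper reviews evidence that embeddings trained independently on different modalities share an invariant semantic geometry. It stresses that simple linear mappings can align embedding spaces across modalities. It then discusses the philosophical implications: it rejects a Platonic reading of this alignment and instead grounds the idea in metasemantics.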

ABSTRACT

There is growing evidence that independently trained AI systems come to represent the world in the same way. In other words, independently trained embeddings from text, vision, audio, and neural signals share an underlying geometry. We call this the Representational Alignment Hypothesis (RAH) and investigate evidence for and consequences of this claim. The evidence is of two kinds: (i) internal structure comparison techniques, such as representational similarity analysis and topological data analysis, reveal matching relational patterns across modalities without explicit mapping; and (ii) methods based on cross-modal embedding alignment, which learn mappings between representation spaces, show that simple linear transformations can bring different embedding spaces into close correspondence, suggesting near-isomorphism. Taken together, the evidence suggests that, even after controlling for trivial commonalities inherent in standard data preprocessing and embedding procedures, a robust structural correspondence persists, hinting at an underlying organizational principle. Some have argued that this result shows that the shared structure is getting at a fundamental, Platonic level of reality. We argue that this conclusion is unjustified. Moreover, we aim to give the idea an alternative philosophical home, rooted in contemporary metasemantics (i.e., theories of what makes a representation and what makes something meaningful) and responses to the symbol grounding problem. We conclude by considering the scope of the RAH and proposing new ways of distinguishing semantic structures that are genuinely invariant from those that inevitably arise due to the fact that all our data is generated under human-specific conditions on Earth.
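The first kind of evidence compares internal structure without any explicit mapping. It can be illustrated with a minimal representational similarity analysis (RSA) sketch, built on these assumptions:

- Both spaces embed the same items in the same row order.
- Representational dissimilarity matrices (RDMs) use cosine distance.
- The two RDMs are compared with a Spearman correlation.

These are common choices, not necessarily the ones used in the studies the paper reviews. The function names `rdm` and `rsa_score` are hypothetical.

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr


def rdm(embeddings: np.ndarray) -> np.ndarray:
    """Condensed RDM: cosine distance (1 - cosine similarity) for every pair of items."""
    return pdist(embeddings, metric="cosine")


def rsa_score(emb_a: np.ndarray, emb_b: np.ndarray) -> float:
    """Spearman correlation between the RDMs of two embedding spaces.

    Row i of emb_a and row i of emb_b must describe the same item. The column
    counts (embedding dimensions) may differ, because only within-space
    distances are compared; no cross-space mapping is ever learned.
    """
    if emb_a.shape[0] != emb_b.shape[0]:
        raise ValueError("both spaces must embed the same items in the same order")
    rho, _ = spearmanr(rdm(emb_a), rdm(emb_b))
    return float(rho)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    latent = rng.normal(size=(50, 16))                   # shared underlying structure
    q, _ = np.linalg.qr(rng.normal(size=(16, 16)))       # random orthogonal matrix
    wide = latent @ rng.normal(size=(16, 256))           # different dimensionality
    print(rsa_score(latent, latent @ q))                 # rotation preserves cosine geometry
    print(rsa_score(latent, wide))                       # random projection roughly preserves it
    print(rsa_score(latent, rng.normal(size=(50, 16))))  # unrelated space
```

A rotated copy of a space scores essentially 1.0, a random projection into a wider space still scores high, and an unrelated random space scores near 0. This mirrors the idea of "matching relational patterns across modalities without explicit mapping."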

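Motivation and Goals

Assesses whether an invariant, modality-independent semantic structure exists across independently trained embeddings (text, vision, audio, neural signals).
Examines the geometric structure shared across modalities in two ways: internal-structure analyses (RSA, topology) that need no explicit mapping, and cross-modal alignment methods that learn one.
Evaluates the implications for symbol grounding and metasemantics, and argues against the Platonic Representation Hypothesis.
Addresses challenges to claims of universal invariance and outlines directions for future research.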

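Methods

Discusses representational similarity analysis (RSA), mutual information, and topological data analysis. These compare relational patterns within each modality without an explicit cross-modal mapping.
Examines global geometric and topological features to assess whether the spaces share the same overall shape across modalities.
Reviews transformation-based methods that align spaces through linear or near-linear mappings (e.g., Procrustes alignment, CSLS retrieval, and unsupervised or weakly supervised approaches).
Summarizes cross-modal alignment evidence for near-isomorphism between embedding spaces from text, vision, audio, and neural data.
Draws on the literature on symbol grounding and metasemantics to interpret why such invariant structure might arise.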
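The transformation-based route can be sketched in two steps:

1. Orthogonal Procrustes finds the closed-form rotation W = UVᵀ, where UΣVᵀ is the SVD of XᵀY.
2. CSLS (cross-domain similarity local scaling) retrieval then discounts "hub" vectors that sit close to many points.

This minimal sketch assumes both spaces already have the same dimensionality (for example, after PCA) and that a seed set of paired items is available. The names `procrustes` and `csls_scores` and the neighborhood size `k` are hypothetical.

```python
import numpy as np


def procrustes(src: np.ndarray, tgt: np.ndarray) -> np.ndarray:
    """Orthogonal W minimizing ||src @ W - tgt||_F; rows are paired seed items."""
    u, _, vt = np.linalg.svd(src.T @ tgt)
    return u @ vt


def _unit_rows(a: np.ndarray) -> np.ndarray:
    """L2-normalize each row so dot products become cosine similarities."""
    return a / np.linalg.norm(a, axis=1, keepdims=True)


def csls_scores(mapped_src: np.ndarray, tgt: np.ndarray, k: int = 10) -> np.ndarray:
    """CSLS(Wx, y) = 2*cos(Wx, y) - r_T(Wx) - r_S(y) for every source/target pair."""
    sims = _unit_rows(mapped_src) @ _unit_rows(tgt).T   # (n_src, n_tgt) cosine similarities
    k_t = min(k, sims.shape[1])
    k_s = min(k, sims.shape[0])
    # r_T(Wx): mean similarity of each mapped source vector to its k nearest targets
    r_t = np.sort(sims, axis=1)[:, -k_t:].mean(axis=1)
    # r_S(y): mean similarity of each target vector to its k nearest mapped sources
    r_s = np.sort(sims, axis=0)[-k_s:, :].mean(axis=0)
    return 2 * sims - r_t[:, None] - r_s[None, :]


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    x = rng.normal(size=(200, 32))                     # hypothetical "source modality"
    q, _ = np.linalg.qr(rng.normal(size=(32, 32)))     # hidden rotation between spaces
    y = x @ q + 0.01 * rng.normal(size=(200, 32))      # hypothetical "target modality"
    w = procrustes(x[:150], y[:150])                   # fit on 150 seed pairs
    scores = csls_scores(x[150:] @ w, y[150:], k=5)    # retrieve the 50 held-out items
    print("held-out top-1 accuracy:", (scores.argmax(axis=1) == np.arange(50)).mean())
```

Constraining W to be orthogonal means the map can only rotate or reflect a space, never stretch it. A good fit therefore indicates that the two geometries agree up to an isometry, which is the sense of near-isomorphism at issue.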

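Results

Research Questions

RQ1: Is there a common, invariant semantic structure shared by embedding spaces trained independently on different modalities (text, vision, audio, neural data)?
RQ2: Are simple linear transformations sufficient to align these spaces, which would suggest that their semantic geometries are approximately isomorphic?
RQ3: What are the philosophical and practical implications of an invariant embedding geometry for symbol grounding and metasemantics?
RQ4: What challenges stand in the way of claiming universal invariance across modalities and environments?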

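Key Findings

Internal-structure comparison methods show that relational patterns match across modalities even without a cross-modal mapping.
Transformation-based methods show that simple linear mappings bring different embedding spaces into close correspondence, suggesting near-isomorphism.
Evidence and discussion are drawn from neural, text, vision, and audio modalities across a range of tasks and datasets.
The paper argues that the Platonic Representation Hypothesis is unjustified as an explanation of the observed alignment.
The Representational Alignment Hypothesis is situated within metasemantics and symbol grounding rather than Platonic realism.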
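Real modalities rarely share a dimensionality, so the "simple linear mapping" finding can also be probed with an unconstrained least-squares affine map. The map is scored by held-out R² and compared with a shuffled-pairs baseline. This hypothetical sketch uses the illustrative names `fit_linear_map`, `apply_map`, and `r_squared`. A good general linear fit is weaker evidence of near-isomorphism than a good orthogonal fit, since a general linear map may stretch or shear the geometry.

```python
import numpy as np


def _with_bias(a: np.ndarray) -> np.ndarray:
    """Append a column of ones so the least-squares fit includes an intercept."""
    return np.hstack([a, np.ones((a.shape[0], 1))])


def fit_linear_map(src: np.ndarray, tgt: np.ndarray) -> np.ndarray:
    """Least-squares affine map from src (n, d_src) to tgt (n, d_tgt); d_src may differ from d_tgt."""
    coef, *_ = np.linalg.lstsq(_with_bias(src), tgt, rcond=None)
    return coef  # shape (d_src + 1, d_tgt)


def apply_map(coef: np.ndarray, src: np.ndarray) -> np.ndarray:
    """Project source embeddings into the target space with a fitted affine map."""
    return _with_bias(src) @ coef


def r_squared(pred: np.ndarray, tgt: np.ndarray) -> float:
    """Pooled R^2 over all target dimensions: 1 - SS_res / SS_tot."""
    ss_res = ((tgt - pred) ** 2).sum()
    ss_tot = ((tgt - tgt.mean(axis=0)) ** 2).sum()
    return float(1.0 - ss_res / ss_tot)


if __name__ == "__main__":
    rng = np.random.default_rng(2)
    latent = rng.normal(size=(300, 8))  # shared structure behind both "modalities"
    text = latent @ rng.normal(size=(8, 40)) + 0.05 * rng.normal(size=(300, 40))
    image = latent @ rng.normal(size=(8, 60)) + 0.05 * rng.normal(size=(300, 60))
    coef = fit_linear_map(text[:200], image[:200])  # train on the first 200 pairs
    print("held-out R^2:", r_squared(apply_map(coef, text[200:]), image[200:]))
    perm = rng.permutation(200)  # break the pairing to get a baseline
    coef_shuf = fit_linear_map(text[:200], image[:200][perm])
    print("shuffled-pairs baseline:", r_squared(apply_map(coef_shuf, text[200:]), image[200:]))
```

The shuffled baseline matters because a flexible map can overfit any training pairs. Only a held-out score well above the baseline indicates a genuine correspondence between the spaces.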

This review was generated by AI and reviewed by a human editor.