QUICK REVIEW

[Paper Review] Contextual Feature Extraction Hierarchies Converge in Large Language Models and the Brain

Gavin Mischler, Yinghao Aaron Li|arXiv (Cornell University)|Jan 31, 2024

Topic Modeling | Citations: 5

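One-sentence summary

This study analyzes 12 open-source LLMs of roughly 7B parameters and shows that models with higher benchmark performance predict neural responses better and extract features along a hierarchy that is more closely aligned with the brain's, with the effect most pronounced when long-range contextual information is available.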

ABSTRACT

Recent advancements in artificial intelligence have sparked interest in the parallels between large language models (LLMs) and human neural processing, particularly in language comprehension. While prior research has established similarities in the representation of LLMs and the brain, the underlying computational principles that cause this convergence, especially in the context of evolving LLMs, remain elusive. Here, we examined a diverse selection of high-performance LLMs with similar parameter sizes to investigate the factors contributing to their alignment with the brain's language processing mechanisms. We find that as LLMs achieve higher performance on benchmark tasks, they not only become more brain-like as measured by higher performance when predicting neural responses from LLM embeddings, but also their hierarchical feature extraction pathways map more closely onto the brain's while using fewer layers to do the same encoding. We also compare the feature extraction pathways of the LLMs to each other and identify new ways in which high-performing models have converged toward similar hierarchical processing mechanisms. Finally, we show the importance of contextual information in improving model performance and brain similarity. Our findings reveal the converging aspects of language processing in the brain and LLMs and offer new directions for developing models that align more closely with human cognitive processing.

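Motivation and objectives

Identify the computational principles that drive alignment between LLMs and the brain during language processing.
Assess how LLM performance relates to how well neural responses can be predicted from LLM embeddings, across layers and across brain regions.
Examine how closely hierarchical processing in LLMs matches that of the brain, and how contextual information affects this alignment.
Compare feature extraction pathways across different LLMs and identify convergence toward brain-like processing.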

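Method

Record intracranial EEG (iEEG) from 8 patients while they listen to speech.
Feed the text of the same speech to 12 open-source LLMs (about 7B parameters each) and extract embeddings from all 32 layers.
Reduce the embeddings to 500 components with PCA, then use ridge regression to predict neural responses from each layer's embeddings.
Compute brain similarity as the correlation between the recorded neural responses and the responses predicted from the LLM embeddings, across electrodes.
Evaluate layer-level brain alignment using distance from pmHG (posteromedial Heschl's gyrus) as an index of position in the brain's processing hierarchy.
To evaluate the effect of contextual information, restrict attention to context windows of varying length (1 to 100 tokens) and measure contextual content via a CKA difference.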
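
The PCA and ridge regression steps listed above can be sketched roughly as follows. This is a minimal illustration on synthetic data, not the authors' pipeline: the function names, the ridge penalty `alpha`, the single 80/20 hold-out split, and the choice to average the correlation over electrodes are all assumptions made for the example. The review only states that 500 PCA components and ridge regression were used.

```python
import numpy as np

def pca_fit(X, n_components):
    """Fit PCA via SVD; returns the feature mean and the top component directions."""
    mean = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:n_components]

def ridge_fit(X, Y, alpha):
    """Closed-form ridge regression with an unpenalized intercept."""
    x_mean, y_mean = X.mean(axis=0), Y.mean(axis=0)
    Xc, Yc = X - x_mean, Y - y_mean
    W = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ Yc)
    return W, y_mean - x_mean @ W

def columnwise_pearson(A, B):
    """Pearson correlation between matching columns of A and B."""
    Ac, Bc = A - A.mean(axis=0), B - B.mean(axis=0)
    denom = np.linalg.norm(Ac, axis=0) * np.linalg.norm(Bc, axis=0)
    return (Ac * Bc).sum(axis=0) / denom

def brain_similarity(embeddings, neural, n_components=500, alpha=1.0, train_frac=0.8):
    """Held-out prediction correlation for one layer, averaged over electrodes.

    embeddings: (n_samples, emb_dim) activations from one LLM layer
    neural:     (n_samples, n_electrodes) neural responses aligned to the same samples
    alpha and train_frac are illustrative assumptions, not values from the paper.
    """
    n_train = int(train_frac * len(embeddings))
    mean, comps = pca_fit(embeddings[:n_train], n_components)  # fit PCA on training data only
    Z = (embeddings - mean) @ comps.T
    W, b = ridge_fit(Z[:n_train], neural[:n_train], alpha)
    pred = Z[n_train:] @ W + b
    return float(columnwise_pearson(pred, neural[n_train:]).mean())

# Synthetic stand-ins: a shared 5-dimensional latent drives both "embeddings" and "electrodes".
rng = np.random.default_rng(0)
latent = rng.standard_normal((400, 5))
demo_embeddings = latent @ rng.standard_normal((5, 64)) + 0.1 * rng.standard_normal((400, 64))
demo_neural = latent @ rng.standard_normal((5, 10)) + 0.5 * rng.standard_normal((400, 10))
print(brain_similarity(demo_embeddings, demo_neural, n_components=20))
```

Running `brain_similarity` once per layer and taking the layer with the highest score would give a per-model peak of the kind the findings refer to. A real analysis would more likely use cross-validation and a tuned penalty rather than the fixed split and `alpha` used here.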

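Experimental results

Research questions

RQ1: How does LLM benchmark performance relate to how well neural responses can be predicted from LLM embeddings?
RQ2: Do better models reach peak brain similarity in earlier layers, and how does this relate to anatomical brain regions?
RQ3: How does the layer-by-layer progression of feature extraction in LLMs align with the brain's language processing hierarchy?
RQ4: What role does contextual information play in improving brain alignment and model performance?
RQ5: Do hierarchical processing pathways differ between models in ways that correlate with overall model performance?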

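Key findings

Higher-performing LLMs reach a higher peak brain similarity when predicting neural responses (Pearson r = 0.92, p = 2.24×10^-5).
Better models reach peak brain similarity in earlier layers (Pearson r = -0.81, p = 0.0013).
Brain hierarchy alignment between the LLMs and the brain correlates with LLM performance (r = 0.79, p = 0.0021).
The top 5 models show high, diagonal-like CKA similarity, indicating convergence on similar layer-by-layer feature extraction, whereas weaker models show more delayed feature extraction.
Contextual information matters: long context windows (50 tokens or more) significantly improve brain alignment.
Contextual content (full context vs. 1 token) strongly predicts both model performance and brain similarity (Spearman r = 0.66, p = 0.020 and r = 0.84, p = 0.0006, respectively).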
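
For readers unfamiliar with CKA (centered kernel alignment), the sketch below shows the linear-kernel variant, which compares two sets of representations of the same inputs. It is offered as an illustration under assumptions: the review does not say which CKA kernel the authors used, and `cka_matrix` and `contextual_content` are hypothetical helper names. In particular, defining contextual content as one minus the CKA between full-context and 1-token embeddings is only one plausible reading of "CKA difference", not a confirmed definition.

```python
import numpy as np

def linear_cka(X, Y):
    """Linear CKA between two representations of the same n samples.

    X: (n_samples, d1), Y: (n_samples, d2). Returns a value in [0, 1].
    """
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    cross = np.linalg.norm(Y.T @ X, "fro") ** 2
    return float(cross / (np.linalg.norm(X.T @ X, "fro") * np.linalg.norm(Y.T @ Y, "fro")))

def cka_matrix(layers_a, layers_b):
    """Layer-by-layer CKA between two models (lists of per-layer activation arrays)."""
    return np.array([[linear_cka(a, b) for b in layers_b] for a in layers_a])

def contextual_content(full_context_emb, one_token_emb):
    """Hypothetical measure: how far full-context embeddings move away from 1-token ones."""
    return 1.0 - linear_cka(full_context_emb, one_token_emb)

# Synthetic demonstration with random "activations" for two 4-layer models.
rng = np.random.default_rng(0)
model_a = [rng.standard_normal((200, 16)) for _ in range(4)]
model_b = [layer + 0.1 * rng.standard_normal((200, 16)) for layer in model_a]
print(np.round(cka_matrix(model_a, model_b), 2))
```

In this toy setup each layer of `model_b` is a noisy copy of the matching layer of `model_a`, so the largest values fall on the diagonal of the matrix. That is the pattern the "diagonal-like CKA similarity" finding describes for the top models.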

This review was generated by AI and checked by a human editor.