QUICK REVIEW

[Paper Review] Large Models of What? Mistaking Engineering Achievements for Human Linguistic Agency

Abeba Birhane, Marek McGann|arXiv (Cornell University)|Jul 11, 2024

Design Education and Practice · Citations: 9

One-line summary

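This paper argues that LLMs are not linguistic agents in the way humans are, and that claims to the contrary rest on two unfounded assumptions: language completeness and data completeness. Taking an enactive perspective, it emphasizes embodiment, participation, and precariousness, and illustrates the gap using algospeak.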

ABSTRACT

In this paper we argue that key, often sensational and misleading, claims regarding linguistic capabilities of Large Language Models (LLMs) are based on at least two unfounded assumptions: the assumption of language completeness and the assumption of data completeness. Language completeness assumes that a distinct and complete thing such as 'a natural language' exists, the essential characteristics of which can be effectively and comprehensively modelled by an LLM. The assumption of data completeness relies on the belief that a language can be quantified and wholly captured by data. Work within the enactive approach to cognitive science makes clear that, rather than a distinct and complete thing, language is a means or way of acting. Languaging is not the kind of thing that can admit of a complete or comprehensive modelling. From an enactive perspective we identify three key characteristics of enacted language: embodiment, participation, and precariousness, that are absent in LLMs, and likely incompatible in principle with current architectures. We argue that these absences imply that LLMs are not now and cannot in their present form be linguistic agents the way humans are. We illustrate the point in particular through the phenomenon of 'algospeak', a recently described pattern of high stakes human language activity in heavily controlled online environments. On the basis of these points, we conclude that sensational and misleading claims about LLM agency and capabilities emerge from a deep misconception of both what human language is and what LLMs are.

Motivation and Objectives

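Challenges claims that LLMs fully understand or master human language.
Contrasts an engineering view of language with concepts from enactive cognitive science.
Emphasizes embodiment, participation, and precariousness as essential features of human language use.
Uses real-world patterns such as algospeak to show the limits of LLMs.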

Approach

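Offers a theoretical comparison between the engineering design of LLMs and enactive theories of language.
Defines and applies the concepts of language completeness and data completeness.
Identifies key aspects of linguistic agency: embodiment, participation, and precariousness.
Uses the concept of algospeak to show linguistic agency at work in practice.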

Results

Research Questions

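RQ1: Do LLMs embody human linguistic agency in the enactive sense?
RQ2: How do key aspects such as embodiment, participation, and precariousness distinguish LLMs from human language activity?
RQ3: How do assumptions about language and data shape claims about LLM capabilities?
RQ4: What does algospeak reveal about linguistic agency in online interaction?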

Key Findings

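LLMs function as token-based statistical models; they do not achieve a complete understanding of language.
Sensational claims about LLMs rest on two core assumptions: language completeness and data completeness.
Enacted language is characterized by embodiment, participation, and precariousness, all of which LLMs lack.
Human linguistic agency involves embodied, interactive, contextually grounded participation that goes beyond text data.
Algospeak is a real-world example of high-stakes language use in heavily controlled environments, and it challenges the linguistic agency claimed for LLMs.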
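
To make "token-based statistical model" concrete, here is a deliberately crude sketch in Python: a count-based bigram model over whitespace tokens. This is an assumption-laden simplification, not how LLMs are built (they are neural networks over subword tokens) and not anything the paper proposes; the corpus and the helper names `train_bigram` and `next_token_prob` are hypothetical. It only illustrates the data-completeness assumption: everything such a model "knows" about language is whatever its fixed corpus happened to contain.

```python
from collections import Counter, defaultdict


def train_bigram(corpus: list[str]) -> dict[str, Counter]:
    """Count how often each token follows each previous token.

    Hypothetical helper: lowercases, splits on whitespace, and adds
    <s> / </s> sentence-boundary markers.
    """
    counts: dict[str, Counter] = defaultdict(Counter)
    for sentence in corpus:
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        for prev, nxt in zip(tokens, tokens[1:]):
            counts[prev][nxt] += 1
    return dict(counts)


def next_token_prob(counts: dict[str, Counter], prev: str, nxt: str) -> float:
    """Maximum-likelihood estimate of P(nxt | prev); 0.0 for anything unseen."""
    followers = counts.get(prev, Counter())
    total = sum(followers.values())
    if total == 0:
        return 0.0
    return followers[nxt] / total


# Hypothetical two-sentence training corpus.
corpus = ["the character died", "the villain died"]
model = train_bigram(corpus)

print(next_token_prob(model, "the", "character"))       # 0.5
print(next_token_prob(model, "character", "died"))      # 1.0
print(next_token_prob(model, "character", "unalived"))  # 0.0: never seen in the corpus
```

Here the toy model gives "died" probability 1.0 after "character" and gives "unalived" (an algospeak substitute for "died") probability 0.0, purely because the latter never occurs in its training data. Real LLMs can assign nonzero probability to unseen strings via subword tokenization, so the hard zero is an artifact of this toy; the example only gestures at the paper's broader claim that languaging cannot be completely captured by data.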

This review was generated by AI and checked by a human editor.