QUICK REVIEW

[Paper Review] A Sentence is Worth a Thousand Pictures: Can Large Language Models Understand Hum4n L4ngu4ge and the W0rld behind W0rds?

Evelina Leivada, Gary Marcus|arXiv (Cornell University)|Jul 26, 2023

Topic Modeling | Cited by 11

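One-Sentence Summary

This paper assesses whether large language models (LLMs) are theoretically informative representations of human cognition or merely mechanistic tools, and uses a novel leet task to test their ability to ground language in world experience; the results show that humans outperform the models.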

ABSTRACT

Modern Artificial Intelligence applications show great potential for language-related tasks that rely on next-word prediction. The current generation of Large Language Models (LLMs) have been linked to claims about human-like linguistic performance and their applications are hailed both as a step towards artificial general intelligence and as a major advance in understanding the cognitive, and even neural basis of human language. To assess these claims, first we analyze the contribution of LLMs as theoretically informative representations of a target cognitive system vs. atheoretical mechanistic tools. Second, we evaluate the models' ability to see the bigger picture, through top-down feedback from higher levels of processing, which requires grounding in previous expectations and past world experience. We hypothesize that since models lack grounded cognition, they cannot take advantage of these features and instead solely rely on fixed associations between represented words and word vectors. To assess this, we designed and ran a novel 'leet task' (l33t t4sk), which requires decoding sentences in which letters are systematically replaced by numbers. The results suggest that humans excel in this task whereas models struggle, confirming our hypothesis. We interpret the results by identifying the key abilities that are still missing from the current state of development of these models, which require solutions that go beyond increased system scaling.

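Research Motivation and Goals

Assess whether LLMs provide theoretically informative representations of human cognition or serve only as mechanistic tools.
Examine the role that top-down processing and grounding in prior experience play in language understanding.
Test whether current models can draw on grounded cognition or rely only on fixed associations between words and word vectors.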

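Proposed Method

Compare LLMs with human cognition to determine whether the models can access higher-level grounding and context beyond word co-occurrence.
Design and run a novel leet task that probes understanding by asking participants and models to decode sentences in which letters are systematically replaced by numbers.
Analyze the results to identify the cognitive abilities that current models lack and that increasing parameter count alone does not supply.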
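
The letter-to-number substitution behind the leet task can be illustrated with a minimal sketch. The mapping below (a→4, e→3, o→0) is an assumption inferred only from the examples visible in the title and abstract ("hum4n l4ngu4ge", "w0rld", "l33t t4sk"); the paper's actual substitution scheme and stimuli may differ, and the names `LEET_MAP`, `to_leet`, and `from_leet` are hypothetical.

```python
# Hypothetical substitution map, inferred from the examples "l33t t4sk" and
# "w0rld" in the title and abstract. The paper's real scheme may differ.
LEET_MAP = {"a": "4", "e": "3", "o": "0"}


def to_leet(sentence, mapping=LEET_MAP):
    """Replace every mapped letter (case-insensitively) with its digit."""
    return "".join(mapping.get(ch.lower(), ch) for ch in sentence)


def from_leet(encoded, mapping=LEET_MAP):
    """Naive inverse: map each digit back to its lowercase letter.

    This is a purely mechanical lookup. It loses the original capitalization
    of substituted letters and would wrongly convert digits that were
    genuinely part of the sentence.
    """
    inverse = {digit: letter for letter, digit in mapping.items()}
    return "".join(inverse.get(ch, ch) for ch in encoded)


if __name__ == "__main__":
    print(to_leet("leet task"))    # l33t t4sk
    print(from_leet("l33t t4sk"))  # leet task
```

With a fixed, known mapping the decoding is a trivial lookup, as `from_leet` shows. The task in the paper instead asks readers and models to recover the sentence from the altered text itself, which is where the authors argue top-down expectations matter.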

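Experimental Results

Research Questions

RQ1: Can large language models serve as theoretically informative representations of human language and cognition, or are they primarily atheoretical tools?
RQ2: Do LLMs show grounding and top-down processing that draws on world experience for language understanding?
RQ3: On tasks with disrupted orthography, can current LLMs outperform humans, or do humans retain the advantage?
RQ4: Which key abilities are missing from LLMs and prevent a grounded understanding of language and world knowledge?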

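Key Findings

Humans excel at the leet task whereas the models struggle, which the authors take as evidence that the models lack embodied/grounded cognition.
The results suggest that fixed word-vector associations are insufficient for tasks that require grounding and top-down processing.
The findings indicate that the abilities LLMs lack cannot be supplied simply by scaling up the models.
The study interprets these gaps as calling for developments beyond scaling in order to achieve grounded language understanding.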


This review was generated by AI and reviewed by human editors.