QUICK REVIEW

[Paper Digest] Understanding Transformer Reasoning Capabilities via Graph Algorithms

Clayton Sanford, Bahare Fatemi|arXiv (Cornell University)|May 28, 2024

AI-based Problem Solving and Planning | Cited by 5

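One-sentence summary

This paper presents a theoretical and empirical framework showing how a transformer's depth, width, and padding (extra tokens) determine which graph reasoning tasks it can solve: logarithmic depth is necessary and sufficient for parallelizable tasks such as graph connectivity, single-layer transformers handle retrieval tasks, and experiments on GraphQA support the theory.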

ABSTRACT

Which transformer scaling regimes are able to perfectly solve different classes of algorithmic problems? While tremendous empirical advances have been attained by transformer-based neural networks, a theoretical understanding of their algorithmic reasoning capabilities in realistic parameter regimes is lacking. We investigate this question in terms of the network's depth, width, and number of extra tokens for algorithm execution. Our novel representational hierarchy separates 9 algorithmic reasoning problems into classes solvable by transformers in different realistic parameter scaling regimes. We prove that logarithmic depth is necessary and sufficient for tasks like graph connectivity, while single-layer transformers with small embedding dimensions can solve contextual retrieval tasks. We also support our theoretical analysis with ample empirical evidence using the GraphQA benchmark. These results show that transformers excel at many graph reasoning tasks, even outperforming specialized graph neural networks.

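Research motivation and goals

Introduce a representational hierarchy of graph reasoning tasks for transformers.
Characterize which scaling regimes (depth, width, padding) can solve which graph tasks.
Compare transformers with GNNs on global and local graph reasoning tasks.
Empirically validate the theoretical predictions on the GraphQA benchmark.
Assess the gap between trained transformers and prompt-based LLM reasoning on graph tasks.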

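Proposed method

Propose a graph tokenization scheme that encodes a graph as transformer input (vertex, edge, and task tokens).
Define the scaling regimes Depth1, LogDepth, LogDepthPause, and LogDepthWide, each with specific constraints on the embedding dimension m, the depth L, and the number of extra tokens N'.
Establish a theoretical connection between transformers and the Massively Parallel Computation (MPC) model to derive depth and width requirements for each task class.
Prove that retrieval tasks can be solved by Depth1 transformers, whereas parallelizable and search tasks require deeper or wider configurations.
Run experiments on GraphQA comparing transformers, GNNs, and prompting methods on connectivity, shortest path, and related tasks.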
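The tokenization step and a retrieval-style query can be illustrated with a small sketch. This is not the paper's exact encoding: the token layout `(kind, a, b)`, the function names `tokenize_graph` and `edge_exists`, and the task name `"edge_existence"` are all assumptions made for illustration. It only shows the idea that a graph becomes a sequence of vertex tokens, edge tokens, and one trailing task token, and that an edge-existence query reduces to a single lookup over that sequence.

```python
from typing import List, Tuple

# Hypothetical token layout: (kind, a, b). Not the paper's exact format.
Token = Tuple[str, int, int]


def tokenize_graph(
    num_vertices: int,
    edges: List[Tuple[int, int]],
    task: Token,
) -> List[Token]:
    """Encode an undirected graph as vertex tokens, edge tokens, then one task token."""
    tokens: List[Token] = [("vertex", v, v) for v in range(num_vertices)]
    # Store each undirected edge with its endpoints in sorted order.
    tokens += [("edge", min(u, v), max(u, v)) for u, v in edges]
    tokens.append(task)
    return tokens


def edge_exists(tokens: List[Token]) -> bool:
    """Answer an edge-existence query with one pass over the input tokens."""
    kind, u, v = tokens[-1]
    if kind != "edge_existence":
        raise ValueError("last token must be an edge_existence task token")
    target = ("edge", min(u, v), max(u, v))
    return any(tok == target for tok in tokens[:-1])


tokens = tokenize_graph(4, [(0, 1), (2, 1)], ("edge_existence", 1, 2))
print(len(tokens), edge_exists(tokens))
```

The single pass in `edge_exists` is only a loose analogy for why one layer can be enough for retrieval: the answer depends on matching the query against individual input tokens, with no multi-step propagation across the graph.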

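Experimental results

Research questions

RQ1: Under realistic depth, width, and padding constraints, which graph reasoning tasks can transformers solve?
RQ2: How do retrieval, parallelizable, and search tasks differ in their dependence on depth and embedding dimension?
RQ3: Do transformers outperform GNNs on global graph reasoning tasks, and under what data-size and scale conditions?
RQ4: Can prompt-based LLM reasoning match specialized transformers on graph tasks?
RQ5: What are the theoretical limits of logarithmic-depth transformers on graph algorithms?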

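Key findings

LogDepth transformers, with depth O(log N) and moderate width, can solve parallelizable tasks such as connectivity.
Single-layer (Depth1) transformers can efficiently solve retrieval tasks such as node counting, edge counting, and edge existence.
Search tasks such as shortest path can be solved by LogDepthWide transformers, typically at the cost of a larger embedding dimension.
Empirically, transformers outperform GNNs on global reasoning tasks, while GNNs do well on local tasks and in low-sample settings.
Fine-tuned large transformers outperform prompt-based LLMs on graph reasoning benchmarks.
Under certain arboricity conditions, triangle counting can be implemented with depth as low as O(log log N).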
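To give some intuition for why a logarithmic number of rounds can be enough for connectivity, the sketch below uses repeated squaring of a boolean reachability matrix. This is a standard parallel-algorithm idea and not the paper's MPC-based transformer construction; the function name `connected_by_squaring` is hypothetical. Each squaring doubles the path length covered, so about log2(N) rounds cover every simple path in an N-vertex graph.

```python
from typing import List, Tuple


def connected_by_squaring(
    num_vertices: int,
    edges: List[Tuple[int, int]],
    source: int,
    target: int,
) -> Tuple[bool, int]:
    """Return (is_connected, rounds_used) for an undirected graph.

    Illustrative only: repeated boolean matrix squaring, not the paper's method.
    """
    n = num_vertices
    # reach[i][j] is True if j is reachable from i by a path of length <= 1.
    reach = [[i == j for j in range(n)] for i in range(n)]
    for u, v in edges:
        reach[u][v] = True
        reach[v][u] = True

    # (n - 1).bit_length() equals ceil(log2(n)) for n >= 1, without floats.
    rounds = (n - 1).bit_length()
    for _ in range(rounds):
        # After this step, reach covers paths twice as long as before.
        reach = [
            [any(reach[i][k] and reach[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)
        ]
    return reach[source][target], rounds


# Path 0-1-2-3-4 plus an isolated vertex 5.
path_edges = [(0, 1), (1, 2), (2, 3), (3, 4)]
print(connected_by_squaring(6, path_edges, 0, 4))
print(connected_by_squaring(6, path_edges, 0, 5))
```

Every round here is a fully parallel update over all vertex pairs, which is the sense in which connectivity is "parallelizable". A naive neighbor-to-neighbor propagation would instead need a number of rounds proportional to the graph's diameter.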

This digest was generated by AI and reviewed by a human editor.