QUICK REVIEW

[Paper Review] Advancing Transformer Architecture in Long-Context Large Language Models: A Comprehensive Survey

Yun-Peng Huang, Jingwei Xu|arXiv (Cornell University)|Nov 21, 2023

Topic Modeling | Cited by 15

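One-Sentence Summary

This survey reviews advances in Transformer architectures for long contexts across the pre-training, fine-tuning, and inference stages, organizes the methods into a five-category taxonomy, and discusses evaluation and tooling.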

ABSTRACT

Transformer-based Large Language Models (LLMs) have been applied in diverse areas such as knowledge bases, human interfaces, and dynamic agents, marking a stride towards achieving Artificial General Intelligence (AGI). However, current LLMs are predominantly pretrained on short text snippets, which compromises their effectiveness in processing the long-context prompts that are frequently encountered in practical scenarios. This article offers a comprehensive survey of the recent advancement in Transformer-based LLM architectures aimed at enhancing the long-context capabilities of LLMs throughout the entire model lifecycle, from pre-training through to inference. We first delineate and analyze the problems of handling long-context input and output with the current Transformer-based models. We then provide a taxonomy and the landscape of upgrades on Transformer architecture to solve these problems. Afterwards, we provide an investigation on widely used evaluation necessities tailored for long-context LLMs, including datasets, metrics, and baseline models, as well as optimization toolkits such as libraries, frameworks, and compilers to boost the efficacy of LLMs across different stages in runtime. Finally, we discuss the challenges and potential avenues for future research. A curated repository of relevant literature, continuously updated, is available at https://github.com/Strivin0311/long-llms-learning.

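Motivation and Objectives

Address the challenges of long-context processing in Transformer-based LLMs, from pre-training through inference.
Provide a holistic taxonomy of architectural advances that extend the context window.
Survey evaluation necessities, including datasets, metrics, and baselines, along with optimization toolkits for long-context LLMs.
Discuss challenges and future directions in long-context Transformer research.
Provide a continuously updated literature repository to track ongoing progress.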

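Proposed Approach

Define the problem space and preliminaries, covering language modeling objectives and modeling stages.
Propose a taxonomy of architectural improvements for long-context LLMs spanning five categories.
Review the specific methods in each category and their effect on context length and efficiency.
Summarize evaluation necessities and describe common toolkits for training and inference optimization.
Highlight challenges and future research directions, supported by a curated literature repository.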

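Experimental Results

Research Questions

RQ1: Which architectural methods have been proposed across training, fine-tuning, inference, and processing to extend the effective context length of Transformer-based LLMs?
RQ2: How do efficient attention, memory mechanisms, extrapolative positional embeddings, and context processing jointly improve long-context performance?
RQ3: Which evaluation datasets, metrics, and baselines are commonly used to assess the long-context capabilities of LLMs?
RQ4: What are the key challenges and potential future directions in designing long-context Transformer architectures?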

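Key Findings

Efficient attention, memory, extrapolative positional embeddings, context processing, and miscellaneous methods together address the limitations of long-context handling.
The survey proposes a holistic taxonomy that maps methods to the development stages of Transformer-based LLMs.
The survey covers evaluation necessities and identifies popular toolkits and libraries for improving training and inference efficiency.
A literature repository has been set up to organize and update material on long-context LLM progress.
The paper discusses challenges and future research directions for advancing long-context capabilities.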
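
As one hedged illustration of the "efficient attention" category named above, the sketch below builds a causal sliding-window (local) attention mask and counts how many query-key pairs it keeps. This code does not come from the survey summary; the function names `sliding_window_mask` and `count_attended_pairs` and the parameters `seq_len` and `window` are assumptions chosen for illustration, and production systems typically apply such masks inside fused attention kernels rather than as Python lists.

```python
from typing import List


def sliding_window_mask(seq_len: int, window: int) -> List[List[bool]]:
    """Causal local-attention mask (illustrative, hypothetical helper).

    Query position i may attend to key position j only when
    i - window < j <= i, i.e. to itself and the window - 1 tokens before it.
    """
    if seq_len < 0 or window < 1:
        raise ValueError("seq_len must be >= 0 and window must be >= 1")
    return [
        [(i - window < j <= i) for j in range(seq_len)]
        for i in range(seq_len)
    ]


def count_attended_pairs(mask: List[List[bool]]) -> int:
    """Number of (query, key) pairs the mask keeps."""
    return sum(sum(row) for row in mask)


if __name__ == "__main__":
    local = sliding_window_mask(seq_len=6, window=3)
    full_causal = sliding_window_mask(seq_len=6, window=6)
    print(count_attended_pairs(local), count_attended_pairs(full_causal))
```

Under these assumptions, the number of kept pairs grows roughly as `seq_len * window` once `seq_len` exceeds `window`, whereas full causal attention grows roughly with the square of `seq_len`. That gap is the kind of saving local-attention methods aim for.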

This review was generated by AI and reviewed by human editors.