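[Paper Digest] Small Language Models are the Future of Agentic AI
The paper argues that small language models (SLMs) are already powerful enough, cheaper to run, and better suited to most agentic AI tasks. In agentic systems that currently rely on large language models (LLMs), SLMs should be the default choice, with LLMs reserved for specific scenarios. The paper also outlines an algorithm for converting LLM-based agents to SLMs and discusses adoption barriers and heterogeneity in agent architectures.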
Large language models (LLMs) are often praised for exhibiting near-human performance on a wide range of tasks and valued for their ability to hold a general conversation. The rise of agentic AI systems is, however, ushering in a mass of applications in which language models perform a small number of specialized tasks repetitively and with little variation. Here we lay out the position that small language models (SLMs) are sufficiently powerful, inherently more suitable, and necessarily more economical for many invocations in agentic systems, and are therefore the future of agentic AI. Our argumentation is grounded in the current level of capabilities exhibited by SLMs, the common architectures of agentic systems, and the economy of LM deployment. We further argue that in situations where general-purpose conversational abilities are essential, heterogeneous agentic systems (i.e., agents invoking multiple different models) are the natural choice. We discuss the potential barriers for the adoption of SLMs in agentic systems and outline a general LLM-to-SLM agent conversion algorithm. Our position, formulated as a value statement, highlights the significance of the operational and economic impact even a partial shift from LLMs to SLMs is to have on the AI agent industry. We aim to stimulate the discussion on the effective use of AI resources and hope to advance the efforts to lower the costs of AI of the present day. Calling for both contributions to and critique of our position, we commit to publishing all such correspondence at https://research.nvidia.com/labs/lpr/slm-agents.
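Research Motivation and Goals
- Argue that SLMs are already powerful enough for many agentic tasks and are operationally better suited to them.
- Advocate modular, heterogeneous agentic systems that combine SLMs with LLMs where appropriate.
- Highlight the economic and environmental benefits of deploying SLMs in agentic workflows.
Proposed Approach
- Survey recent SLM capabilities and compare them with LLMs in agentic settings, with examples from specific SLM families (Phi, Nemotron-H, SmolLM2, Hymba, DeepSeek, RETRO, xLAM).
- Argue that SLMs have lower latency, memory, and compute requirements, and describe how this enables cost-effective, modular agentic architectures.
- Describe how tool calling, prompting, and inference-time augmentation (for example, self-consistency and verifier feedback) can improve SLM performance.
- Propose a practical LLM-to-SLM agent conversion algorithm covering data collection, curation, task clustering, SLM selection, specialized fine-tuning, and iteration.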
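The conversion steps listed above can be pictured with a minimal sketch. Everything in it is an assumption made for illustration: the `LoggedCall` record and its `tool` field are hypothetical, masking e-mail addresses stands in for real data curation, grouping by `tool` stands in for whatever clustering method one would actually use, and fine-tuning is left out entirely. It is not the paper's implementation.

```python
import re
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class LoggedCall:
    """One logged model invocation (hypothetical record layout)."""
    prompt: str
    response: str
    tool: str  # hypothetical label for the task the call served


# Very rough e-mail pattern, used only to illustrate the curation step.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


def curate(calls):
    """Drop empty records and mask e-mail addresses in the rest."""
    cleaned = []
    for call in calls:
        if not call.prompt.strip() or not call.response.strip():
            continue
        cleaned.append(
            LoggedCall(
                EMAIL.sub("<EMAIL>", call.prompt),
                EMAIL.sub("<EMAIL>", call.response),
                call.tool,
            )
        )
    return cleaned


def cluster_by_task(calls):
    """Group calls by their tool label (a stand-in for real clustering)."""
    clusters = defaultdict(list)
    for call in calls:
        clusters[call.tool].append(call)
    return dict(clusters)


def plan_conversion(calls, min_examples=2):
    """Return, per task, the (prompt, response) pairs that would be
    handed to a fine-tuning job; tasks with too few examples are skipped."""
    clusters = cluster_by_task(curate(calls))
    return {
        task: [(c.prompt, c.response) for c in examples]
        for task, examples in clusters.items()
        if len(examples) >= min_examples
    }


logs = [
    LoggedCall("Summarize ticket from a@b.com", "Summary ...", "summarize"),
    LoggedCall("Summarize ticket 42", "Summary ...", "summarize"),
    LoggedCall("", "ignored", "summarize"),
    LoggedCall("Book a flight", "Calling API", "booking"),
]
plan = plan_conversion(logs)
print(plan)
```

In this toy run only the `summarize` task has enough curated examples to become a fine-tuning candidate. The iteration step would amount to re-running the same plan on newly collected logs.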
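Experimental Results
Research Questions
- RQ1: To what extent can SLMs match or exceed LLMs on core agentic tasks such as commonsense reasoning, tool calling, and instruction following?
- RQ2: How do SLMs compare with LLMs in latency, energy consumption, and total cost within agentic systems?
- RQ3: Can heterogeneous, modular agentic architectures that use SLMs by default and invoke LLMs selectively when needed improve efficiency and flexibility?
- RQ4: What is a practical process for converting existing LLM-based agents into SLM-based agents?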
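The "SLM by default, LLM when needed" architecture asked about in RQ3 can be illustrated with a small routing sketch. The model interface (a callable that returns text, or `None` when it declines), the toy models, and the acceptance check are all hypothetical stand-ins. A real system would need its own way to decide when to escalate.

```python
from typing import Callable, Optional, Tuple

# Hypothetical model interface: takes a prompt, returns text,
# or None when the model cannot handle the request.
Model = Callable[[str], Optional[str]]


def make_router(
    slm: Model,
    llm: Model,
    is_acceptable: Callable[[str], bool],
) -> Callable[[str], Tuple[str, Optional[str]]]:
    """Build a router that tries the SLM first and escalates to the LLM
    only when the SLM declines or its answer fails the check."""

    def route(prompt: str) -> Tuple[str, Optional[str]]:
        answer = slm(prompt)
        if answer is not None and is_acceptable(answer):
            return "slm", answer
        return "llm", llm(prompt)

    return route


def toy_slm(prompt: str) -> Optional[str]:
    """Stand-in for a specialized SLM that handles one narrow format."""
    if prompt.startswith("extract_date:"):
        return prompt.split(":", 1)[1].strip()
    return None


def toy_llm(prompt: str) -> Optional[str]:
    """Stand-in for a general-purpose LLM."""
    return f"[general answer to: {prompt}]"


route = make_router(toy_slm, toy_llm, is_acceptable=lambda text: len(text) > 0)

print(route("extract_date: 2025-06-02"))
print(route("Tell me a story"))
```

Returning the name of the model that answered keeps the escalation rate easy to measure, which is the figure one would watch when judging whether the SLM-first default is paying off.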
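Key Findings
- SLMs can match the performance of larger models on many agentic tasks, such as commonsense reasoning and tool use.
- For comparable tasks, SLM inference is 10–30× cheaper than that of large LLMs, with lower latency and memory requirements.
- SLMs enable modular, heterogeneous agent designs and can be fine-tuned quickly for specialized skills.
- Agent data collected during interactions can be reused to train task-specialized SLMs, enabling continuous improvement.
- An explicit LLM-to-SLM conversion algorithm is proposed, outlining data logging, curation, task clustering, and fine-tuning steps.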
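To make the 10–30× cost figure concrete, the following back-of-the-envelope sketch computes the blended cost per call when only part of an agent's traffic moves to an SLM. The 70% share is an invented illustration, not a number from the paper, and the model assumes every call costs the same regardless of task.

```python
def blended_cost(slm_share, cost_ratio, llm_cost_per_call=1.0):
    """Average cost per call when `slm_share` of calls go to an SLM that is
    `cost_ratio` times cheaper than the LLM; the rest stay on the LLM."""
    if not 0.0 <= slm_share <= 1.0:
        raise ValueError("slm_share must be between 0 and 1")
    if cost_ratio <= 0:
        raise ValueError("cost_ratio must be positive")
    slm_cost_per_call = llm_cost_per_call / cost_ratio
    return (1.0 - slm_share) * llm_cost_per_call + slm_share * slm_cost_per_call


# Hypothetical scenario: 70% of calls are routed to the SLM.
for ratio in (10, 30):
    print(ratio, round(blended_cost(0.7, ratio), 3))
```

Under these assumptions the blended cost falls to roughly 0.37 and 0.32 of the all-LLM baseline. Once the ratio is large, the remaining LLM share dominates the bill, so the fraction of calls that can be moved matters more than the exact ratio.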
This digest was generated by AI and reviewed by a human editor.