QUICK REVIEW

[Paper Review] ToolWeaver: Weaving Collaborative Semantics for Scalable Tool Use in Large Language Models

Bowen Fang, Wen Ye | arXiv (Cornell University) | Jan 29, 2026

Machine Learning in Materials Science | Cited by: 0

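ONE-SENTENCE SUMMARY

ToolWeaver introduces hierarchical, compositional tool codes learned through collaboration-aware tokenization, making LLM tool use scalable and generalizable, and it outperforms state-of-the-art methods on ToolBench.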

ABSTRACT

Prevalent retrieval-based tool-use pipelines struggle with a dual semantic challenge: their retrievers often employ encoders that fail to capture complex semantics, while the Large Language Model (LLM) itself lacks intrinsic tool knowledge from its natural language pretraining. Generative methods offer a powerful alternative by unifying selection and execution, tasking the LLM to directly learn and generate tool identifiers. However, the common practice of mapping each tool to a unique new token introduces substantial limitations: it creates a scalability and generalization crisis, as the vocabulary size explodes and each tool is assigned a semantically isolated token. This approach also creates a semantic bottleneck that hinders the learning of collaborative tool relationships, as the model must infer them from sparse co-occurrences of monolithic tool IDs within a vast library. To address these limitations, we propose ToolWeaver, a novel generative tool learning framework that encodes tools into hierarchical sequences. This approach makes vocabulary expansion logarithmic to the number of tools. Crucially, it enables the model to learn collaborative patterns from the dense co-occurrence of shared codes, rather than the sparse co-occurrence of monolithic tool IDs. We generate these structured codes through a novel tokenization process designed to weave together a tool's intrinsic semantics with its extrinsic co-usage patterns. These structured codes are then integrated into the LLM through a generative alignment stage, where the model is fine-tuned to produce the hierarchical code sequences. Evaluation results with nearly 47,000 tools show that ToolWeaver significantly outperforms state-of-the-art methods, establishing a more scalable, generalizable, and semantically-aware foundation for advanced tool-augmented agents.
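
The abstract's claim that vocabulary expansion is logarithmic in the number of tools can be checked with a little arithmetic. The sketch below is only an illustration: the codebook sizes are assumed values, not the paper's configuration, and `levels_needed` and `new_tokens` are hypothetical helper names. It assumes each tool is a sequence of L codes, one from each of L codebooks with K codes apiece, so K^L tools can be addressed with L*K new tokens.

```python
def levels_needed(num_tools: int, codebook_size: int) -> int:
    """Smallest L such that codebook_size ** L >= num_tools (integer arithmetic)."""
    if codebook_size < 2:
        raise ValueError("codebook_size must be at least 2")
    levels, capacity = 1, codebook_size
    while capacity < num_tools:
        capacity *= codebook_size
        levels += 1
    return levels


def new_tokens(num_tools: int, codebook_size: int) -> int:
    """New vocabulary entries: one token per code per level, i.e. L * K."""
    return levels_needed(num_tools, codebook_size) * codebook_size


NUM_TOOLS = 47_000  # "nearly 47,000 tools" from the abstract

# Codebook sizes below are assumptions chosen for illustration only.
for k in (16, 256, 1024):
    print(
        f"K={k}: L={levels_needed(NUM_TOOLS, k)}, "
        f"hierarchical tokens={new_tokens(NUM_TOOLS, k)}, "
        f"one-token-per-tool tokens={NUM_TOOLS}"
    )
```

Under these assumed sizes, 16 codes per level needs 4 levels and 64 new tokens, and 256 codes per level needs 2 levels and 512 new tokens, compared with 47,000 new tokens when every tool gets its own token.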

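MOTIVATION AND GOALS

Advance scalable tool use for LLMs as tool catalogs grow rapidly.
Propose a compositional, hierarchical tool representation to replace the one-token-per-tool scheme.
Learn tool semantics and collaborative relationships through a structured tokenization process.
Integrate the structured tool codes into LLMs through multi-stage generative alignment.
Demonstrate gains in retrieval and end-to-end performance on a large tool benchmark while preserving language ability.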

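PROPOSED METHOD

Represent each tool as a sequence of codes, one from each of L codebooks of K codes each, so vocabulary growth is logarithmic in the number of tools (K^L tools are addressable with only L*K new tokens).
Use collaboration-aware residual quantization (RQ-VAE) to map semantic tool descriptions to hierarchical codes, guided by a tool-tool similarity matrix.
Introduce graph Laplacian regularization that, based on co-occurrence, pulls the codes of similar tools closer together.
Apply a uniform mapping constraint at the final codebook level, solved via optimal transport with Sinkhorn-Knopp, to avoid code collisions.
Fine-tune the LLM in two stages: retrieval alignment (predict the tool code sequence from a query) and trajectory alignment (learn tool calls, arguments, and answers).
At inference, run constrained beam search restricted by a prefix tree (trie) of valid code sequences, which guarantees that every generated tool identifier is valid.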
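
The quantization and constrained-decoding steps listed above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the codebooks are hand-picked rather than learned, the tool names and embeddings are hypothetical, and the collaborative guidance and Sinkhorn-Knopp balancing are left out. It shows plain greedy residual quantization (each level encodes what the previous level left over) and a prefix trie that tells a decoder which codes may legally come next.

```python
from typing import Dict, Iterable, List, Sequence, Tuple

Vector = Sequence[float]
Codebook = List[Vector]


def nearest(codebook: Codebook, x: Vector) -> int:
    """Index of the codeword closest to x in squared Euclidean distance."""
    return min(
        range(len(codebook)),
        key=lambda k: sum((a - b) ** 2 for a, b in zip(codebook[k], x)),
    )


def residual_quantize(x: Vector, codebooks: List[Codebook]) -> Tuple[int, ...]:
    """Greedy residual quantization: pick one code per level on the leftover residual."""
    residual = list(x)
    codes = []
    for codebook in codebooks:
        k = nearest(codebook, residual)
        codes.append(k)
        residual = [r - c for r, c in zip(residual, codebook[k])]
    return tuple(codes)


def build_trie(code_seqs: Iterable[Sequence[int]]) -> dict:
    """Nested-dict prefix tree over all valid tool code sequences."""
    trie: dict = {}
    for seq in code_seqs:
        node = trie
        for code in seq:
            node = node.setdefault(code, {})
    return trie


def allowed_next(trie: dict, prefix: Sequence[int]) -> List[int]:
    """Codes that keep the prefix inside the set of valid sequences."""
    node = trie
    for code in prefix:
        node = node.get(code)
        if node is None:
            return []
    return sorted(node)


# Hypothetical toy setup: 2 levels, 2 codewords per level, 2-D embeddings.
codebooks: List[Codebook] = [
    [(0.0, 0.0), (4.0, 4.0)],  # level 1: coarse position
    [(0.0, 1.0), (1.0, 0.0)],  # level 2: fine offset
]
tool_embeddings: Dict[str, Vector] = {
    "weather_lookup": (0.1, 0.9),
    "geocode": (1.1, 0.2),
    "send_email": (4.9, 4.1),
}

tool_codes = {name: residual_quantize(v, codebooks) for name, v in tool_embeddings.items()}
trie = build_trie(tool_codes.values())

print(tool_codes)
print(allowed_next(trie, ()))    # first-level codes in use
print(allowed_next(trie, (1,)))  # second-level codes valid after first code 1
```

In a real decoder, the list returned by `allowed_next` would be used to mask the logits of every other code token at each beam-search step, so a beam can only end on a code sequence that maps to an existing tool.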

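EXPERIMENTAL RESULTS

RESEARCH QUESTIONS

RQ1: How can tool representations scale beyond one token per tool without exploding the vocabulary?
RQ2: Can collaborative usage patterns among tools be integrated into the tool representations to improve generalization and reasoning?
RQ3: Compared with prior methods, do structured, hierarchical code representations improve end-to-end task performance and tool orchestration?
RQ4: How does collaborative regularization affect model performance and language ability?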

KEY FINDINGS

Model	I1 NDCG@1	I1 NDCG@3	I1 NDCG@5	I2 NDCG@1	I2 NDCG@3	I2 NDCG@5	I3 NDCG@1	I3 NDCG@3	I3 NDCG@5
BM25*	22.77	22.64	25.61	18.29	20.74	22.18	10.00	10.08	12.33
EmbSim*	54.00	50.82	55.86	40.84	36.67	39.55	18.00	17.77	20.70
ToolRetriever*	72.31	70.30	74.99	64.54	57.91	63.61	52.00	39.89	42.92
ToolGen*	87.67	88.84	91.54	83.46	86.24	88.84	79.00	79.80	84.79
ToolWeaver	91.16	91.14	93.48	89.76	89.70	91.80	88.00	85.80	90.12
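
For readers unfamiliar with the metric in the table, the sketch below computes NDCG@k in its standard form with binary relevance. Whether the paper's evaluation uses exactly this variant is an assumption, and the tool names are hypothetical. The table reports values scaled to percentages, while the function returns a value between 0 and 1.

```python
import math
from typing import Sequence, Set


def dcg_at_k(gains: Sequence[float], k: int) -> float:
    """Discounted cumulative gain: the gain at 1-based rank r is divided by log2(r + 1)."""
    return sum(g / math.log2(rank + 2) for rank, g in enumerate(gains[:k]))


def ndcg_at_k(ranked_tools: Sequence[str], relevant_tools: Set[str], k: int) -> float:
    """NDCG@k with binary relevance: DCG of the ranking divided by the ideal DCG."""
    gains = [1.0 if tool in relevant_tools else 0.0 for tool in ranked_tools]
    # The ideal ranking places every relevant tool first.
    ideal = [1.0] * min(len(relevant_tools), k)
    ideal_dcg = dcg_at_k(ideal, k)
    return dcg_at_k(gains, k) / ideal_dcg if ideal_dcg > 0 else 0.0


# Hypothetical query whose ground-truth tools are weather_lookup and geocode.
ranked = ["weather_lookup", "send_email", "geocode"]
relevant = {"weather_lookup", "geocode"}

print(ndcg_at_k(ranked, relevant, 1))  # top-1 is relevant, so this is 1.0
print(ndcg_at_k(ranked, relevant, 3))  # about 0.92: one relevant tool is pushed down to rank 3
```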

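ToolWeaver achieves higher retrieval NDCG@k in both simple and complex scenarios; in the hardest setting, I3, its NDCG@1 reaches 88.00, compared with 79.00 for ToolGen.
In end-to-end evaluation, ToolWeaver achieves the highest SoPR/SoWR across multiple settings, including those with unseen tools and unseen categories.
Ablations show that semantic initialization is the most critical step, while collaborative guidance brings additional gains, especially on complex tasks.
The best weight for the collaborative regularizer is around λ≈1; a λ that is too large damages the tools' own semantics.
ToolWeaver preserves general language ability better than ToolGen, with lower perplexity and stable summarization quality.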
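
The role of λ can be made concrete with a small sketch of a graph Laplacian penalty. The exact loss used in the paper is not given in this review, so the form below (a reconstruction term plus λ times the penalty) and the choice of vectors it is applied to are assumptions, and all numbers are made up. The penalty itself is the standard one: for a symmetric similarity matrix W with Laplacian L = D - W, the sum over pairs of W_ij * ||z_i - z_j||^2 equals tr(Z^T L Z).

```python
from typing import List, Sequence


def laplacian_penalty(
    embeddings: Sequence[Sequence[float]], similarity: List[List[float]]
) -> float:
    """Sum over pairs i < j of W_ij * ||z_i - z_j||^2 for a symmetric W."""
    total = 0.0
    n = len(embeddings)
    for i in range(n):
        for j in range(i + 1, n):
            sq_dist = sum((a - b) ** 2 for a, b in zip(embeddings[i], embeddings[j]))
            total += similarity[i][j] * sq_dist
    return total


def total_loss(
    reconstruction_loss: float,
    embeddings: Sequence[Sequence[float]],
    similarity: List[List[float]],
    lam: float,
) -> float:
    """Assumed objective: semantic reconstruction plus a lambda-weighted collaborative term."""
    return reconstruction_loss + lam * laplacian_penalty(embeddings, similarity)


# Hypothetical data: tools 0 and 1 are frequently co-used, tool 2 is unrelated.
W = [
    [0.0, 1.0, 0.0],
    [1.0, 0.0, 0.0],
    [0.0, 0.0, 0.0],
]
far_apart = [(0.0, 0.0), (1.0, 0.0), (5.0, 5.0)]
pulled_in = [(0.0, 0.0), (0.5, 0.0), (5.0, 5.0)]

print(laplacian_penalty(far_apart, W))  # 1.0
print(laplacian_penalty(pulled_in, W))  # 0.25: co-used tools moved closer
print(total_loss(2.0, far_apart, W, lam=1.0))  # 3.0
```

Because the penalty is minimized by collapsing co-used tools onto the same point, a large λ lets it dominate the reconstruction term, which is one way to read the finding that an oversized λ erodes each tool's own semantics.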


This review was generated by AI and reviewed by a human editor.