QUICK REVIEW

[Paper Review] Semantic Tool Discovery for Large Language Models: A Vector-Based Approach to MCP Tool Selection

Sarat Mudunuri, Jian Wan | arXiv (Cornell University) | Mar 19, 2026

Topic Modeling | Cited by 0

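One-Sentence Summary

The paper proposes a semantic, vector-based tool discovery layer for MCP that dynamically selects the top three tools for each query, keeping tool-calling accuracy high while minimizing token usage.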

ABSTRACT

Large Language Models (LLMs) with tool-calling capabilities have demonstrated remarkable potential in executing complex tasks through external tool integration. The Model Context Protocol (MCP) has emerged as a standardized framework for connecting LLMs to diverse toolsets, with individual MCP servers potentially exposing dozens to hundreds of tools. However, current implementations face a critical scalability challenge: providing all available tools to the LLM context results in substantial token overhead, increased costs, reduced accuracy, and context window constraints. We present a semantic tool discovery architecture that addresses these challenges through vector-based retrieval. Our approach indexes MCP tools using dense embeddings that capture semantic relationships between tool capabilities and user intent, dynamically selecting only the most relevant tools (typically 3-5) rather than exposing the entire tool catalog (50-100+). Experimental results demonstrate a 99.6% reduction in tool-related token consumption with a hit rate of 97.1% at K=3 and an MRR of 0.91 on a benchmark of 140 queries across 121 tools from 5 MCP servers, with sub-100ms retrieval latency. Contributions include: (1) a semantic indexing framework for MCP tools, (2) a dynamic tool selection algorithm based on query-tool similarity, (3) comprehensive evaluation demonstrating significant efficiency and accuracy improvements, and (4) extensibility to multi-agent and cross-organizational tool discovery.

Research Motivation and Objectives

Motivate the scalability challenge of MCP tool provisioning in LLMs due to token overhead and context-window limits.
Propose a semantic indexing architecture that uses dense embeddings to map tool capabilities to user intent.
Evaluate token efficiency, tool-selection accuracy, latency, and cost across multiple MCP servers.
Demonstrate open-source implementation and discuss extensions to multi-agent systems and cross-organizational tool discovery.

Proposed Method

Index MCP tools by extracting schemas and constructing semantic tool documents.
Generate query and tool embeddings with text-embedding-ada-002 and store them in a Milvus vector store.
Retrieve top-K tools by cosine/dot-product similarity and optionally apply thresholding and re-ranking.
Inject selected tools into the LLM context for tool calls and aggregate results for response generation.
Provide a feedback loop for refining embeddings and retrieval parameters.
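
The indexing and retrieval steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the paper's implementation: `embed` is a toy bag-of-words stand-in for text-embedding-ada-002, the in-memory `ToolIndex` stands in for Milvus, and the tool schemas in `TOOLS` are hypothetical examples.

```python
import math
import re
from collections import Counter

# Hypothetical tool schemas, invented for illustration only.
TOOLS = [
    {"name": "mysql_run_query",
     "description": "Run a SQL query against a MySQL database",
     "parameters": ["sql"]},
    {"name": "github_create_issue",
     "description": "Create a new issue in a GitHub repository",
     "parameters": ["repo", "title", "body"]},
    {"name": "fs_read_file",
     "description": "Read the contents of a file from disk",
     "parameters": ["path"]},
    {"name": "slack_post_message",
     "description": "Post a message to a Slack channel",
     "parameters": ["channel", "text"]},
]


def tool_document(tool):
    """Flatten a tool schema into one text document to embed."""
    params = " ".join(tool.get("parameters", []))
    return f"{tool['name']} {tool['description']} {params}"


def embed(text):
    """Toy stand-in for a dense embedding model: sparse word counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ToolIndex:
    """In-memory stand-in for a vector store holding tool embeddings."""

    def __init__(self, tools):
        self.entries = [(t["name"], embed(tool_document(t))) for t in tools]

    def search(self, query, k=3, threshold=0.0):
        """Return up to k (name, score) pairs with score above threshold."""
        q = embed(query)
        scored = [(name, cosine(q, vec)) for name, vec in self.entries]
        scored = [pair for pair in scored if pair[1] > threshold]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:k]


index = ToolIndex(TOOLS)
selected = index.search("create an issue in the github repository", k=3)
print(selected)
```

Only the tools named in `selected` would then be injected into the LLM context. With a real embedding model, the threshold and K would need tuning against a labeled query set, which is the role of the feedback loop listed above.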

Experimental Results

Research Questions

RQ1: Can semantic similarity between user queries and tool descriptions enable effective dynamic tool selection in MCP systems?
RQ2: What is the quantitative impact of semantic tool filtering on token efficiency, cost, and system performance?
RQ3: How does semantic tool selection affect LLM accuracy in tool calling compared to providing all available tools?
RQ4: What are the optimal parameters (number of tools retrieved, similarity threshold, embedding model) for balancing recall and precision?

Key Findings

K	Precision@K	Recall@K	F1@K	Hit Rate@K	MRR	Token Reduction	Latency (ms)
1	92.1%	31.5%	46.9%	85.0%	0.8500	99.6%	87.1
2	70.0%	48.3%	57.0%	95.7%	0.9036	99.6%	90.2
3	57.6%	59.6%	58.4%	97.1%	0.9083	99.6%	87.8
5	42.1%	72.5%	53.2%	97.1%	0.9083	99.6%	87.0
10	26.5%	90.6%	40.9%	98.6%	0.9107	99.6%	88.1

Semantic similarity enables effective dynamic tool selection with a hit rate of 97.1% at K=3.
Token reduction is 99.6% across all K values and servers.
MRR remains around 0.91 for K≥3, indicating that the correct tool usually surfaces at or near the top of the ranked results.
The reported optimal operating point is K=3, which balances precision and recall and gives the highest F1 in the table (58.4%).
Retrieval latency stays below 91 ms across configurations.
Per-server performance varies with catalog distinctiveness, e.g., MySQL and GitHub showing higher precision at low K.
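
For readers who want to reproduce the columns in the table, the sketch below computes Precision@K, Recall@K, F1@K, Hit Rate@K, and MRR using their standard textbook definitions. The summary does not state how the paper averages these metrics, so the per-query (macro) averaging here is an assumption, and the three example runs are invented for illustration.

```python
def evaluate(runs, k):
    """Average ranking metrics over (ranked_tools, relevant_tools) pairs.

    Assumes standard definitions and per-query averaging; the paper's
    exact averaging scheme is not given in this summary.
    """
    totals = {"precision": 0.0, "recall": 0.0, "f1": 0.0,
              "hit_rate": 0.0, "mrr": 0.0}
    for ranked, relevant in runs:
        top = ranked[:k]
        hits = sum(1 for tool in top if tool in relevant)
        precision = hits / k
        recall = hits / len(relevant) if relevant else 0.0
        if precision + recall > 0:
            f1 = 2 * precision * recall / (precision + recall)
        else:
            f1 = 0.0
        # Reciprocal rank of the first relevant tool within the top k.
        rr = 0.0
        for rank, tool in enumerate(top, start=1):
            if tool in relevant:
                rr = 1.0 / rank
                break
        totals["precision"] += precision
        totals["recall"] += recall
        totals["f1"] += f1
        totals["hit_rate"] += 1.0 if hits > 0 else 0.0
        totals["mrr"] += rr
    n = len(runs)
    return {name: value / n for name, value in totals.items()}


# Invented example: three queries with ranked results and gold tools.
example_runs = [
    (["a", "b", "c"], {"a"}),       # relevant tool at rank 1
    (["x", "b", "y"], {"b", "z"}),  # relevant tool at rank 2
    (["p", "q", "r"], {"s"}),       # no relevant tool retrieved
]
metrics = evaluate(example_runs, k=3)
print(metrics)
```

In this toy example two of three queries have a relevant tool in the top 3, so the hit rate is 2/3, and the reciprocal ranks 1, 1/2, and 0 average to an MRR of 0.5.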

This review was generated by AI and reviewed by human editors.