QUICK REVIEW

[Paper Review] Semantic Tool Discovery for Large Language Models: A Vector-Based Approach to MCP Tool Selection

Sarat Mudunuri, Jian Wan|arXiv (Cornell University)|Mar 19, 2026

Topic Modeling | Citations: 0

One-line summary

The paper presents a semantic, vector-based tool-discovery layer for MCP that dynamically selects only the most relevant tools per query (typically 3-5) instead of exposing the full catalog, cutting tool-related token usage by 99.6% while keeping a 97.1% hit rate at K=3.

ABSTRACT

Large Language Models (LLMs) with tool-calling capabilities have demonstrated remarkable potential in executing complex tasks through external tool integration. The Model Context Protocol (MCP) has emerged as a standardized framework for connecting LLMs to diverse toolsets, with individual MCP servers potentially exposing dozens to hundreds of tools. However, current implementations face a critical scalability challenge: providing all available tools to the LLM context results in substantial token overhead, increased costs, reduced accuracy, and context window constraints. We present a semantic tool discovery architecture that addresses these challenges through vector-based retrieval. Our approach indexes MCP tools using dense embeddings that capture semantic relationships between tool capabilities and user intent, dynamically selecting only the most relevant tools (typically 3-5) rather than exposing the entire tool catalog (50-100+). Experimental results demonstrate a 99.6% reduction in tool-related token consumption with a hit rate of 97.1% at K=3 and an MRR of 0.91 on a benchmark of 140 queries across 121 tools from 5 MCP servers, with sub-100ms retrieval latency. Contributions include: (1) a semantic indexing framework for MCP tools, (2) a dynamic tool selection algorithm based on query-tool similarity, (3) comprehensive evaluation demonstrating significant efficiency and accuracy improvements, and (4) extensibility to multi-agent and cross-organizational tool discovery.

Motivation and objectives

Address the scalability problem of MCP tool provisioning: exposing every available tool to the LLM inflates token overhead and runs into context-window limits.
Propose a semantic indexing architecture that uses dense embeddings to match tool capabilities to user intent.
Evaluate token efficiency, tool-selection accuracy, latency, and cost across multiple MCP servers.
Present an open-source implementation and discuss extensions to multi-agent systems and cross-organizational tool discovery.

Proposed method

Index MCP tools by extracting schemas and constructing semantic tool documents.
Generate query and tool embeddings with text-embedding-ada-002 and store the tool embeddings in a Milvus vector store.
Retrieve the top-K tools by cosine or dot-product similarity, optionally applying a similarity threshold and re-ranking.
Inject selected tools into the LLM context for tool calls and aggregate results for response generation.
Provide a feedback loop for refining embeddings and retrieval parameters.
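
The index, retrieve, and select steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the tool catalog is hypothetical, a toy bag-of-words vector stands in for text-embedding-ada-002, an in-memory list stands in for Milvus, and the function names (tool_document, embed, select_tools) are invented for this example.

```python
import math
import re
from collections import Counter

# Hypothetical tool catalog; names and descriptions are illustrative only.
TOOLS = [
    {"name": "github_create_issue", "description": "Create a new issue in a GitHub repository"},
    {"name": "mysql_run_query", "description": "Run a SQL query against a MySQL database"},
    {"name": "slack_send_message", "description": "Send a message to a Slack channel"},
    {"name": "fs_read_file", "description": "Read the contents of a file from disk"},
]


def tool_document(tool):
    """Flatten a tool schema into a single text document for embedding."""
    return f"{tool['name'].replace('_', ' ')}: {tool['description']}"


def embed(text):
    """Toy bag-of-words vector (assumption: stands in for a dense embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_index(tools):
    """In-memory index of (tool name, vector) pairs (assumption: replaces a vector store)."""
    return [(tool["name"], embed(tool_document(tool))) for tool in tools]


def select_tools(query, index, k=3, threshold=0.0):
    """Return up to k tool names whose similarity to the query exceeds threshold."""
    query_vec = embed(query)
    scored = [(cosine(query_vec, vec), name) for name, vec in index]
    kept = [(score, name) for score, name in scored if score > threshold]
    kept.sort(key=lambda pair: (-pair[0], pair[1]))  # best score first, name breaks ties
    return [name for _, name in kept[:k]]


INDEX = build_index(TOOLS)
```

Only the schemas of the tools returned by select_tools would then be injected into the LLM context. In a real deployment the embed function would call an embedding model and the index would live in a vector database; the threshold parameter corresponds to the optional similarity cutoff mentioned above.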

Experimental results

Research questions

RQ1: Can semantic similarity between user queries and tool descriptions enable effective dynamic tool selection in MCP systems?
RQ2: What is the quantitative impact of semantic tool filtering on token efficiency, cost, and system performance?
RQ3: How does semantic tool selection affect LLM accuracy in tool calling compared to providing all available tools?
RQ4: What are the optimal parameters (number of tools retrieved, similarity threshold, embedding model) for balancing recall and precision?

Key findings

K	Precision@K	Recall@K	F1@K	Hit Rate@K	MRR	Token Reduction	Latency (ms)
1	92.1%	31.5%	46.9%	85.0%	0.8500	99.6%	87.1
2	70.0%	48.3%	57.0%	95.7%	0.9036	99.6%	90.2
3	57.6%	59.6%	58.4%	97.1%	0.9083	99.6%	87.8
5	42.1%	72.5%	53.2%	97.1%	0.9083	99.6%	87.0
10	26.5%	90.6%	40.9%	98.6%	0.9107	99.6%	88.1

Semantic similarity enables effective dynamic tool selection with a hit rate of 97.1% at K=3.
Token reduction is 99.6% across all K values and servers.
MRR remains around 0.91 for K≥3, indicating that the correct tool usually appears at or near the top of the ranked results.
Optimal operating point is K=3, balancing precision and recall (F1=58.4%).
Retrieval latency stays below 91 ms across configurations.
Per-server performance varies with how distinctive each catalog is; for example, the MySQL and GitHub servers show higher precision at low K.
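
For readers who want to reproduce this kind of table on their own data, the sketch below computes the metrics using common textbook definitions. These definitions are an assumption: the review does not state how the paper averages each metric, so the paper's exact formulas may differ. The function names and the input format (a ranked list of tool names plus a set of relevant tool names per query) are hypothetical.

```python
def precision_at_k(ranked, relevant, k):
    """Fraction of the k retrieved slots filled by relevant tools."""
    return sum(1 for tool in ranked[:k] if tool in relevant) / k


def recall_at_k(ranked, relevant, k):
    """Fraction of the relevant tools that appear in the top k."""
    return sum(1 for tool in ranked[:k] if tool in relevant) / len(relevant)


def hit_at_k(ranked, relevant, k):
    """1.0 if at least one relevant tool appears in the top k, else 0.0."""
    return 1.0 if any(tool in relevant for tool in ranked[:k]) else 0.0


def reciprocal_rank(ranked, relevant, k):
    """1 / rank of the first relevant tool within the top k, or 0.0 if none."""
    for rank, tool in enumerate(ranked[:k], start=1):
        if tool in relevant:
            return 1.0 / rank
    return 0.0


def evaluate(runs, k):
    """Average each metric over (ranked, relevant) pairs, one pair per query."""
    n = len(runs)
    metrics = {
        "precision": precision_at_k,
        "recall": recall_at_k,
        "hit_rate": hit_at_k,
        "mrr": reciprocal_rank,
    }
    return {
        name: sum(fn(ranked, relevant, k) for ranked, relevant in runs) / n
        for name, fn in metrics.items()
    }


# Hypothetical toy data: three queries, each with a ranking and a relevant set.
RUNS = [
    (["a", "b", "c"], {"a"}),
    (["x", "y", "z"], {"y", "q"}),
    (["m", "n", "o"], {"p"}),
]
```

Calling evaluate(RUNS, k) for several values of k gives one table row per k. Under these definitions precision tends to fall and recall tends to rise as K grows, which is the same trade-off the table shows.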


This review was generated by AI and checked by a human editor.