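[Paper Review] Semantic Tool Discovery for Large Language Models: A Vector-Based Approach to MCP Tool Selection
The paper presents a semantic, vector-based tool discovery layer for MCP that dynamically selects the top 3 tools per query, minimizing token usage while maintaining high tool-calling accuracy.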
Large Language Models (LLMs) with tool-calling capabilities have demonstrated remarkable potential in executing complex tasks through external tool integration. The Model Context Protocol (MCP) has emerged as a standardized framework for connecting LLMs to diverse toolsets, with individual MCP servers potentially exposing dozens to hundreds of tools. However, current implementations face a critical scalability challenge: providing all available tools to the LLM context results in substantial token overhead, increased costs, reduced accuracy, and context window constraints. We present a semantic tool discovery architecture that addresses these challenges through vector-based retrieval. Our approach indexes MCP tools using dense embeddings that capture semantic relationships between tool capabilities and user intent, dynamically selecting only the most relevant tools (typically 3-5) rather than exposing the entire tool catalog (50-100+). Experimental results demonstrate a 99.6% reduction in tool-related token consumption with a hit rate of 97.1% at K=3 and an MRR of 0.91 on a benchmark of 140 queries across 121 tools from 5 MCP servers, with sub-100ms retrieval latency. Contributions include: (1) a semantic indexing framework for MCP tools, (2) a dynamic tool selection algorithm based on query-tool similarity, (3) comprehensive evaluation demonstrating significant efficiency and accuracy improvements, and (4) extensibility to multi-agent and cross-organizational tool discovery.
Research Motivation and Objectives
- Motivate the scalability challenge of MCP tool provisioning in LLMs due to token overhead and context-window limits.
- Propose a semantic indexing architecture that uses dense embeddings to map tool capabilities to user intent.
- Evaluate token efficiency, tool-selection accuracy, latency, and cost across multiple MCP servers.
- Demonstrate open-source implementation and discuss extensions to multi-agent systems and cross-organizational tool discovery.
Proposed Method
- Index MCP tools by extracting schemas and constructing semantic tool documents.
- Generate query and tool embeddings with text-embedding-ada-002 and store the tool embeddings in a Milvus vector store.
- Retrieve top-K tools by cosine/dot-product similarity and optionally apply thresholding and re-ranking.
- Inject selected tools into the LLM context for tool calls and aggregate results for response generation.
- Provide a feedback loop for refining embeddings and retrieval parameters.
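The indexing and retrieval steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: `toy_embed` is a hypothetical hashed bag-of-words stand-in for text-embedding-ada-002, an in-memory list replaces the Milvus vector store, and the `Tool` fields, the document template in `tool_document`, and the four-tool catalog are invented for the example.

```python
import math
import zlib
from dataclasses import dataclass

DIM = 256  # toy vector size; a real embedding model fixes its own dimension


@dataclass
class Tool:
    server: str
    name: str
    description: str
    parameters: dict  # parameter name -> short description


def tool_document(tool):
    """Flatten a tool schema into one text document to embed (assumed template)."""
    params = "; ".join(f"{k}: {v}" for k, v in tool.parameters.items())
    return f"{tool.server} {tool.name}. {tool.description}. Parameters: {params}"


def toy_embed(text):
    """Hypothetical stand-in for an embedding model: hashed bag-of-words, unit length."""
    vec = [0.0] * DIM
    for raw in text.lower().replace("_", " ").split():
        token = raw.strip(".,:;?!()")
        if token:
            vec[zlib.crc32(token.encode("utf-8")) % DIM] += 1.0
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def build_index(tools):
    """Embed every tool document once, ahead of query time."""
    return [(tool, toy_embed(tool_document(tool))) for tool in tools]


def select_tools(query, index, k=3, threshold=0.0):
    """Return up to k (tool name, score) pairs with score >= threshold, best first."""
    q = toy_embed(query)
    # Vectors are unit length, so the dot product equals cosine similarity.
    scored = [(sum(a * b for a, b in zip(q, vec)), tool) for tool, vec in index]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [(tool.name, score) for score, tool in scored[:k] if score >= threshold]


# Illustrative catalog; tool names and descriptions are made up for this sketch.
CATALOG = [
    Tool("github", "create_issue", "Create a new issue in a GitHub repository",
         {"repo": "repository name", "title": "issue title"}),
    Tool("mysql", "run_query", "Execute a SQL query against a MySQL database",
         {"sql": "query text"}),
    Tool("slack", "send_message", "Post a message to a Slack channel",
         {"channel": "channel id", "text": "message body"}),
    Tool("filesystem", "read_file", "Read the contents of a file from disk",
         {"path": "file path"}),
]

INDEX = build_index(CATALOG)
selected = select_tools("create an issue in my github repository", INDEX, k=3)
print(selected)
```

Only the tools returned by `select_tools` would then be injected into the LLM context. Raising `threshold` trades recall for precision, which is the knob the optional thresholding step refers to.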
Experimental Results
Research Questions
- RQ1: Can semantic similarity between user queries and tool descriptions enable effective dynamic tool selection in MCP systems?
- RQ2: What is the quantitative impact of semantic tool filtering on token efficiency, cost, and system performance?
- RQ3: How does semantic tool selection affect LLM accuracy in tool calling compared to providing all available tools?
- RQ4: What are the optimal parameters (number of tools retrieved, similarity threshold, embedding model) for balancing recall and precision?
Key Results
| K | Precision@K | Recall@K | F1@K | Hit Rate@K | MRR | Token Reduction | Latency (ms) |
|---|---|---|---|---|---|---|---|
| 1 | 92.1% | 31.5% | 46.9% | 85.0% | 0.8500 | 99.6% | 87.1 |
| 2 | 70.0% | 48.3% | 57.0% | 95.7% | 0.9036 | 99.6% | 90.2 |
| 3 | 57.6% | 59.6% | 58.4% | 97.1% | 0.9083 | 99.6% | 87.8 |
| 5 | 42.1% | 72.5% | 53.2% | 97.1% | 0.9083 | 99.6% | 87.0 |
| 10 | 26.5% | 90.6% | 40.9% | 98.6% | 0.9107 | 99.6% | 88.1 |
- Semantic similarity enables effective dynamic tool selection with a hit rate of 97.1% at K=3.
- Token reduction is 99.6% across all K values and servers.
- MRR stays around 0.91 for K≥3, indicating that the correct tool usually appears at or near the top of the ranked results.
- The optimal operating point is K=3, which gives the highest F1 (58.4%) among the tested K values.
- Retrieval latency stays below 91 ms across configurations.
- Per-server performance varies with catalog distinctiveness; MySQL and GitHub, for example, show higher precision at low K.
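For readers unfamiliar with the metrics in the table, the sketch below computes Precision@K, Recall@K, Hit Rate@K, and MRR using standard textbook definitions, macro-averaged over queries. It is an assumption that these match the paper's exact computation: the review does not give the formulas, and the reciprocal rank is truncated at K here only because the table's MRR changes with K. The function names and the three toy queries are hypothetical.

```python
def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k retrieved tools that are relevant."""
    return sum(1 for t in ranked[:k] if t in relevant) / k


def recall_at_k(ranked, relevant, k):
    """Fraction of the relevant tools that appear in the top k."""
    return sum(1 for t in ranked[:k] if t in relevant) / len(relevant)


def hit_at_k(ranked, relevant, k):
    """1.0 if at least one relevant tool is in the top k, else 0.0."""
    return 1.0 if any(t in relevant for t in ranked[:k]) else 0.0


def reciprocal_rank_at_k(ranked, relevant, k):
    """1/rank of the first relevant tool within the top k, or 0.0 if none."""
    for rank, t in enumerate(ranked[:k], start=1):
        if t in relevant:
            return 1.0 / rank
    return 0.0


def evaluate(runs, k):
    """Macro-average each metric over (ranked list, relevant set) pairs."""
    n = len(runs)
    return {
        "precision": sum(precision_at_k(r, rel, k) for r, rel in runs) / n,
        "recall": sum(recall_at_k(r, rel, k) for r, rel in runs) / n,
        "hit_rate": sum(hit_at_k(r, rel, k) for r, rel in runs) / n,
        "mrr": sum(reciprocal_rank_at_k(r, rel, k) for r, rel in runs) / n,
    }


# Toy data: each entry is (ranked tool names, set of relevant tool names).
RUNS = [
    (["a", "b", "c"], {"a"}),        # relevant tool ranked first
    (["x", "y", "z"], {"y", "q"}),   # one of two relevant tools at rank 2
    (["m", "n", "o"], {"p"}),        # relevant tool never retrieved
]

metrics = evaluate(RUNS, k=3)
print(metrics)
```

Because a query with a single relevant tool can score at most 1/K on Precision@K, falling precision with growing K alongside rising recall is expected behaviour rather than a sign of degraded retrieval.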
This review was generated by AI and reviewed by a human editor.