QUICK REVIEW

[Paper Review] Machine Learning as a Tool (MLAT): A Framework for Integrating Statistical ML Models as Callable Tools within LLM Agent Workflows

Edwin Chen, Zulekha Bibi|arXiv (Cornell University)|Feb 15, 2026

Artificial Intelligence in Healthcare and Education|Citations: 0

One-Line Summary

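MLAT exposes pre-trained statistical ML models as callable tools within LLM agent workflows, integrating context-dependent, grounded predictions into structured outputs. The pattern is demonstrated with PitchCraft, a two-agent Gemini-based system that predicts proposal pricing in a small-data regime.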

ABSTRACT

We introduce Machine Learning as a Tool (MLAT), a design pattern in which pre-trained statistical machine learning models are exposed as callable tools within large language model (LLM) agent workflows. This allows an orchestrating agent to invoke quantitative predictions when needed and reason about their outputs in context. Unlike conventional pipelines that treat ML inference as a static preprocessing step, MLAT positions the model as a first-class tool alongside web search, database queries, and APIs, enabling the LLM to decide when and how to use it based on conversational context. To validate MLAT, we present PitchCraft, a pilot production system that converts discovery call recordings into professional proposals with ML-predicted pricing. The system uses two agents: a Research Agent that gathers prospect intelligence via parallel tool calls, and a Draft Agent that invokes an XGBoost pricing model as a tool call and generates a complete proposal through structured outputs. The pricing model, trained on 70 examples combining real and human-verified synthetic data, achieves R^2 = 0.807 on held-out data with a mean absolute error of 3688 USD. The system reduces proposal generation time from multiple hours to under 10 minutes. We describe the MLAT framework, structured output architecture, training methodology under extreme data scarcity, and sensitivity analysis demonstrating meaningful learned relationships. MLAT generalizes to domains requiring quantitative estimation combined with contextual reasoning.
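As a minimal sketch of the pattern described in the abstract, the code below registers a pricing model as a tool, exposes its JSON-schema-style parameter description to the orchestrating LLM, and executes a tool call that the LLM might emit. Everything here is hypothetical: `Tool`, `ToolRegistry`, `predict_price`, the two features, and the linear formula are illustrative stand-ins. In PitchCraft the tool wraps a trained XGBoost model and the call comes from Gemini, neither of which this sketch reproduces.

```python
"""Hypothetical sketch of MLAT: an ML model registered as a callable tool."""
import json
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


@dataclass
class Tool:
    name: str
    description: str
    parameters: Dict[str, Any]  # JSON-schema-style description shown to the LLM
    fn: Callable[..., Any]


class ToolRegistry:
    def __init__(self) -> None:
        self._tools: Dict[str, Tool] = {}

    def register(self, tool: Tool) -> None:
        self._tools[tool.name] = tool

    def specs(self) -> List[Dict[str, Any]]:
        # What the orchestrating LLM sees: names, descriptions, parameter schemas.
        return [
            {"name": t.name, "description": t.description, "parameters": t.parameters}
            for t in self._tools.values()
        ]

    def dispatch(self, call_json: str) -> Any:
        # The LLM decides *when* to call a tool and emits the call as JSON;
        # the runtime looks the tool up by name and executes it.
        call = json.loads(call_json)
        tool = self._tools[call["name"]]
        return tool.fn(**call["arguments"])


def predict_price(pain_severity: float, integration_complexity: float) -> dict:
    # Hypothetical stand-in for a trained regressor behind a stateless endpoint;
    # the coefficients are made up for illustration.
    price = 5000 + 2000 * pain_severity + 1500 * integration_complexity
    return {"predicted_price_usd": round(price, 2)}


registry = ToolRegistry()
registry.register(Tool(
    name="predict_price",
    description="Predict project price in USD from features extracted from a call.",
    parameters={
        "type": "object",
        "properties": {
            "pain_severity": {"type": "number"},
            "integration_complexity": {"type": "number"},
        },
        "required": ["pain_severity", "integration_complexity"],
    },
    fn=predict_price,
))

# A tool call as an agent might emit it after reading a discovery-call transcript.
result = registry.dispatch(
    '{"name": "predict_price", '
    '"arguments": {"pain_severity": 3, "integration_complexity": 2}}'
)
print(result)  # {'predicted_price_usd': 14000}
```

The design point is that the model sits in the same registry as any other tool, so the LLM can choose to call it (or not) based on context, and then reason over the returned number instead of receiving it as a fixed preprocessing output.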

Motivation and Objectives

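Formalize the MLAT design pattern, in which pre-trained ML models are exposed as tools in an LLM agent's tool registry.
Demonstrate an end-to-end implementation of MLAT in a near-production system (PitchCraft) that generates proposals with ML-predicted pricing.
Show that structured-output parsing with JSON schemas bridges LLM reasoning and ML feature vectors.
Evaluate MLAT under extreme data scarcity (real plus synthetic data), using group-aware validation.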

Proposed Method

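Define MLAT as an agent-orchestrated tool-calling pattern in which the LLM extracts a feature vector from structured context and invokes a trained ML model to obtain a prediction.
Register the ML model as a stateless REST endpoint tool, using schema-constrained extraction and an output schema to bridge LLM reasoning and the model's inputs.
Use Gemini's JSON schema constraints to achieve reliable structured-output parsing and a shared contract between the Research Agent and the Draft Agent.
Train an XGBoost regression model on a small dataset (N = 70: 40 real and 30 synthetic examples), applying group-aware cross-validation and feature engineering (8 features, with one-hot encoding of tech_stack).
Conduct sensitivity analysis and a cross-validation-based performance evaluation to verify the economic relationships the model has learned.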
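The extraction step described above, where schema-constrained LLM output becomes an ML feature vector with a one-hot `tech_stack`, can be sketched as follows. It assumes the LLM has already returned JSON; the schema, every field name except `tech_stack`, and the category vocabulary are hypothetical, and the sketch does not reproduce the paper's exact 8-feature layout or Gemini's API. The local re-validation is a defensive assumption, not something the review says PitchCraft does.

```python
import json
from typing import Any, Dict, List

# Hypothetical extraction schema: field name -> accepted Python types.
# In the paper the contract is enforced by Gemini's JSON-schema constraint;
# here we simply re-check it before building the feature vector.
EXTRACTION_SCHEMA = {
    "pain_severity": (int, float),
    "integration_complexity": (int, float),
    "timeline_weeks": (int, float),
    "tech_stack": (str,),
}
NUMERIC_FIELDS = ["pain_severity", "integration_complexity", "timeline_weeks"]
TECH_STACK_CATEGORIES = ["python", "javascript", "other"]  # hypothetical vocabulary


def validate(payload: Dict[str, Any]) -> None:
    for key, types in EXTRACTION_SCHEMA.items():
        if key not in payload:
            raise ValueError(f"missing field: {key}")
        value = payload[key]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, types):
            raise TypeError(f"bad type for {key}: {type(value).__name__}")


def to_feature_vector(llm_output: str) -> List[float]:
    payload = json.loads(llm_output)
    validate(payload)
    numeric = [float(payload[k]) for k in NUMERIC_FIELDS]
    stack = payload["tech_stack"].lower()
    if stack not in TECH_STACK_CATEGORIES:
        stack = "other"  # unseen categories fall into a catch-all bucket
    one_hot = [1.0 if stack == c else 0.0 for c in TECH_STACK_CATEGORIES]
    return numeric + one_hot  # fixed column order the model was trained on


vec = to_feature_vector(
    '{"pain_severity": 4, "integration_complexity": 2,'
    ' "timeline_weeks": 8, "tech_stack": "Python"}'
)
print(vec)  # [4.0, 2.0, 8.0, 1.0, 0.0, 0.0]
```

Fixing the column order in code, rather than trusting the order of keys in the LLM's JSON, keeps the vector aligned with the columns the model was trained on.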

Figure 2: Target variable distribution across training and test sets. Left: Histogram showing the right-skewed price distribution. Training (blue) and test (orange) sets show comparable coverage. Right: Box plots confirming similar median and interquartile ranges ($n_{\text{train}} = 56$, …
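The method relies on group-aware cross-validation to keep evaluation honest. A minimal sketch, assuming each synthetic example carries the group ID of the real example it was derived from (the review does not say how groups are defined): whole groups are assigned to folds, so near-duplicate rows never straddle training and validation. `group_kfold` and the toy grouping are hypothetical; scikit-learn's `GroupKFold` provides the same no-overlap guarantee in practice.

```python
import random
from typing import Iterator, List, Sequence, Tuple


def group_kfold(groups: Sequence[str], n_splits: int = 5,
                seed: int = 0) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_idx, test_idx) pairs where no group appears on both sides."""
    unique = sorted(set(groups))
    if n_splits > len(unique):
        raise ValueError("need at least as many groups as folds")
    random.Random(seed).shuffle(unique)
    # Assign whole groups to folds round-robin, so rows sharing a group ID
    # (e.g. a real deal and its synthetic variants) always stay together.
    fold_of = {g: i % n_splits for i, g in enumerate(unique)}
    for k in range(n_splits):
        test_idx = [i for i, g in enumerate(groups) if fold_of[g] == k]
        train_idx = [i for i, g in enumerate(groups) if fold_of[g] != k]
        yield train_idx, test_idx


# Hypothetical grouping: 10 real deals, each with one synthetic variant.
groups = [f"deal{i // 2}" for i in range(20)]
for train_idx, test_idx in group_kfold(groups, n_splits=5):
    print(len(train_idx), "train rows,", len(test_idx), "test rows")
```

With only 70 examples, letting a synthetic variant sit in the validation fold while its real source is in training would likely inflate scores, which is why grouping matters more here than in a large dataset.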

Experimental Results

Research Questions

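RQ1: Does exposing trained ML models as tools within LLM agent workflows improve context-aware decision-making and the interpretability of predictions?
RQ2: How does the MLAT pattern perform, in terms of predictive accuracy and generalization, in a low-data regime that includes synthetic augmentation?
RQ3: Does structured-output parsing enable reliable feature extraction and communication between the Research Agent and the Draft Agent?
RQ4: How does MLAT affect a practical task (proposal generation and pricing) in a near-production setting?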

Key Findings

Metric	Training Set	Test Set	Cross-Validation
R^2	0.937	0.807	0.816 ± 0.060
MAE (USD)	2,328	3,688	3,898 ± 629
RMSE (USD)	2,874	4,720	n/a
Relative MAE	14.3%	22.6%	23.9%
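For reference, the metrics in the table can be computed as below. R^2, MAE, and RMSE use their standard definitions; the review does not define Relative MAE, so this sketch assumes it is MAE divided by the mean actual price. The prices are toy values, not the paper's data.

```python
import math
from typing import Dict, Sequence


def regression_metrics(y_true: Sequence[float],
                       y_pred: Sequence[float]) -> Dict[str, float]:
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    mean_true = sum(y_true) / n
    ss_res = sum(r * r for r in residuals)               # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)   # total sum of squares
    mae = sum(abs(r) for r in residuals) / n
    return {
        "r2": 1.0 - ss_res / ss_tot,      # coefficient of determination
        "mae": mae,                       # same units as price (USD)
        "rmse": math.sqrt(ss_res / n),    # weights large errors more than MAE
        "relative_mae": mae / mean_true,  # ASSUMED definition: MAE / mean price
    }


# Toy prices in USD (illustrative only).
print(regression_metrics([10_000, 20_000, 30_000], [12_000, 18_000, 33_000]))
```

RMSE is never smaller than MAE on the same predictions, which matches the table (for example, 3,688 vs. 4,720 USD on the test set); a large gap between the two indicates that a few predictions carry most of the error.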

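In the 70-sample regime, the XGBoost pricing model achieves R^2 = 0.807, MAE = 3,688 USD, and RMSE = 4,720 USD on held-out data.
Cross-validated R^2 is 0.816 (±0.060), indicating reliable generalization despite scarce data and synthetic augmentation.
The end-to-end PitchCraft pipeline cuts proposal creation time from over 3 hours to under 10 minutes and makes speed-to-lead 12 to 18 times faster.
Sensitivity analysis shows that price rises in an economically consistent way as pain severity and integration complexity increase, suggesting that the model has learned meaningful relationships rather than memorizing the training data.
Compared with Ridge regression, XGBoost achieves a substantially better CV R^2 (0.816 ± 0.060 vs. 0.565 ± 0.180), supporting the importance of nonlinear feature interactions.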
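The sensitivity finding can be checked with a one-at-a-time sweep: vary a single feature while holding the others at a baseline, and confirm that the predicted price never decreases. The review does not describe the paper's exact procedure, so this is an assumed method. `toy_model` is a hypothetical stand-in with made-up, step-like coefficients; with the real system you would pass a wrapper around the trained model's prediction.

```python
from typing import Callable, Dict, List, Sequence

Features = Dict[str, float]


def one_at_a_time(predict: Callable[[Features], float], baseline: Features,
                  feature: str, values: Sequence[float]) -> List[float]:
    """Predict while sweeping `feature` over `values`, others fixed at baseline."""
    outputs = []
    for v in values:
        x = dict(baseline)  # copy so the baseline is never mutated
        x[feature] = v
        outputs.append(predict(x))
    return outputs


def is_non_decreasing(ys: Sequence[float]) -> bool:
    return all(a <= b for a, b in zip(ys, ys[1:]))


def toy_model(x: Features) -> float:
    # Hypothetical step-like response, loosely mimicking a tree ensemble.
    bonus = 2500 if x["integration_complexity"] >= 3 else 0
    return 6000 + 1500 * min(x["pain_severity"], 4) + bonus


baseline = {"pain_severity": 2.0, "integration_complexity": 2.0}
for feat in ("pain_severity", "integration_complexity"):
    ys = one_at_a_time(toy_model, baseline, feat, [1, 2, 3, 4, 5])
    print(feat, ys, "non-decreasing:", is_non_decreasing(ys))
```

Checking for a non-decreasing response, rather than strict increase, fits tree ensembles such as XGBoost, whose predictions are piecewise constant and can plateau between split thresholds.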

Figure 3: Predicted vs. actual price. Left: Training set ($R^2 = 0.937$) shows tight clustering around the identity line. Right: Test set ($R^2 = 0.807$) demonstrates generalization to unseen data, with slight overestimation in the mid-range and underestimation at the highest values, consistent …


This review was generated by AI and checked by a human editor.