QUICK REVIEW

[Paper Review] Toward Expert Investment Teams: A Multi-Agent LLM System with Fine-Grained Trading Tasks

Kunihiro Miyazaki, Takanobu Kawahara | arXiv (Cornell University) | Feb 26, 2026

Stock Market Forecasting Methods · 0 citations

TL;DR

The paper introduces a multi-agent LLM trading framework with fine-grained, task-specific prompts that mimic real investment workflows, and shows improved risk-adjusted returns over coarse-grained designs in backtests on Japanese TOPIX 100 data.

ABSTRACT

The advancement of large language models (LLMs) has accelerated the development of autonomous financial trading systems. While mainstream approaches deploy multi-agent systems mimicking analyst and manager roles, they often rely on abstract instructions that overlook the intricacies of real-world workflows, which can lead to degraded inference performance and less transparent decision-making. Therefore, we propose a multi-agent LLM trading framework that explicitly decomposes investment analysis into fine-grained tasks, rather than providing coarse-grained instructions. We evaluate the proposed framework using Japanese stock data, including prices, financial statements, news, and macro information, under a leakage-controlled backtesting setting. Experimental results show that fine-grained task decomposition significantly improves risk-adjusted returns compared to conventional coarse-grained designs. Crucially, further analysis of intermediate agent outputs suggests that alignment between analytical outputs and downstream decision preferences is a critical driver of system performance. Moreover, we conduct standard portfolio optimization, exploiting low correlation with the stock index and the variance of each system's output. This approach achieves superior performance. These findings contribute to the design of agent structure and task configuration when applying LLM agents to trading systems in practical settings.
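The abstract does not state the exact optimization objective. As a minimal sketch, assuming a classic minimum-variance allocation (a swapped-in technique, not necessarily the authors'), the code below combines monthly return series for the index and for each system so that low cross-correlation lowers the combined variance. The function names, the 27-month horizon, and the synthetic return series are all hypothetical.

```python
import random
from statistics import mean

# Hypothetical helpers: a stand-in minimum-variance allocation, not the paper's exact method.

def covariance_matrix(series: list[list[float]]) -> list[list[float]]:
    """Sample covariance (n - 1 denominator) between K equal-length return series."""
    k, t = len(series), len(series[0])
    mus = [mean(s) for s in series]
    return [
        [sum((series[i][n] - mus[i]) * (series[j][n] - mus[j]) for n in range(t)) / (t - 1)
         for j in range(k)]
        for i in range(k)
    ]

def solve(a: list[list[float]], b: list[float]) -> list[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def min_variance_weights(series: list[list[float]]) -> list[float]:
    """Fully invested weights w = inv(S) 1 / (1' inv(S) 1); short positions allowed."""
    raw = solve(covariance_matrix(series), [1.0] * len(series))
    total = sum(raw)
    return [x / total for x in raw]

# Synthetic monthly returns: an index and two systems with low correlation to it.
rng = random.Random(0)
T = 27
index = [rng.gauss(0.005, 0.04) for _ in range(T)]
system_a = [0.1 * x + rng.gauss(0.004, 0.02) for x in index]
system_b = [rng.gauss(0.003, 0.03) for _ in range(T)]
returns = [index, system_a, system_b]
weights = min_variance_weights(returns)
print([round(w, 3) for w in weights])
```

Because each single series is itself a feasible fully invested portfolio, the minimum-variance mix can never be more volatile than the least volatile input; low correlation between the systems and the index is what pushes the combined variance well below that bound.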

Motivation & Objective

Assess whether fine-grained task decomposition improves LLM-based trading performance compared with coarse-grained prompts.
Mimic real-world investment workflows by assigning detailed tasks to specialized agents.
Evaluate both portfolio performance and interpretability of intermediate agent outputs in backtesting.
Analyze information flow and the necessity of task granularity for robust trading signals.

Proposed method

Proposes a bottom-up manager–analyst multi-agent framework with seven specialized agents (Technical, Quantitative, Qualitative, News, Sector, Macro, PM).
Implements fine-grained prompts for the Technical and Quantitative agents that supply pre-calculated indicators and metrics, and compares them against coarse-grained prompts that supply raw data.
Backtests a long-short, market-neutral portfolio of Japanese TOPIX 100 stocks, rebalanced monthly from Sep 2023 to Nov 2025, using GPT-4o for inference.
Controls data leakage by keeping a gap relative to the model's knowledge cutoff and using only data available up to each decision point.
Evaluates performance with Sharpe ratio and analyzes qualitative outputs and information propagation across agents.
Includes ablation studies to assess the contribution of each agent to overall performance.
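The contrast between fine-grained and coarse-grained prompts for the Technical Agent can be sketched as follows. This is a minimal illustration under stated assumptions: the indicator set (1-month return, gap to a 20-day moving average, a simple-average RSI), the 1-to-5 score scale, the ticker "DEMO", and all function names are hypothetical; the review does not list the paper's actual indicators or prompt wording.

```python
from statistics import mean

# Hypothetical indicator set; the paper's actual indicators may differ.

def pct_change(prices: list[float], lookback: int) -> float:
    """Return over the last `lookback` periods."""
    return prices[-1] / prices[-1 - lookback] - 1.0

def sma_gap(prices: list[float], window: int) -> float:
    """Relative distance of the last close from its simple moving average."""
    return prices[-1] / mean(prices[-window:]) - 1.0

def rsi(prices: list[float], period: int = 14) -> float:
    """RSI from simple averages of gains and losses (Cutler's variant, not Wilder's smoothing)."""
    deltas = [b - a for a, b in zip(prices[-period - 1:-1], prices[-period:])]
    avg_gain = sum(d for d in deltas if d > 0) / period
    avg_loss = sum(-d for d in deltas if d < 0) / period
    if avg_loss == 0:
        return 100.0
    return 100.0 - 100.0 / (1.0 + avg_gain / avg_loss)

def fine_grained_prompt(ticker: str, prices: list[float]) -> str:
    """Pre-computed indicators plus an explicit, narrow task."""
    return "\n".join([
        f"Ticker: {ticker}",
        f"1-month return (21 trading days): {pct_change(prices, 21):+.2%}",
        f"Gap to 20-day SMA: {sma_gap(prices, 20):+.2%}",
        f"RSI(14): {rsi(prices):.1f}",
        "Task: judge the trend from the return and SMA gap, judge overbought/oversold "
        "from RSI, then output a score from 1 (strong sell) to 5 (strong buy).",
    ])

def coarse_grained_prompt(ticker: str, prices: list[float]) -> str:
    """Raw prices with an open-ended instruction."""
    raw = ", ".join(f"{p:.1f}" for p in prices[-60:])
    return (f"Ticker: {ticker}\nRecent closing prices: {raw}\n"
            "Task: analyze this stock and output a score from 1 to 5.")

prices = [100.0 + 0.5 * i for i in range(60)]  # synthetic, steadily rising closes
print(fine_grained_prompt("DEMO", prices))
```

The design choice mirrors the review's description: the fine-grained prompt moves arithmetic out of the LLM and names each analytical step, while the coarse-grained prompt leaves both the computation and the choice of analysis to the model.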

Experimental results

Research questions

RQ1: Does fine-grained task decomposition improve risk-adjusted returns (Sharpe ratio) versus coarse-grained task settings in a multi-agent LLM trading system?
RQ2: How does removing individual agents affect performance, and which agents drive the gains from fine-grained prompts?
RQ3: What is the role of information propagation and interpretability in the performance of hierarchical LLM agent trading frameworks?
RQ4: Can the framework be validated with realistic data sources (prices, financial statements, news, macro data) and backtesting without leakage?
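Leakage-controlled backtesting, as RQ4 and the method summary describe it, hinges on using only data that was public at each decision point. A minimal sketch, assuming every item carries a publication date and that "up to the decision point" is read conservatively as strictly before it; the `Record` type, source labels, dates, and payloads are hypothetical. Filtering on the publication date rather than the fiscal period end matters for financial statements, which are typically disclosed weeks after the period closes.

```python
from dataclasses import dataclass
from datetime import date

@dataclass(frozen=True)
class Record:
    source: str       # hypothetical labels, e.g. "news", "financials", "macro"
    published: date   # date the item became publicly available
    payload: str

def point_in_time(records: list[Record], decision_date: date) -> list[Record]:
    """Keep only items published strictly before the decision date (conservative cutoff)."""
    return [r for r in records if r.published < decision_date]

records = [
    # Results for a period ending 2024-09-30 are not visible until their disclosure date.
    Record("financials", date(2024, 11, 8), "Q2 results for period ending 2024-09-30"),
    Record("news", date(2024, 10, 28), "Company announces buyback"),
    Record("macro", date(2024, 10, 31), "Policy rate decision"),
]
visible = point_in_time(records, date(2024, 11, 1))
print([r.source for r in visible])  # ['news', 'macro']
```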

Key findings

Fine-grained task decomposition generally outperforms coarse-grained designs in Sharpe ratio across portfolio sizes of 20, 30, 40, and 50 in backtests.
Ablation shows the Technical Agent is particularly important for the performance gains under fine-grained prompts.
News and Macro agents contribute to performance, with nuanced differences across settings; removing them can hurt results or, in some configurations, improve them.
Semantic analysis indicates higher alignment between lower-level outputs and higher-level decisions under fine-grained prompting, especially for the Technical Agent.
Standard portfolio optimization, exploiting the systems' low correlation with the stock index, achieves competitive performance when benchmarked against market indices.
Analyzing intermediate texts and agent outputs improves interpretability and reliability of the trading framework.
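The Sharpe-ratio comparison across portfolio sizes can be illustrated with a minimal sketch of one long-short, market-neutral rebalance and an annualized Sharpe ratio. It assumes equal weights, that each portfolio size maps to a hypothetical `n_per_side` parameter for the long and short legs, a zero risk-free rate, and annualization by the square root of 12 for monthly rebalancing; the review does not confirm these details, and the scores and returns below are made up.

```python
import math
from statistics import mean, stdev

# Hypothetical construction: equal weights and n_per_side are assumptions.

def long_short_return(scores: dict[str, float], next_returns: dict[str, float],
                      n_per_side: int) -> float:
    """Dollar-neutral, equal-weight spread: long the top-n scores, short the bottom-n."""
    if n_per_side < 1 or 2 * n_per_side > len(scores):
        raise ValueError("need at least one name per side and no overlap between legs")
    ranked = sorted(scores, key=scores.get, reverse=True)  # ties keep input order
    longs, shorts = ranked[:n_per_side], ranked[-n_per_side:]
    return mean(next_returns[t] for t in longs) - mean(next_returns[t] for t in shorts)

def annualized_sharpe(monthly_returns: list[float], periods_per_year: int = 12) -> float:
    """Mean over sample std of periodic returns, scaled by sqrt(periods); risk-free rate = 0."""
    return mean(monthly_returns) / stdev(monthly_returns) * math.sqrt(periods_per_year)

scores = {"A": 5, "B": 4, "C": 3, "D": 2, "E": 1, "F": 1}
next_returns = {"A": 0.03, "B": 0.01, "C": 0.0, "D": -0.01, "E": -0.02, "F": 0.01}
spread = long_short_return(scores, next_returns, n_per_side=2)
monthly = [spread, 0.004, -0.01, 0.012]  # hypothetical spread history
print(round(spread, 4), round(annualized_sharpe(monthly), 2))
```

Because the long and short legs carry equal capital, the spread is largely insulated from broad market moves, which is why a zero risk-free rate is a common simplification for this kind of Sharpe calculation.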


This review was created by AI and reviewed by human editors.