QUICK REVIEW

[Paper Review] Deep Reinforcement Learning in Financial Markets

Souradeep Chakraborty | arXiv (Cornell University) | Jul 9, 2019
Stock Market Forecasting Methods | 5 citations
TL;DR

This paper proposes a Financial Markov Decision Process (FMDP) framework combined with deep reinforcement learning to generate consistently profitable, robust, and uncorrelated trading signals without manual signal design. The authors report positive, robust performance in all experiments, which were conducted on two very different financial markets.

ABSTRACT

In this paper we explore the usage of deep reinforcement learning algorithms to automatically generate consistently profitable, robust, uncorrelated trading signals in any general financial market. In order to do this, we present a novel Markov decision process (MDP) model to capture the financial trading markets. We review and propose various modifications to existing approaches and explore different techniques to succinctly capture the market dynamics to model the markets. We then go on to use deep reinforcement learning to enable the agent (the algorithm) to learn how to take profitable trades in any market on its own, while suggesting various methodology changes and leveraging the unique representation of the FMDP (financial MDP) to tackle the primary challenges faced in similar works. Through our experimentation results, we go on to show that our model could be easily extended to two very different financial markets and generates a positively robust performance in all conducted experiments.

Motivation & Objective

  • To develop an automated, adaptive trading system that generates consistently profitable signals without relying on handcrafted indicators.
  • To address the challenge of modeling complex, non-stationary financial market dynamics using a structured reinforcement learning framework.
  • To create a generalizable framework applicable to diverse financial markets with minimal domain-specific tuning.
  • To improve robustness and reduce correlation between trading signals across different market conditions.
  • To demonstrate the effectiveness of deep reinforcement learning in learning profitable trading strategies directly from market data.

Proposed method

  • Formulates a novel Financial Markov Decision Process (FMDP) to model the sequential decision-making nature of financial trading.
  • Adapts deep reinforcement learning algorithms to learn optimal trading policies directly from raw market data.
  • Introduces modifications to existing deep RL approaches to better capture market dynamics and improve training stability.
  • Employs a unique representation of the FMDP to address challenges such as non-stationarity and high-dimensional state spaces.
  • Uses end-to-end training to learn state-action value functions that map market states to profitable trade actions.
  • Leverages experience replay and target networks to stabilize learning in high-variance financial environments.
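
The review names experience replay and target networks but gives no implementation details. The sketch below shows how the two pieces typically fit together in Q-learning. It is an illustration under stated assumptions, not the paper's FMDP or its network: the three-position action set, the state (a window of past returns), the one-bar P&L reward, the linear Q-function, and every function and parameter name are hypothetical choices made to keep the example short.

```python
import random
from collections import deque

ACTIONS = (-1, 0, 1)  # hypothetical positions: short, flat, long


class ReplayBuffer:
    """Fixed-size store of (state, action_index, reward, next_state, done)."""

    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)  # oldest transitions drop out first

    def push(self, transition):
        self.buffer.append(transition)

    def sample(self, batch_size, rng):
        # Uniform sampling breaks the temporal correlation of consecutive bars.
        return rng.sample(list(self.buffer), min(batch_size, len(self.buffer)))

    def __len__(self):
        return len(self.buffer)


def q_values(weights, state):
    """Linear Q-function (an assumption): one weight vector per action."""
    return [sum(w * x for w, x in zip(row, state)) for row in weights]


def td_update(online, target, batch, gamma=0.99, lr=0.05):
    """Q-learning updates on the online weights, bootstrapping from the target."""
    for state, action, reward, next_state, done in batch:
        bootstrap = 0.0 if done else gamma * max(q_values(target, next_state))
        td_error = reward + bootstrap - q_values(online, state)[action]
        for i, x in enumerate(state):
            online[action][i] += lr * td_error * x


def sync_target(online):
    """Return an independent copy of the online weights to use as the target."""
    return [row[:] for row in online]


def train(returns, window=3, steps=500, sync_every=50, seed=0):
    """Toy training loop over a list of per-bar returns."""
    rng = random.Random(seed)
    online = [[0.0] * window for _ in ACTIONS]
    target = sync_target(online)
    buffer = ReplayBuffer(capacity=200)
    usable = len(returns) - 1 - window  # number of bars with a full state and next state
    for step in range(steps):
        t = window + step % usable
        state = returns[t - window:t]
        next_state = returns[t - window + 1:t + 1]
        action = rng.randrange(len(ACTIONS))   # purely exploratory behaviour policy
        reward = ACTIONS[action] * returns[t]  # P&L of holding the position for one bar
        buffer.push((state, action, reward, next_state, False))
        td_update(online, target, buffer.sample(16, rng))
        if (step + 1) % sync_every == 0:
            target = sync_target(online)       # periodic hard update of the target
    return online
```

Calling `train` on a list of per-bar returns yields one weight vector per action. The target weights change only every `sync_every` steps, so the bootstrap term stays fixed between syncs, which is the stabilizing effect the bullet above refers to.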

Experimental results

Research questions

  • RQ1: Can a deep reinforcement learning agent learn to generate consistently profitable trading signals without prior feature engineering?
  • RQ2: How well does the proposed FMDP framework generalize across different financial markets with distinct characteristics?
  • RQ3: To what extent does the model produce uncorrelated trading signals compared to existing strategies?
  • RQ4: How robust is the performance of the agent under varying market regimes and volatility conditions?
  • RQ5: What modifications to standard deep RL algorithms are most effective in capturing financial market dynamics?

Key findings

  • The proposed FMDP-based deep reinforcement learning model successfully generated consistently profitable trading signals across two very different financial markets.
  • The model demonstrated robust performance, indicating strong generalization capabilities across diverse market conditions.
  • The trading signals produced were uncorrelated, suggesting diversification potential in a portfolio context.
  • The approach outperformed baseline methods by learning optimal trading policies directly from market data without manual feature design.
  • The model maintained stable performance across multiple experimental runs, indicating reliability and reduced overfitting.
  • The integration of FMDP representation significantly improved learning efficiency and policy quality in financial environments.
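
The review does not say how the lack of correlation between signals was measured. One common check is the pairwise Pearson correlation of per-period strategy returns, sketched below in Python. The function names and the sample return series are hypothetical and are not taken from the paper.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length return series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length series with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)


def correlation_matrix(series_by_name):
    """Pairwise correlations keyed by (name_a, name_b)."""
    names = list(series_by_name)
    return {
        (a, b): pearson(series_by_name[a], series_by_name[b])
        for a in names
        for b in names
    }


if __name__ == "__main__":
    # Made-up per-period returns for two hypothetical signals.
    signals = {
        "signal_a": [0.010, -0.004, 0.007, 0.002, -0.003],
        "signal_b": [-0.002, 0.005, 0.001, -0.006, 0.004],
    }
    for pair, rho in correlation_matrix(signals).items():
        print(pair, round(rho, 3))
```

Off-diagonal values near zero would support the diversification reading in the findings above; values near 1 would indicate that the signals are largely redundant.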

This review was created by AI and reviewed by human editors.