QUICK REVIEW

[Paper Review] Deep Reinforcement Learning for Optimum Order Execution: Mitigating Risk and Maximizing Returns

Khabbab Zakaria, Jayapaulraj Jerinsh|arXiv (Cornell University)|Jan 8, 2026

Risk and Portfolio Optimization | Cited by 0

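One-Sentence Summary

The paper proposes a deep reinforcement learning approach to optimal order execution in the US market that outperforms VWAP and TWAP on ROI and risk management by adapting dynamically to market conditions, including periods of stress.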

ABSTRACT

Optimal Order Execution is a well-established problem in finance that pertains to the flawless execution of a trade (buy or sell) for a given volume within a specified time frame. This problem revolves around optimizing returns while minimizing risk, yet recent research predominantly focuses on addressing one aspect of this challenge. In this paper, we introduce an innovative approach to Optimal Order Execution within the US market, leveraging Deep Reinforcement Learning (DRL) to effectively address this optimization problem holistically. Our study assesses the performance of our model in comparison to two widely employed execution strategies: Volume Weighted Average Price (VWAP) and Time Weighted Average Price (TWAP). Our experimental findings clearly demonstrate that our DRL-based approach outperforms both VWAP and TWAP in terms of return on investment and risk management. The model's ability to adapt dynamically to market conditions, even during periods of market stress, underscores its promise as a robust solution.

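Motivation and Objectives

Address the need to optimize returns while minimizing risk in optimal order execution
Propose a holistic DRL-based framework for execution in the US market
Evaluate DRL against VWAP and TWAP on performance and risk metrics
Demonstrate the robustness of DRL under market stress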

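Proposed Method

Develop a deep reinforcement learning model for optimal order execution in the US market
Compare the performance of DRL with the VWAP and TWAP execution strategies
Evaluate results in terms of return on investment and risk management
Test the model's adaptability under market stress and varying conditions
Focus on dynamic adaptation to changing market environments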
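
For readers unfamiliar with the two baselines, the sketch below shows how TWAP-style and VWAP-style slicing could split a parent order into child orders. It is an illustration only, not the paper's implementation: the function names, the integer share rounding, and the example volume profile are all assumptions made here.

```python
def twap_schedule(total_shares, n_intervals):
    """TWAP-style slicing: split the order evenly across time intervals."""
    if n_intervals <= 0:
        raise ValueError("n_intervals must be positive")
    base, remainder = divmod(total_shares, n_intervals)
    # Give one extra share to the earliest intervals so the total is exact.
    return [base + (1 if i < remainder else 0) for i in range(n_intervals)]


def vwap_schedule(total_shares, volume_profile):
    """VWAP-style slicing: split the order in proportion to expected volume.

    volume_profile is a hypothetical forecast of market volume per interval.
    """
    total_volume = sum(volume_profile)
    if total_volume <= 0:
        raise ValueError("volume_profile must have positive total volume")
    raw = [total_shares * v / total_volume for v in volume_profile]
    shares = [int(r) for r in raw]  # round down first
    leftover = total_shares - sum(shares)
    # Hand the leftover shares to the intervals with the largest fractional part.
    by_fraction = sorted(range(len(raw)), key=lambda i: raw[i] - shares[i], reverse=True)
    for i in by_fraction[:leftover]:
        shares[i] += 1
    return shares


if __name__ == "__main__":
    print(twap_schedule(1000, 4))
    print(vwap_schedule(1000, [1, 2, 1]))
```

Both schedules are fixed before trading starts. According to the summary, the DRL agent differs in that it adjusts its execution as market conditions change.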

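Experimental Results

Research Questions

RQ1: Can a DRL-based optimizer outperform VWAP and TWAP on ROI and risk metrics for execution in the US market?
RQ2: How does the DRL model adapt as market conditions change and during periods of market stress?
RQ3: What are the key factors driving the performance difference between DRL and traditional execution strategies?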

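Key Findings

DRL-based optimal order execution outperforms VWAP and TWAP in ROI and risk management
The DRL model demonstrates the ability to adapt dynamically to market conditions
The approach remains robust during periods of market stress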
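
This summary does not say how the paper measures ROI or risk. As one possible way to compare execution strategies on both a return-like and a risk-like axis, the sketch below uses implementation shortfall against the arrival price, summarized by its mean and standard deviation across episodes. The metric choice and every function name are assumptions for illustration, not the authors' evaluation protocol.

```python
from statistics import mean, pstdev


def average_fill_price(shares, prices):
    """Share-weighted average price achieved by a sequence of child orders."""
    filled = sum(shares)
    if filled <= 0:
        raise ValueError("no shares were filled")
    return sum(s * p for s, p in zip(shares, prices)) / filled


def sell_shortfall_bps(shares, prices, arrival_price):
    """Shortfall of a sell order versus the arrival price, in basis points.

    Positive values mean the order sold below the arrival price (worse).
    """
    fill = average_fill_price(shares, prices)
    return (arrival_price - fill) / arrival_price * 1e4


def summarize(shortfalls_bps):
    """Mean as a cost proxy, population standard deviation as a risk proxy."""
    return {"mean_bps": mean(shortfalls_bps), "std_bps": pstdev(shortfalls_bps)}


if __name__ == "__main__":
    # Two hypothetical episodes of one strategy, each selling 100 shares.
    episodes = [
        sell_shortfall_bps([50, 50], [99.0, 99.0], 100.0),
        sell_shortfall_bps([50, 50], [100.0, 100.0], 100.0),
    ]
    print(summarize(episodes))
```

Under this assumed metric, a strategy would count as better on both axes if it shows a lower mean shortfall and a lower standard deviation than the VWAP and TWAP schedules over the same episodes.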

This review was generated by AI and reviewed by human editors.