QUICK REVIEW

[Paper Review] Multi-Agent Deep Reinforcement Learning for Liquidation Strategy Analysis

Wenhang Bao, Xiaoyang Liu | arXiv (Cornell University) | Jun 24, 2019

Financial Markets and Investment Strategies | Cited by 27

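One-Sentence Summary

This paper proposes a multi-agent deep reinforcement learning framework that optimizes stock liquidation strategies by modeling the interactions among traders in a dynamic market. The study extends the Almgren-Chriss model to a multi-agent environment and shows that competitive behavior degrades both individual and overall performance, while cooperative strategies do not outperform independent trading, underscoring the need for multi-agent reinforcement learning in realistic liquidation analysis.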

ABSTRACT

Liquidation is the process of selling a large number of shares of one stock sequentially within a given time frame, taking into consideration the costs arising from market impact and a trader's risk aversion. The main challenge in optimizing liquidation is to find an appropriate modeling system that can incorporate the complexities of the stock market and generate practical trading strategies. In this paper, we propose to use multi-agent deep reinforcement learning model, which better captures high-level complexities comparing to various machine learning methods, such that agents can learn how to make the best selling decisions. First, we theoretically analyze the Almgren and Chriss model and extend its fundamental mechanism so it can be used as the multi-agent trading environment. Our work builds the foundation for future multi-agent environment trading analysis. Secondly, we analyze the cooperative and competitive behaviours between agents by adjusting the reward functions for each agent, which overcomes the limitation of single-agent reinforcement learning algorithms. Finally, we simulate trading and develop an optimal trading strategy with practical constraints by using a reinforcement learning method, which shows the capabilities of reinforcement learning methods in solving realistic liquidation problems.

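Research Motivation and Objectives

Address the limitations of single-agent reinforcement learning in capturing dynamic, interactive market behavior during large-scale stock liquidation.
Extend the Almgren-Chriss optimal liquidation model to a multi-agent environment to better reflect the complexity of real markets.
Analyze how cooperative and competitive relationships among agents affect overall and individual liquidation performance.
Develop practical, adaptive liquidation strategies with deep reinforcement learning in a simulated multi-agent trading environment.
Demonstrate that multi-agent reinforcement learning models real market interactions and cost structures better than single-agent approaches.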

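Proposed Method

Extends the Almgren-Chriss model to a multi-agent setting and formalizes the liquidation problem through a dynamic state that includes holdings, price impact, and market impact.
Uses Deep Deterministic Policy Gradient (DDPG) as the underlying reinforcement learning algorithm for continuous action spaces within the multi-agent framework.
Designs reward functions that model cooperative and competitive agent behavior, enabling analysis of strategic interactions.
Implements the approach in a simulated multi-agent environment in which agents learn optimal liquidation trajectories by trial and error in a dynamic market with price impact.
Represents the dynamic state of the environment with a state vector containing holdings, time, and market impact parameters.
Trains agents with experience replay and target networks to stabilize learning, using separate policy and value networks in an actor-critic architecture.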
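
To make the multi-agent extension and the reward design above more concrete, here is a minimal sketch of a shared-market liquidation environment. It is not the authors' implementation. The class name `MultiAgentLiquidationEnv`, every parameter name and default value, the linear impact terms, and the three reward formulas are assumptions chosen for illustration; the summary does not give the paper's exact equations.

```python
import numpy as np


class MultiAgentLiquidationEnv:
    """Toy Almgren-Chriss-style market shared by several sellers (illustrative only)."""

    def __init__(self, n_agents=2, shares=1_000_000.0, horizon=20, s0=50.0,
                 sigma=0.0, gamma=2.5e-7, eta=2.5e-6, mode="independent", seed=0):
        # All defaults are hypothetical values, not taken from the paper.
        self.n_agents, self.horizon, self.s0 = n_agents, horizon, s0
        self.total_shares = shares
        self.sigma, self.gamma, self.eta = sigma, gamma, eta
        self.mode = mode
        self.rng = np.random.default_rng(seed)
        self.reset()

    def reset(self):
        self.t = 0
        self.price = self.s0
        self.inventory = np.full(self.n_agents, self.total_shares)
        return self._state()

    def _state(self):
        # One row per agent: fraction of time left, fraction of shares left.
        time_left = 1.0 - self.t / self.horizon
        return np.column_stack([np.full(self.n_agents, time_left),
                                self.inventory / self.total_shares])

    def step(self, actions):
        # Each action is the fraction of the agent's remaining shares to sell now.
        frac = np.clip(np.asarray(actions, dtype=float), 0.0, 1.0)
        if self.t == self.horizon - 1:
            frac = np.ones(self.n_agents)  # force full liquidation on the last step
        sold = frac * self.inventory
        total_sold = sold.sum()
        # Assumed temporary impact: all sellers share one depressed execution price.
        exec_price = self.price - self.eta * total_sold
        # Per-agent shortfall relative to the arrival price s0.
        shortfall = sold * (self.s0 - exec_price)
        # Assumed permanent impact plus noise moves the price for later periods.
        self.price += self.sigma * self.rng.standard_normal() - self.gamma * total_sold
        self.inventory = self.inventory - sold
        self.t += 1
        done = self.t >= self.horizon
        return self._state(), self._rewards(shortfall), done, shortfall

    def _rewards(self, shortfall):
        own = -shortfall  # independent: each agent minimizes its own cost
        if self.mode == "cooperative":
            # Every agent receives the negative total cost.
            return np.full(self.n_agents, own.sum())
        if self.mode == "competitive":
            # Own reward minus the average reward of the other agents.
            return own - (own.sum() - own) / max(self.n_agents - 1, 1)
        return own
```

Because impact depends on the aggregate volume sold in a period, one agent's selling raises the cost of every other agent trading in that period, which is the coupling a single-agent environment cannot express. A DDPG learner would interact with this environment through `reset` and `step`; the learner itself is omitted here.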

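Experimental Results

Research Questions

RQ1: How do the efficiency and cost of liquidation change when multiple agents with a common goal participate, compared with the single-agent setting?
RQ2: How do cooperative and competitive behaviors affect performance in a multi-agent liquidation environment?
RQ3: Can multi-agent deep reinforcement learning learn optimal liquidation strategies that adapt to the presence of other traders in the market?
RQ4: How does reward function design affect agent behavior and overall system performance in the simulated liquidation environment?
RQ5: To what extent does multi-agent reinforcement learning outperform traditional single-agent reinforcement learning and analytical models such as Almgren-Chriss in capturing real market dynamics?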

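Key Findings

Competitive behavior among agents significantly raises the sum of expected shortfalls, to more than 20% above the independent or cooperative settings, indicating that every agent performs worse.
In the competitive scenario, one agent learns to sell all of its shares on day 1, forcing the other agent to bear most of the price impact cost and raising both individual and overall execution costs.
The optimal liquidation trajectory changes drastically once a competitor is introduced: an agent that takes 20 days to liquidate under independent training instead sells everything within the first two days to avoid market impact.
Cooperative strategies do not outperform independent training, indicating that coordination yields no better outcome in this multi-agent setting.
The multi-agent environment captures the strategic interdependence of traders and shows that competition leads all participants to suboptimal outcomes.
Despite the simplified setup, the reinforcement learning agents adapt dynamically to competitor behavior, demonstrating the framework's ability to model complex market interactions.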
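
For readers unfamiliar with the metric behind these findings, the block below recalls how implementation shortfall and its expectation are defined in the standard single-agent Almgren-Chriss model with linear impact, and then writes the "sum of expected shortfalls" as an aggregate over agents. The single-agent formulas are the textbook form. The aggregate notation (the index j, the count J, and the symbol E_total) is introduced here as an assumption and may differ from the paper's own notation.

```latex
% Single-agent Almgren-Chriss with linear impact (standard form).
% X: initial shares, n_k: shares sold in period k, x_k: shares left after period k,
% \tau: period length, \tilde{S}_k: execution price in period k,
% \gamma: permanent impact, \epsilon: fixed cost per share, \eta: temporary impact.
\[
\begin{aligned}
\mathrm{IS} &= X S_0 - \sum_{k=1}^{N} n_k \tilde{S}_k \\
\mathbb{E}[\mathrm{IS}] &= \tfrac{1}{2}\gamma X^2 + \epsilon \sum_{k=1}^{N} |n_k|
  + \frac{\tilde{\eta}}{\tau} \sum_{k=1}^{N} n_k^2,
  \qquad \tilde{\eta} = \eta - \tfrac{1}{2}\gamma\tau \\
\mathbb{V}[\mathrm{IS}] &= \sigma^2 \tau \sum_{k=1}^{N} x_k^2
\end{aligned}
\]

% Assumed aggregate over J agents (notation introduced for this summary).
\[
E_{\text{total}} = \sum_{j=1}^{J} \mathbb{E}\big[\mathrm{IS}^{(j)}\big]
\]
```

The quadratic term in the expected shortfall is why concentrated selling is expensive: the cost of a period grows with the square of the volume sold in it.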

This review was generated by AI and reviewed by human editors.