QUICK REVIEW

[Paper Review] On a General Dynamic Programming Approach for Decentralized Stochastic Control

Serdar Yüksel|arXiv (Cornell University)|Mar 15, 2018

Economic theories and models|Cited by 1

One-sentence summary

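This paper proposes a general dynamic programming framework for decentralized stochastic control problems with standard Borel spaces, using a new state-action formulation built on the measurability properties of strategic measures. The framework is shown to be well-posed and, under mild continuity and compactness conditions, yields existence results for optimal team policies, in particular for static teams with independent measurements and for dynamic teams that admit a static reduction.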

ABSTRACT

For sequential stochastic control problems with standard Borel measurement and control action spaces, we introduce a very general dynamic programming formulation, establish its well-posedness, and provide new existence results for optimal policies. Our dynamic program builds in part on Witsenhausen's standard form, but with a different formulation for the state, action, and transition dynamics. Using recent results on measurability properties of strategic measures in decentralized control, we obtain a controlled Markov model with standard Borel state and state dependent action sets. This allows for a well-posed formulation for the controlled Markov model for a general class of sequential decentralized stochastic control in that it leads to well-defined dynamic programming recursions through universal measurability properties of the value functions for each time stage. Through this formulation, new existence results are obtained for optimal team policies in decentralized stochastic control. These state that for a static team with independent measurements, it suffices for the cost function to be continuous in the actions for the existence of an optimal policy under mild compactness conditions. These also apply to dynamic teams which admit static reductions with independent measurements through a change of measure transformation. We show through a counterexample that weaker conditions may not lead to existence of an optimal team policy. In particular, the paper presents existence results which complement and generalize those previously reported.

Motivation and objectives

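Develop a well-posed dynamic programming formulation for sequential decentralized stochastic control problems with standard Borel spaces.
Establish existence results for optimal team policies in decentralized stochastic control under mild conditions.
Generalize earlier existence results by using measurable strategic measures and state-dependent action sets.
Identify mild sufficient conditions for the existence of an optimal policy, particularly for static teams with independent measurements.
Show through a counterexample that an optimal policy may fail to exist when the continuity or compactness conditions are weakened.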

Proposed method

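Formulate the problem as a controlled Markov model with a standard Borel state space and state-dependent action sets.
Reformulate the state, action, and transition dynamics differently from Witsenhausen's standard form, so that the value functions are universally measurable.
Use recent results on the measurability properties of strategic measures in decentralized control to ensure well-posedness.
Establish the dynamic programming recursions through the universal measurability of the value function at each time stage.
Apply a change of measure transformation to reduce certain dynamic team problems to static team problems with independent measurements.
Prove existence of an optimal policy from continuity of the cost function in the actions together with mild compactness assumptions.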
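
The shape of a dynamic programming recursion over a controlled Markov model with state-dependent action sets can be sketched on a finite toy problem. This is only an illustration under strong assumptions: the paper's state and action spaces are standard Borel and its state is built from strategic measures, whereas here everything is finite, and the names `backward_induction`, `toy_actions`, `toy_transition`, `toy_cost`, and `toy_terminal` are hypothetical.

```python
from typing import Callable, Dict, Hashable, List, Tuple

State = Hashable
Action = Hashable


def backward_induction(
    states: List[State],
    actions: Callable[[State], List[Action]],
    transition: Callable[[State, Action], Dict[State, float]],
    cost: Callable[[State, Action], float],
    terminal_cost: Callable[[State], float],
    horizon: int,
) -> Tuple[Dict[State, float], List[Dict[State, Action]]]:
    """Finite-horizon backward recursion with state-dependent action sets."""
    # Start from the terminal value function.
    value = {x: terminal_cost(x) for x in states}
    policy: List[Dict[State, Action]] = []
    for _ in range(horizon):
        stage_value, stage_policy = {}, {}
        for x in states:
            # Only the actions admissible in state x are considered.
            q = {
                a: cost(x, a)
                + sum(p * value[y] for y, p in transition(x, a).items())
                for a in actions(x)
            }
            best = min(q, key=q.get)
            stage_value[x], stage_policy[x] = q[best], best
        value = stage_value
        policy.insert(0, stage_policy)  # earliest stage ends up first
    return value, policy


# Hypothetical toy model: walk on {0, 1, 2}, trying to reach state 2.
MOVES = {"down": -1, "stay": 0, "up": 1}


def toy_actions(x):
    # The admissible set depends on the state: no move may leave {0, 1, 2}.
    return [a for a, d in MOVES.items() if 0 <= x + d <= 2]


def toy_transition(x, a):
    return {x + MOVES[a]: 1.0}  # deterministic move


def toy_cost(x, a):
    return (x - 2) ** 2 + (0.0 if a == "stay" else 0.5)


def toy_terminal(x):
    return float((x - 2) ** 2)


value, policy = backward_induction(
    [0, 1, 2], toy_actions, toy_transition, toy_cost, toy_terminal, horizon=2
)
print(value)
print(policy[0])
```

In a finite model the minimum at each stage is always attained and measurability is automatic. The well-posedness question addressed by the paper arises because neither holds for free on general Borel spaces.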

Results

Research questions

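RQ1: Under what conditions does an optimal team policy exist in decentralized stochastic control problems with standard Borel spaces?
RQ2: Can a general dynamic programming formulation be constructed that is well-posed for a broad class of decentralized stochastic control problems?
RQ3: What minimal assumptions on the cost function and action spaces guarantee existence of an optimal policy for static teams with independent measurements?
RQ4: How do state-dependent action sets and measurable strategic measures contribute to the well-posedness of the dynamic program?
RQ5: What limitations arise when the continuity or compactness conditions in the existence proofs are weakened?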

Key findings

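A well-posed dynamic programming formulation is established for a general class of sequential decentralized stochastic control problems, using the measurability properties of strategic measures.
The value function at each time stage is universally measurable, which makes the dynamic programming recursions well defined.
For a static team with independent measurements, continuity of the cost function in the actions, together with mild compactness conditions, suffices for the existence of an optimal policy.
This result extends to dynamic teams that admit a static reduction with independent measurements through a change of measure transformation.
A counterexample shows that under weaker conditions an optimal team policy may fail to exist.
The results complement and generalize previously reported existence results in decentralized stochastic control.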
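
The static reduction mentioned in the findings can be checked numerically on a small example. The sketch below assumes a hypothetical two-agent dynamic team with binary variables, in which the second agent's measurement depends on the first agent's action. The channels `f1` and `f2`, the reference distributions `Q1` and `Q2`, and the cost function are all made up for illustration and are not taken from the paper. After the change of measure, the measurements are independent of everything else and the likelihood ratio is absorbed into the cost.

```python
from itertools import product

W = [0, 1]  # hidden primitive variable
Y = [0, 1]  # measurement alphabet of both agents
U = [0, 1]  # action alphabet of both agents
P_W = {0: 0.5, 1: 0.5}

# Reference measures for the reduced problem (assumed uniform, so every
# measurement value has positive reference probability).
Q1 = {0: 0.5, 1: 0.5}
Q2 = {0: 0.5, 1: 0.5}


def f1(y1, w):
    """Agent 1's measurement channel P(y1 | w)."""
    return 0.8 if y1 == w else 0.2


def f2(y2, w, u1):
    """Agent 2's channel P(y2 | w, u1); it depends on agent 1's action."""
    return 0.9 if y2 == (w ^ u1) else 0.1


def cost(w, u1, u2):
    return (u2 - w) ** 2 + 0.3 * u1


def dynamic_cost(g1, g2):
    """Expected cost under the original (dynamic) information structure."""
    total = 0.0
    for w, y1, y2 in product(W, Y, Y):
        u1, u2 = g1[y1], g2[y2]
        total += P_W[w] * f1(y1, w) * f2(y2, w, u1) * cost(w, u1, u2)
    return total


def static_cost(g1, g2):
    """Expected cost after the reduction: y1 ~ Q1 and y2 ~ Q2 are drawn
    independently of w and of the actions, and the cost is reweighted by
    the likelihood ratio."""
    total = 0.0
    for w, y1, y2 in product(W, Y, Y):
        u1, u2 = g1[y1], g2[y2]
        ratio = (f1(y1, w) / Q1[y1]) * (f2(y2, w, u1) / Q2[y2])
        total += P_W[w] * Q1[y1] * Q2[y2] * ratio * cost(w, u1, u2)
    return total


# A deterministic policy is a tuple g with g[y] the action taken on seeing y.
policies = list(product(U, repeat=len(Y)))
pairs = list(product(policies, policies))

max_gap = max(abs(dynamic_cost(g1, g2) - static_cost(g1, g2)) for g1, g2 in pairs)
best_pair = min(pairs, key=lambda pair: dynamic_cost(*pair))
best_dynamic = dynamic_cost(*best_pair)
best_static = min(static_cost(g1, g2) for g1, g2 in pairs)

print("largest gap between the two formulations:", max_gap)
print("best policy pair:", best_pair, "with cost", best_dynamic)
```

Both formulations assign the same expected cost to every policy pair, so they share the same optimal team policy. With finite spaces a minimizer exists trivially by enumeration. The continuity and compactness conditions in the paper are what replace this enumeration argument on general Borel spaces.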


This review was generated by AI and reviewed by human editors.