QUICK REVIEW

[Paper Review] Qualitative Analysis of Concurrent Mean-payoff Games

Krishnendu Chatterjee, Rasmus Ibsen-Jensen | arXiv (Cornell University) | Jan 1, 2013

Logic, Reasoning, and Knowledge | References: 40 | Citations: 1

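One-line summary

This paper presents a qualitative analysis of concurrent mean-payoff games, establishing qualitative determinacy, tight strategy-complexity results, and quadratic-time algorithms for computing the almost-sure and positive winning sets. It also shows that a polynomial-time solution for quantitative constraints in such games would resolve a long-standing open problem: solving the value problem for turn-based deterministic mean-payoff games in polynomial time.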

ABSTRACT

We consider concurrent games played by two-players on a finite-state graph, where in every round the players simultaneously choose a move, and the current state along with the joint moves determine the successor state. We study a fundamental objective, namely, mean-payoff objective, where a reward is associated to each transition, and the goal of player 1 is to maximize the long-run average of the rewards, and the objective of player 2 is strictly the opposite. The path constraint for player 1 could be qualitative, i.e., the mean-payoff is the maximal reward, or arbitrarily close to it; or quantitative, i.e., a given threshold between the minimal and maximal reward. We consider the computation of the almost-sure (resp. positive) winning sets, where player 1 can ensure that the path constraint is satisfied with probability 1 (resp. positive probability). Our main results for qualitative path constraints are as follows: (1) we establish qualitative determinacy results that show that for every state either player 1 has a strategy to ensure almost-sure (resp. positive) winning against all player-2 strategies, or player 2 has a spoiling strategy to falsify almost-sure (resp. positive) winning against all player-1 strategies; (2) we present optimal strategy complexity results that precisely characterize the classes of strategies required for almost-sure and positive winning for both players; and (3) we present quadratic time algorithms to compute the almost-sure and the positive winning sets, matching the best known bound of algorithms for much simpler problems (such as reachability objectives). For quantitative constraints we show that a polynomial time solution for the almost-sure or the positive winning set would imply a solution to a long-standing open problem (the value problem for turn-based deterministic mean-payoff games) that is not known to be solvable in polynomial time.
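
To make the setting concrete, here is a minimal sketch of a concurrent game and of the average reward along a finite prefix of a play. The two-state game, the move names "a" and "b", and the helper `average_reward` are illustrative assumptions and are not taken from the paper. The mean-payoff objective itself concerns the long-run limit of such averages, which a finite prefix can only approximate.

```python
from fractions import Fraction

# Hypothetical two-state concurrent game (illustration only, not from the paper).
# delta[(state, move1, move2)] = (successor state, reward of that transition)
delta = {
    ("s0", "a", "a"): ("s0", 0),
    ("s0", "a", "b"): ("s1", 1),
    ("s0", "b", "a"): ("s1", 1),
    ("s0", "b", "b"): ("s0", 0),
    ("s1", "a", "a"): ("s1", 1),
    ("s1", "a", "b"): ("s1", 1),
    ("s1", "b", "a"): ("s1", 1),
    ("s1", "b", "b"): ("s1", 1),
}

def average_reward(start, joint_moves):
    """Average reward over the finite prefix induced by a list of joint moves."""
    state, total = start, 0
    for move1, move2 in joint_moves:
        # Both players move simultaneously; the joint move fixes the successor.
        state, reward = delta[(state, move1, move2)]
        total += reward
    return Fraction(total, len(joint_moves))

# Player 1 picks the first component, player 2 the second, in every round.
prefix = [("a", "a"), ("a", "b"), ("a", "a"), ("b", "b")]
avg = average_reward("s0", prefix)
```

In this toy game s1 is absorbing with reward 1, so any play that reaches s1 has long-run average 1, the maximal reward. This shows how a reachability objective can be seen as a special case of the qualitative mean-payoff constraint.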

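Motivation and objectives

Establish qualitative determinacy for concurrent mean-payoff games under almost-sure and positive winning.
Identify the exact strategy complexity both players need for almost-sure and positive winning.
Develop efficient algorithms for computing the almost-sure and positive winning sets that match the best known bounds for reachability objectives.
Investigate the computational hardness of quantitative path constraints in concurrent mean-payoff games.
Extend the results from Boolean rewards to rational-valued reward functions using scaling and shifting.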

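Proposed method

Reduce turn-based deterministic mean-payoff games (DMPGs) to turn-based stochastic games using a gadget-based construction that simulates each original transition with 3M steps.
Exploit Markov chain properties, in particular closed recurrent sets and the expected mean-payoff, to analyze the long-run average reward.
Derive strategies in the original game from positional strategies in the reduced turn-based stochastic game.
Apply basic Markov chain properties to relate the mean-payoff in the reduced game to the cycle behavior of the original game.
Prove that the winning conditions of the original and reduced games are equivalent, using reward scaling and threshold transformation.
Conclude that a polynomial-time algorithm for the quantitative winning sets would yield a polynomial-time solution to the value problem for turn-based deterministic mean-payoff games, a long-standing open problem.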
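
The Markov chain step can be illustrated as follows. Once both players fix memoryless strategies, the play is a Markov chain, and on a closed recurrent set the long-run average reward is almost surely the expected one-step reward under the stationary distribution. The sketch below is a minimal illustration of that standard fact, not the paper's algorithm; the chain `P`, the reward matrix `R`, and both helper names are hypothetical.

```python
from fractions import Fraction

def stationary_distribution(P):
    """Solve pi * P = pi with sum(pi) = 1 for an irreducible chain (exact arithmetic)."""
    n = len(P)
    # Rows 0..n-2: balance equations sum_i pi_i * (P[i][j] - [i == j]) = 0 for j < n-1.
    A = [[Fraction(P[i][j]) - (1 if i == j else 0) for i in range(n)] + [Fraction(0)]
         for j in range(n - 1)]
    # Last row: the normalisation sum_i pi_i = 1.
    A.append([Fraction(1)] * n + [Fraction(1)])
    # Gauss-Jordan elimination on the augmented matrix.
    for col in range(n):
        pivot = next(r for r in range(col, n) if A[r][col] != 0)
        A[col], A[pivot] = A[pivot], A[col]
        A[col] = [x / A[col][col] for x in A[col]]
        for r in range(n):
            if r != col and A[r][col] != 0:
                factor = A[r][col]
                A[r] = [x - factor * y for x, y in zip(A[r], A[col])]
    return [A[i][n] for i in range(n)]

def expected_mean_payoff(P, R):
    """Expected one-step reward under the stationary distribution of P."""
    pi = stationary_distribution(P)
    n = len(P)
    return sum(pi[i] * P[i][j] * R[i][j] for i in range(n) for j in range(n))

# Hypothetical closed recurrent set with two states and transition rewards.
P = [[Fraction(1, 2), Fraction(1, 2)],
     [Fraction(1, 4), Fraction(3, 4)]]
R = [[0, 1],
     [0, 1]]  # reward 1 whenever the chain moves into state 1

pi = stationary_distribution(P)
payoff = expected_mean_payoff(P, R)
```

Exact fractions are used so that a comparison against a threshold is not affected by floating-point rounding.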

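Results

Research questions

RQ1: Does qualitative determinacy hold for concurrent mean-payoff games under almost-sure and positive winning?
RQ2: What is the exact strategy complexity required for almost-sure and positive winning in concurrent mean-payoff games?
RQ3: Can the almost-sure and positive winning sets be computed in quadratic time, matching the best known bound for reachability games?
RQ4: Is solving quantitative path constraints in concurrent mean-payoff games computationally equivalent to solving the value problem for turn-based deterministic mean-payoff games?
RQ5: How can scaling and shifting be applied to reduce rational-valued reward functions to Boolean rewards while preserving the qualitative winning conditions?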

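Key findings

Qualitative determinacy holds: for every state, either player 1 has a strategy that ensures almost-sure (resp. positive) winning against every player-2 strategy, or player 2 has a spoiling strategy that falsifies almost-sure (resp. positive) winning against every player-1 strategy.
The almost-sure and positive winning sets can be computed in quadratic time, matching the best known bound for reachability objectives in concurrent games.
The strategy complexity is characterized precisely for both players: the paper identifies exactly which classes of strategies are required for almost-sure and for positive winning under qualitative constraints.
The reduction from DMPGs to turn-based stochastic games preserves the winning conditions through the 3M-step simulation and enables an analysis based on Markov chain properties.
A polynomial-time algorithm for the quantitative winning sets in concurrent mean-payoff games would solve the value problem for turn-based deterministic mean-payoff games in polynomial time, which is a long-standing open problem.
Scaling and shifting extend the results from Boolean rewards to rational-valued rewards while preserving the qualitative winning conditions for the maximal-reward objective.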
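
The scaling-and-shifting step can be sketched as a short derivation. The notation is an assumption made here for illustration: $r_t$ is the reward of the $t$-th transition of a play, rewards come from a finite set with minimum $r_{\min}$ and maximum $r_{\max}$, $r_{\min} < r_{\max}$, and the mean-payoff is taken as the limit inferior of the averages. This is a standard argument and not necessarily the exact construction used in the paper.

```latex
% Assumed notation: r_t is the reward of step t, with r_min <= r_t <= r_max and r_min < r_max.
\[
  r'_t = \frac{r_t - r_{\min}}{r_{\max} - r_{\min}} \in [0,1],
  \qquad
  \liminf_{n\to\infty} \frac{1}{n}\sum_{t=1}^{n} r'_t
  = \frac{\displaystyle\liminf_{n\to\infty} \frac{1}{n}\sum_{t=1}^{n} r_t \;-\; r_{\min}}{r_{\max} - r_{\min}} .
\]
% An affine map with positive slope commutes with liminf, so thresholds transform the same way:
\[
  \liminf_{n\to\infty} \frac{1}{n}\sum_{t=1}^{n} r_t = r_{\max}
  \iff
  \liminf_{n\to\infty} \frac{1}{n}\sum_{t=1}^{n} r'_t = 1 .
\]
% Boolean version: b_t = 1 if r_t = r_max, and b_t = 0 otherwise.
% Since the reward set is finite, every non-maximal reward is at most r_max - \delta for some \delta > 0, hence
\[
  \liminf_{n\to\infty} \frac{1}{n}\sum_{t=1}^{n} r_t = r_{\max}
  \iff
  \liminf_{n\to\infty} \frac{1}{n}\sum_{t=1}^{n} b_t = 1 .
\]
```

The last equivalence is why the qualitative constraint "mean-payoff equals the maximal reward" can be studied with Boolean rewards alone.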


This review was generated by AI and checked by a human editor.