QUICK REVIEW

[Paper Review] Safe and Nested Subgame Solving for Imperfect-Information Games

Noam Brown, Tuomas Sandholm|arXiv (Cornell University)|May 8, 2017

Artificial Intelligence in Games|References: 26|Cited by: 60

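One-Sentence Summary

The paper introduces safe, reach, and nested subgame-solving techniques for imperfect-information games. They achieve lower exploitability than prior methods, allow strategies for large games to be refined in real time, and were a key component of Libratus's success at poker.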

ABSTRACT

In imperfect-information games, the optimal strategy in a subgame may depend on the strategy in other, unreached subgames. Thus a subgame cannot be solved in isolation and must instead consider the strategy for the entire game as a whole, unlike perfect-information games. Nevertheless, it is possible to first approximate a solution for the whole game and then improve it by solving individual subgames. This is referred to as subgame solving. We introduce subgame-solving techniques that outperform prior methods both in theory and practice. We also show how to adapt them, and past subgame-solving techniques, to respond to opponent actions that are outside the original action abstraction; this significantly outperforms the prior state-of-the-art approach, action translation. Finally, we show that subgame solving can be repeated as the game progresses down the game tree, leading to far lower exploitability. These techniques were a key component of Libratus, the first AI to defeat top humans in heads-up no-limit Texas hold'em poker.

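Motivation and Objectives

Explain why subgames in imperfect-information games cannot be solved in isolation, and how a blueprint strategy can nevertheless be improved.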
Build on safe subgame-solving methods (Resolve, Maxmargin), which come with theoretical exploitability guarantees.
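Introduce reach subgame solving, which accounts for effects across subgames and allocates value on the basis of gifts.
Extend the approach to nested subgame solving, so that strategies in large abstracted games can be refined at multiple levels.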

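Proposed Methods

Define imperfect-information subgames and introduce the notion of a "gift": value the opponent gave up through earlier actions on the path to a subgame, which ties that subgame to the rest of the game.
Describe unsafe subgame solving and its lack of theoretical guarantees.
Present the safe subgame-solving methods Resolve and Maxmargin, together with their reach variants, and give a formal exploitability guarantee (Theorem 1).
Introduce reach subgame solving, which adds gift-based adjustments to the margins, divides them among multiple subgames, and keeps the overall strategy safe.
Incorporate estimates of counterfactual best-response values (CBVs) taken from the abstraction to tighten the exploitability bound in practice (Theorem 2).
Discuss nested subgame solving for handling multiple levels of abstraction and off-tree actions.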
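The margin bookkeeping behind Maxmargin and its reach variant can be illustrated with a toy Python sketch. It assumes that the margin at an opponent information set is the blueprint counterfactual best-response value minus the value under the candidate subgame strategy, and that the reach variant adds a lower bound on gifts to each margin. All names and numbers (`margins`, `is_safe`, `pick_maxmargin`, the information sets `I_a` and `I_b`, the candidate values) are hypothetical. The actual methods solve an optimization over the whole subgame rather than choosing from a short list of candidates.

```python
def margins(blueprint_cbv, candidate_cbv, gift_lower_bounds=None):
    """Margin per opponent infoset: blueprint CBV minus candidate CBV.

    If gift lower bounds are given (reach variant, assumed here to be
    already divided among subgames), they are added to the margin.
    """
    gifts = gift_lower_bounds or {}
    return {
        infoset: blueprint_cbv[infoset] - candidate_cbv[infoset]
        + gifts.get(infoset, 0.0)
        for infoset in blueprint_cbv
    }


def is_safe(margin_by_infoset):
    """A candidate is treated as safe when no margin is negative."""
    return min(margin_by_infoset.values()) >= 0.0


def pick_maxmargin(blueprint_cbv, candidates, gift_lower_bounds=None):
    """Return the candidate name whose smallest margin is largest."""
    def worst_margin(name):
        m = margins(blueprint_cbv, candidates[name], gift_lower_bounds)
        return min(m.values())
    return max(candidates, key=worst_margin)


# Hypothetical opponent values at two infosets at the root of a subgame.
blueprint_cbv = {"I_a": 1.0, "I_b": 0.5}
candidates = {
    "cautious": {"I_a": 0.75, "I_b": 0.5},   # margins 0.25 and 0.0
    "balanced": {"I_a": 0.75, "I_b": 0.25},  # margins 0.25 and 0.25
    "greedy": {"I_a": 0.0, "I_b": 1.0},      # margins 1.0 and -0.5
}
# Hypothetical gift bound: the opponent gave up 1.0 on the way to I_b.
gifts = {"I_b": 1.0}

choice_plain = pick_maxmargin(blueprint_cbv, candidates)
choice_reach = pick_maxmargin(blueprint_cbv, candidates, gifts)
```

In this toy setting `choice_plain` is `"balanced"`, while `choice_reach` is `"greedy"`: the gift at `I_b` absorbs the value the greedy candidate concedes there, so a candidate that is unsafe on margins alone becomes acceptable once gifts are counted.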

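Experimental Results

Research Questions

RQ1: How can subgame solving be carried out safely in imperfect-information games, so that exploitability is guaranteed not to exceed that of the blueprint?
RQ2: How can subgame solving be extended to account for interactions across multiple subgames and to respond to actions outside the current abstraction?
RQ3: Compared with prior methods, what theoretical guarantees and practical gains do reach and nested subgame solving provide?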

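Key Findings

Under certain conditions, safe subgame-solving techniques (Resolve, Maxmargin, and their reach variants) yield exploitability no worse than the blueprint's (Theorem 1).
Reach subgame solving incorporates gifts from other subgames, which improves robustness and lowers exploitability more effectively than solving subgames in isolation.
CBV estimates taken from the abstraction tighten the exploitability bound and improve practical performance (Theorem 2).
Nested subgame solving allows iterative refinement as the game progresses, enabling finer-grained strategy updates in large imperfect-information games.
These methods were a key component of Libratus, the first AI to defeat top humans in heads-up no-limit Texas hold'em poker.
Reach methods offer a principled way to allocate value adjustments across multiple subgames, improving performance while preserving safety.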
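Exploitability, the metric these findings refer to, can be made concrete with a small normal-form example. The sketch below is only an illustration: rock-paper-scissors is not taken from the paper, the helper names `best_response_value` and `exploitability` are hypothetical, and poker is an extensive-form game where the same quantity is computed over a game tree. Here exploitability is taken to be how much more than the game value a best-responding opponent earns against a fixed strategy.

```python
def best_response_value(payoff, column_strategy):
    """Highest expected payoff the row player can get against a fixed
    column strategy; payoff[i][j] is the row player's payoff."""
    return max(
        sum(p * q for p, q in zip(row, column_strategy))
        for row in payoff
    )


def exploitability(payoff, column_strategy, game_value):
    """Row player's best-response gain over the value of the game."""
    return best_response_value(payoff, column_strategy) - game_value


# Row player's payoffs in rock-paper-scissors (game value 0).
RPS = [
    [0, -1, 1],   # rock against rock, paper, scissors
    [1, 0, -1],   # paper
    [-1, 1, 0],   # scissors
]

uniform = [1 / 3, 1 / 3, 1 / 3]    # equilibrium strategy
rock_heavy = [0.5, 0.25, 0.25]     # plays rock too often

exp_uniform = exploitability(RPS, uniform, 0.0)
exp_rock_heavy = exploitability(RPS, rock_heavy, 0.0)
```

The uniform strategy has exploitability 0, while the rock-heavy strategy can be beaten for 0.25 per round by always playing paper. A safe subgame-solving method, in these terms, is one that never moves a strategy from the first kind of number toward the second relative to the blueprint.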

This review was generated by AI and checked by human editors.