QUICK REVIEW

[Paper Review] Vortices Instead of Equilibria in MinMax Optimization: Chaos and Butterfly Effects of Online Learning in Zero-Sum Games

Yun Kuen Cheung, Georgios Piliouras | arXiv (Cornell University) | Jun 25, 2019

Advanced Bandit Algorithms Research | Cited by 25

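One-Sentence Summary

This paper shows that online learning dynamics in zero-sum games, including Multiplicative Weights Update (MWU) with a constant step size, are Lyapunov chaotic in the dual (payoff) space: although their time averages converge to an approximate Nash equilibrium, the trajectories themselves remain unpredictable over time. The chaos persists across a general sub-family of Follow-the-Regularized-Leader (FTRL) algorithms, different step sizes, and broader game structures, challenging the classical prediction of maxmin equilibration.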
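For reference, the dynamics named in the summary can be written in the dual (payoff) space as follows. This is a sketch in standard textbook notation assumed here, not copied from the paper: A is the row player's payoff matrix (the column player receives the negative), p_t and q_t are the players' step-scaled cumulative payoff vectors, ε is the constant step size, Δ_n is the probability simplex, and h is a strictly convex regularizer.

```latex
p_{t+1} = p_t + \varepsilon A y_t, \qquad q_{t+1} = q_t - \varepsilon A^{\top} x_t,

x_t = \arg\max_{x \in \Delta_n} \{ \langle p_t, x \rangle - h(x) \}, \qquad
y_t = \arg\max_{y \in \Delta_m} \{ \langle q_t, y \rangle - h(y) \}.

% MWU is the special case h(x) = \sum_i x_i \ln x_i (negative entropy), which gives
x_{t,i} = \frac{e^{p_{t,i}}}{\sum_k e^{p_{t,k}}}.
```

In this notation, the observer's question in the abstract ("Heads or Tails, and by how much in aggregate payoff?") is a question about p_t and q_t rather than about the mixed strategies x_t and y_t.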

ABSTRACT

We establish that algorithmic experiments in zero-sum games fail miserably to confirm the unique, sharp prediction of maxmin equilibration. Contradicting nearly a century of economic thought that treats zero-sum games nearly axiomatically as the exemplar symbol of economic stability, we prove that no meaningful prediction can be made about the day-to-day behavior of online learning dynamics in zero-sum games. Concretely, Multiplicative Weights Updates (MWU) with constant step-size is Lyapunov chaotic in the dual (payoff) space. Simply put, let's assume that an observer asks the agents playing Matching-Pennies whether they prefer Heads or Tails (and by how much in terms of aggregate payoff so far). The range of possible answers consistent with any arbitrarily small set of initial conditions blows up exponentially with time everywhere in the payoff space. This result is robust both algorithmically as well as game theoretically: 1) Algorithmic robustness: Chaos is robust to agents using any of a general sub-family of Follow-the-Regularized-Leader (FTRL) algorithms, the well known regret-minimizing dynamics, even when agents mix-and-match dynamics, use different or slowly decreasing step-sizes. 2) Game theoretic robustness: Chaos is robust to all affine variants of zero-sum games (strictly competitive games), network variants with an arbitrarily large number of agents and even to competitive settings beyond these. Our result is in stark contrast with the time-average convergence of online learning to (approximate) Nash equilibrium, a result widely reported as (weak) convergence to equilibrium.
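The Matching Pennies thought experiment in the abstract can be reproduced in a few lines. The sketch below is an illustration under assumptions chosen here (step size 0.1, a hypothetical starting point, and two actions per player, so each player's dual state reduces to one number: the step-scaled cumulative payoff advantage of Heads over Tails). It follows a tiny triangle of initial conditions under simultaneous MWU and reports how its area changes; it is not the paper's proof.

```python
from __future__ import annotations

import math

EPS = 0.1  # constant step size (illustrative assumption, not from the paper)


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def mwu_dual_step(u: float, v: float, eps: float = EPS) -> tuple[float, float]:
    """One simultaneous MWU step for Matching Pennies in payoff (dual) space.

    u, v: step-scaled cumulative payoff advantage of Heads over Tails for the
    row and column player; their mixed strategies are sigmoid(u), sigmoid(v).
    """
    x, y = sigmoid(u), sigmoid(v)
    # Row player wants to match: Heads minus Tails pays 2(2y - 1) against y.
    # Column player wants to mismatch: Heads minus Tails pays -2(2x - 1) against x.
    return u + eps * 2 * (2 * y - 1), v - eps * 2 * (2 * x - 1)


def triangle_area(a, b, c) -> float:
    """Area of the triangle with vertices a, b, c (2D points)."""
    return 0.5 * abs((b[0] - a[0]) * (c[1] - a[1]) - (c[0] - a[0]) * (b[1] - a[1]))


def area_growth(center=(0.3, -0.2), size=1e-6, steps=500) -> float:
    """Final / initial area of a tiny triangle of initial conditions after `steps` MWU steps."""
    pts = [center, (center[0] + size, center[1]), (center[0], center[1] + size)]
    initial = triangle_area(*pts)
    for _ in range(steps):
        pts = [mwu_dual_step(u, v) for u, v in pts]
    return triangle_area(*pts) / initial


if __name__ == "__main__":
    print(f"area ratio after 500 steps: {area_growth():.3f}")
```

A ratio above 1 reflects that the MWU map expands area in payoff space. In this 2×2 game the per-step expansion is largest near the equilibrium (u, v) = (0, 0) and weakens as cumulative payoffs grow, so the printed ratio depends on the starting point and horizon.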

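Research Motivation and Goals

Challenge the long-standing assumption in economics that zero-sum games played by online learners settle at the maxmin equilibrium.
Investigate whether online learning dynamics in zero-sum games exhibit predictable equilibrium behavior or chaotic trajectories.
Assess how robust the chaos is across learning algorithms and game structures, including affine variants and networked settings.
Reconcile the apparent contradiction between time-average convergence to Nash equilibrium and the absence of stable, predictable day-to-day dynamics.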
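The apparent contradiction in the last point can be seen numerically. The sketch below uses an assumed Matching Pennies setup (step size 0.1, hypothetical start (u, v) = (0.1, 0), where u and v are each player's step-scaled cumulative payoff advantage of Heads over Tails). It averages the mixed strategies over time and also tracks Φ(u) + Φ(v) with Φ(z) = softplus(z) + softplus(-z), which equals, up to scale and an additive constant, the KL divergence of the current strategies from the equilibrium (1/2, 1/2).

```python
from __future__ import annotations

import math


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def softplus(z: float) -> float:
    """Numerically stable log(1 + e^z)."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def phi(z: float) -> float:
    """Convex potential whose derivative is 2 * sigmoid(z) - 1."""
    return softplus(z) + softplus(-z)


def potential(u: float, v: float) -> float:
    """Up to scale and a constant: KL divergence of (sigmoid(u), sigmoid(v)) from (1/2, 1/2)."""
    return phi(u) + phi(v)


def run(u0: float = 0.1, v0: float = 0.0, eps: float = 0.1, steps: int = 20_000):
    """Simultaneous MWU on Matching Pennies in payoff space.

    Returns (time-average of x, time-average of y, initial potential, final potential).
    """
    u, v = u0, v0
    sum_x = sum_y = 0.0
    for _ in range(steps):
        x, y = sigmoid(u), sigmoid(v)
        sum_x += x
        sum_y += y
        # Both players update from the same round's strategies (simultaneous play).
        u, v = u + eps * 2 * (2 * y - 1), v - eps * 2 * (2 * x - 1)
    return sum_x / steps, sum_y / steps, potential(u0, v0), potential(u, v)


if __name__ == "__main__":
    x_bar, y_bar, phi0, phi_T = run()
    print(f"time averages: x={x_bar:.3f}, y={y_bar:.3f}")
    print(f"potential: start={phi0:.3f}, end={phi_T:.3f}")
```

The averages stay close to 1/2 because of the standard no-regret bound for MWU (for long horizons, within roughly ε/2 of 1/2 here), while the potential can only grow because Φ is convex and the two players' first-order changes cancel. Neither fact says where the strategies are on any given day.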

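Proposed Approach

Study the sensitivity of online learning dynamics in zero-sum games by analyzing the dual space of cumulative payoff vectors.
Prove that MWU with a constant step size is Lyapunov chaotic in the payoff space.
Extend the chaos result to a general sub-family of FTRL algorithms, including cases where agents mix-and-match dynamics or use different step sizes.
Show that chaos persists under all affine variants of zero-sum games, i.e., in strictly competitive games.
Extend the analysis to networked zero-sum games with an arbitrarily large number of agents and to broader competitive settings.
Use dynamical-systems theory to formalize how any arbitrarily small set of initial conditions spreads out exponentially over time.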
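The volume-expansion mechanism behind these results can be checked by hand in the smallest case. The calculation below is an illustration for Matching Pennies, not reproduced from the paper: u and v denote each player's step-scaled cumulative payoff advantage of Heads over Tails, σ is the logistic function, ε is the step size, and F is one simultaneous MWU step. The paper's theorems cover far more general games and FTRL variants.

```latex
F(u, v) = \bigl(u + \varepsilon f(v),\; v - \varepsilon f(u)\bigr), \qquad f(z) = 2\,(2\sigma(z) - 1),

DF(u, v) = \begin{pmatrix} 1 & \varepsilon f'(v) \\ -\varepsilon f'(u) & 1 \end{pmatrix}, \qquad
\det DF(u, v) = 1 + \varepsilon^{2} f'(u)\, f'(v) > 1,

\text{since } f'(z) = 4\,\sigma(z)\bigl(1 - \sigma(z)\bigr) \in (0, 1].
```

So every step locally expands area in payoff space. The factor is at most 1 + ε² (attained at the equilibrium u = v = 0) and approaches 1 far from it. For ε < 1 the map is also injective (f is 1-Lipschitz), so the area of any set of initial conditions strictly grows.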

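Results

Research Questions

RQ1: Does online learning in zero-sum games converge to a stable equilibrium, or does it behave chaotically in the payoff space?
RQ2: How robust is the chaotic behavior of constant-step-size MWU to changes in the learning algorithm and the step-size schedule?
RQ3: Does chaos persist when players use different FTRL variants or mix learning rules?
RQ4: Is chaos in online learning dynamics preserved under affine transformations of the game?
RQ5: How can chaos be reconciled with the well-known time-average convergence to an approximate Nash equilibrium?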

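Key Findings

MWU with a constant step size is Lyapunov chaotic in the dual (payoff) space: trajectories starting from arbitrarily small perturbations of one another diverge exponentially.
The chaotic behavior is robust across a general sub-family of FTRL algorithms, including when agents mix-and-match dynamics or use different or slowly decreasing step sizes.
Chaos persists under all affine variants of zero-sum games (i.e., strictly competitive games), demonstrating game-theoretic robustness.
Chaos is equally robust in networked zero-sum games with arbitrarily many agents, and even in competitive settings beyond these, so the result extends well beyond two-player games.
Time averages still converge to an approximate Nash equilibrium despite the absence of equilibration, creating an apparent paradox between predictable long-run averages and unpredictable day-to-day behavior.
The range of payoff-space states consistent with any arbitrarily small set of initial conditions grows exponentially with time, making day-to-day prediction impossible.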
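The affine-invariance finding has a simple mechanism for MWU, sketched below under illustrative assumptions. Suppose the row player's payoffs are a·A + b and the column player's are -c·A + d, with a, c > 0 and hypothetical constants b, d. The constants add the same amount to every action's cumulative payoff and cancel in the softmax, while a and c only rescale each player's step size. The code checks numerically that MWU on such a strictly competitive variant of Matching Pennies produces the same strategies as MWU on the zero-sum game with step sizes a·ε and c·ε, a setting covered by the robustness to different step sizes stated in the abstract.

```python
from __future__ import annotations

import math


def softmax(z: list[float]) -> list[float]:
    """Numerically stable softmax: shifting by max(z) leaves the result unchanged."""
    m = max(z)
    w = [math.exp(zi - m) for zi in z]
    s = sum(w)
    return [wi / s for wi in w]


def mwu(A, B, eps_row, eps_col, p0, q0, steps):
    """Simultaneous MWU in a bimatrix game (row payoff x^T A y, column payoff x^T B y).

    Returns the list of mixed-strategy pairs (x_t, y_t) for t = 0..steps-1.
    """
    n, m = len(A), len(A[0])
    p, q = list(p0), list(q0)
    history = []
    for _ in range(steps):
        x, y = softmax(p), softmax(q)
        history.append((x, y))
        # Each player adds the expected payoff of every pure action to its dual vector.
        p = [p[i] + eps_row * sum(A[i][j] * y[j] for j in range(m)) for i in range(n)]
        q = [q[j] + eps_col * sum(B[i][j] * x[i] for i in range(n)) for j in range(m)]
    return history


def max_strategy_gap(a=2.0, b=3.0, c=0.5, d=-1.0, eps=0.1, steps=300):
    """Largest coordinate gap between MWU on an affine variant and on the zero-sum game."""
    A = [[1.0, -1.0], [-1.0, 1.0]]  # Matching Pennies, row player's payoffs
    A_aff = [[a * v + b for v in row] for row in A]   # hypothetical affine row payoffs
    B_aff = [[-c * v + d for v in row] for row in A]  # hypothetical affine column payoffs
    B_zs = [[-v for v in row] for row in A]           # zero-sum column payoffs
    p0, q0 = [0.1, 0.0], [0.0, 0.0]  # hypothetical non-equilibrium start
    h1 = mwu(A_aff, B_aff, eps, eps, p0, q0, steps)
    h2 = mwu(A, B_zs, a * eps, c * eps, p0, q0, steps)
    return max(
        abs(s1 - s2)
        for (x1, y1), (x2, y2) in zip(h1, h2)
        for s1, s2 in zip(x1 + y1, x2 + y2)
    )


if __name__ == "__main__":
    print(f"max strategy gap: {max_strategy_gap():.2e}")
```

The gap is at the level of floating-point rounding, so any chaos observed in the zero-sum game with step sizes a·ε and c·ε carries over unchanged to its affine variant under MWU.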

This review was generated by AI and checked by human editors.