QUICK REVIEW

[Paper Review] Teaching an Old Dynamics New Tricks: Regularization-free Last-iterate Convergence in Zero-sum Games via BNN Dynamics

Tuo Zhang, Leonardo Stella|arXiv (Cornell University)|2026. 02. 09.

Stochastic Gradient Optimization Techniques · Citations: 0

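One-line Summary

This paper repurposes Brown-von Neumann-Nash (BNN) dynamics to achieve last-iterate convergence in zero-sum games without regularization, extends the approach to extensive-form games via counterfactual weighting, and shows empirically, using a neural actor-critic implementation (BNNAC), that it outperforms regularization-based methods.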

ABSTRACT

Zero-sum games are a fundamental setting for adversarial training and decision-making in multi-agent learning (MAL). Existing methods often ensure convergence to (approximate) Nash equilibria by introducing a form of regularization. Yet, regularization requires additional hyperparameters, which must be carefully tuned--a challenging task when the payoff structure is known, and considerably harder when the structure is unknown or subject to change. Motivated by this problem, we repurpose a classical model in evolutionary game theory, i.e., the Brown-von Neumann-Nash (BNN) dynamics, by leveraging the intrinsic convergence of this dynamics in zero-sum games without regularization, and provide last-iterate convergence guarantees in noisy normal-form games (NFGs). Importantly, to make this approach more applicable, we develop a novel framework with theoretical guarantees that integrates the BNN dynamics in extensive-form games (EFGs) through counterfactual weighting. Furthermore, we implement an algorithm that instantiates our framework with neural function approximation, enabling scalable learning in both NFGs and EFGs. Empirical results show that our method quickly adapts to nonstationarities, outperforming the state-of-the-art regularization-based approach.
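As a reference point, the textbook single-population BNN dynamics for a symmetric game with payoff matrix A update each strategy share x_i through its excess payoff k_i(x) = max(0, (Ax)_i - x·Ax), giving dx_i/dt = k_i(x) - x_i Σ_j k_j(x). The Python sketch below integrates this ODE with a fourth-order Runge-Kutta step on Rock-Paper-Scissors and shows the last iterate itself (not only a time average) approaching the uniform Nash equilibrium with no regularization term. It illustrates the classical dynamics only, not the authors' code or their noisy and extensive-form variants; the payoff matrix, starting point, step size dt, and horizon are assumptions chosen for the example.

```python
import numpy as np

# Rock-Paper-Scissors payoffs for the row player (assumed example game).
# The matrix is skew-symmetric, so the game is symmetric and zero-sum.
RPS = np.array([[0.0, -1.0, 1.0],
                [1.0, 0.0, -1.0],
                [-1.0, 1.0, 0.0]])


def bnn_field(x: np.ndarray, A: np.ndarray) -> np.ndarray:
    """Classical BNN vector field: dx_i/dt = k_i - x_i * sum_j k_j."""
    payoffs = A @ x                                   # payoff of each pure strategy
    excess = np.maximum(payoffs - x @ payoffs, 0.0)   # excess payoffs k_i(x)
    return excess - x * excess.sum()


def rk4_step(x: np.ndarray, A: np.ndarray, dt: float) -> np.ndarray:
    """One fourth-order Runge-Kutta step of the BNN ODE."""
    s1 = bnn_field(x, A)
    s2 = bnn_field(x + 0.5 * dt * s1, A)
    s3 = bnn_field(x + 0.5 * dt * s2, A)
    s4 = bnn_field(x + dt * s3, A)
    return x + (dt / 6.0) * (s1 + 2.0 * s2 + 2.0 * s3 + s4)


def simulate_bnn(x0, A: np.ndarray, dt: float = 0.02, steps: int = 10_000) -> np.ndarray:
    """Integrate the BNN ODE from x0 (horizon dt * steps) and return the last iterate."""
    x = np.asarray(x0, dtype=float)
    for _ in range(steps):
        x = rk4_step(x, A, dt)
    return x


x_last = simulate_bnn([0.6, 0.3, 0.1], RPS)
print(np.round(x_last, 3))  # should be close to [1/3, 1/3, 1/3]
```

RK4 is used instead of a plain Euler step because forward Euler slowly spirals outward on the rotational part of this flow, which can mask the convergence being illustrated.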

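Motivation and Goals

Motivates the need for last-iterate convergence in zero-sum games with noisy feedback, without relying on regularization.
Introduces a BNN-based MAL framework that guarantees regularization-free convergence in both normal-form and extensive-form games.
Develops a scalable neural actor-critic algorithm (BNNAC) that implements the framework in practice.
Demonstrates an empirical advantage over regularization-based methods in dynamic, nonstationary settings.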

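Proposed Method

Adopts Brown-von Neumann-Nash (BNN) dynamics to guarantee convergence without conventional regularization.
Analyzes noisy feedback within a stochastic approximation framework, assuming unbiased noise with bounded variance.
Extends BNN dynamics to extensive-form games using counterfactual and reach-weighted updates.
Develops a neural actor-critic architecture (BNNAC) that approximates the BNN dynamics at the population level.
Introduces a separate reach network that estimates the opponent's reach probabilities and incorporates them into the update rule.
Establishes drift and convergence properties in both normal-form and extensive-form settings, including noise-floor and convergence-rate analysis.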
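To illustrate the stochastic-approximation view in its simplest form, the sketch below runs a discrete-time BNN update in which each step only sees a noisy payoff vector (the true payoffs plus zero-mean Gaussian noise with standard deviation sigma) and uses diminishing step sizes eta_t that satisfy the usual Robbins-Monro conditions. The noise model, the step-size schedule, the clip-and-renormalize safeguard, and all constants are assumptions for illustration; this is not the paper's algorithm, its noise-floor analysis, or BNNAC.

```python
import numpy as np

# Assumed example game: Rock-Paper-Scissors (zero-sum, equilibrium at the uniform mix).
RPS = np.array([[0.0, -1.0, 1.0],
                [1.0, 0.0, -1.0],
                [-1.0, 1.0, 0.0]])


def noisy_bnn_direction(x, A, sigma, rng):
    """BNN direction computed from an unbiased, noisy estimate of the payoffs."""
    # Assumed noise model: additive Gaussian noise on every pure-strategy payoff.
    payoffs_hat = A @ x + sigma * rng.standard_normal(x.shape)
    excess_hat = np.maximum(payoffs_hat - x @ payoffs_hat, 0.0)
    return excess_hat - x * excess_hat.sum()


def run_noisy_bnn(x0, A, sigma=0.1, iters=20_000, eta0=0.05, seed=0):
    """Stochastic-approximation BNN: x_{t+1} = x_t + eta_t * noisy direction."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    for t in range(iters):
        # eta_t decays like t^(-0.75): sum diverges, sum of squares converges.
        eta = eta0 / (1.0 + t / 500.0) ** 0.75
        x = x + eta * noisy_bnn_direction(x, A, sigma, rng)
        # Safeguard (assumption): keep the iterate on the probability simplex.
        x = np.clip(x, 1e-12, None)
        x /= x.sum()
    return x


x_noisy = run_noisy_bnn([0.6, 0.3, 0.1], RPS)
print(np.round(x_noisy, 3))
```

Increasing sigma makes the final iterate fluctuate more around (1/3, 1/3, 1/3); the paper's O(σ) noise-floor result concerns this kind of residual error, which the toy example does not attempt to reproduce.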

(a) NashConv metric in the nonstationary RPS.

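Experimental Results

Research Questions

RQ1: Can BNN dynamics provide last-iterate convergence without regularization in two-player zero-sum games with noisy feedback?
RQ2: Can BNN dynamics be applied to extensive-form games in a way that preserves the convergence guarantees?
RQ3: Is a neural actor-critic implementation that approximates BNN dynamics at scale feasible in NFGs and EFGs?
RQ4: How does the BNN-based approach perform relative to regularization-based methods under nonstationary payoff structures?
RQ5: What are the theoretical convergence rate and noise-floor characteristics under stochastic feedback?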

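Key Findings

BNN dynamics achieve last-iterate convergence in normal-form zero-sum games without regularization.
The framework extends to extensive-form games via counterfactual weighting while preserving the convergence guarantees.
The BNNAC algorithm, built on neural function approximation, matches the theoretical predictions and scales to large games.
Experiments show faster adaptation and more stable convergence under nonstationarity than the regularization-based RD method.
Convergence exhibits an O(σ) noise floor, an O(σ^2) center-point shift, and a t^{-2/3} decay in both noiseless and noisy settings.
The method avoids the hyperparameter-tuning burden of regularization-based methods while remaining robust.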
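The findings contrast BNN with a regularization-based RD method. Assuming RD here denotes replicator dynamics, the sketch below shows why plain, unregularized replicator dynamics need such a fix in the first place: in Rock-Paper-Scissors the product x_1 x_2 x_3 is conserved along replicator trajectories (its log-derivative is Σ_i (Ax)_i - 3 x·Ax = 0), so the last iterate keeps cycling instead of converging to the uniform equilibrium. The game, starting point, and integrator settings are illustrative assumptions; the code is not the baseline used in the paper.

```python
import numpy as np

# Assumed example game: Rock-Paper-Scissors (skew-symmetric, columns sum to zero).
RPS = np.array([[0.0, -1.0, 1.0],
                [1.0, 0.0, -1.0],
                [-1.0, 1.0, 0.0]])


def replicator_field(x: np.ndarray, A: np.ndarray) -> np.ndarray:
    """Unregularized replicator dynamics: dx_i/dt = x_i * ((Ax)_i - x.Ax)."""
    payoffs = A @ x
    return x * (payoffs - x @ payoffs)


def rk4_step(x: np.ndarray, A: np.ndarray, dt: float) -> np.ndarray:
    """One fourth-order Runge-Kutta step of the replicator ODE."""
    s1 = replicator_field(x, A)
    s2 = replicator_field(x + 0.5 * dt * s1, A)
    s3 = replicator_field(x + 0.5 * dt * s2, A)
    s4 = replicator_field(x + dt * s3, A)
    return x + (dt / 6.0) * (s1 + 2.0 * s2 + 2.0 * s3 + s4)


def simulate_replicator(x0, A: np.ndarray, dt: float = 0.02, steps: int = 10_000) -> np.ndarray:
    """Integrate the replicator ODE from x0 and return the last iterate."""
    x = np.asarray(x0, dtype=float)
    for _ in range(steps):
        x = rk4_step(x, A, dt)
    return x


x_rd = simulate_replicator([0.6, 0.3, 0.1], RPS)
# The product stays at about 0.018 (= 0.6 * 0.3 * 0.1), far from its equilibrium value 1/27.
print(np.round(x_rd, 3), round(float(np.prod(x_rd)), 4))
```

Swapping replicator_field for the classical BNN excess-payoff field, with the same integrator and settings, makes the last iterate settle at (1/3, 1/3, 1/3), which is the regularization-free behavior the paper builds on.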

(b) Representative trajectories and their convergence behavior.


This review was generated by AI and reviewed by a human editor.