QUICK REVIEW

[Paper Review] Swap Regret Minimization Through Response-Based Approachability

Ioannis Anagnostides, Gabriele Farina | arXiv (Cornell University) | Feb 5, 2026

Topic: Advanced Bandit Algorithms Research | Cited by: 0

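ONE-SENTENCE SUMMARY

The paper introduces a computationally efficient algorithm that uses the response-based approachability framework to minimize linear swap regret over general convex sets, achieving $O(d^{3/2} \sqrt{T})$ regret in general and $O(d \sqrt{T})$ for centrally symmetric sets, where the latter matches the $\Omega(d \sqrt{T})$ lower bound; the same algorithm also minimizes profile swap regret.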

ABSTRACT

We consider the problem of minimizing different notions of swap regret in online optimization. These forms of regret are tightly connected to correlated equilibrium concepts in games, and have been more recently shown to guarantee non-manipulability against strategic adversaries. The only computationally efficient algorithm for minimizing linear swap regret over a general convex set in $\mathbb{R}^d$ was developed recently by Daskalakis, Farina, Fishelson, Pipis, and Schneider (STOC '25). However, it incurs a highly suboptimal regret bound of $\Omega(d^4 \sqrt{T})$ and also relies on computationally intensive calls to the ellipsoid algorithm at each iteration. In this paper, we develop a significantly simpler, computationally efficient algorithm that guarantees $O(d^{3/2} \sqrt{T})$ linear swap regret for a general convex set and $O(d \sqrt{T})$ when the set is centrally symmetric. Our approach leverages the powerful response-based approachability framework of Bernstein and Shimkin (JMLR '15) -- previously overlooked in the line of work on swap regret minimization -- combined with geometric preconditioning via the John ellipsoid. Our algorithm simultaneously minimizes profile swap regret, which was recently shown to guarantee non-manipulability. Moreover, we establish a matching information-theoretic lower bound: any learner must incur in expectation $\Omega(d \sqrt{T})$ linear swap regret for large enough $T$, even when the set is centrally symmetric. This also shows that the classic algorithm of Gordon, Greenwald, and Marks (ICML '08) is existentially optimal for minimizing linear swap regret, although it is computationally inefficient. Finally, we extend our approach to minimize regret with respect to the set of swap deviations with polynomial dimension, unifying and strengthening recent results in equilibrium computation and online learning.

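MOTIVATION AND OBJECTIVES

Motivate the need for stronger notions of regret (swap regret) in online optimization, and explain their connection to correlated equilibria and non-manipulability.
Develop a computationally efficient algorithm that minimizes linear swap regret over a general convex set.
Show that the same algorithm also minimizes profile swap regret, which guarantees non-manipulability.
Provide a matching information-theoretic lower bound, and discuss the extension to swap deviations of polynomial dimension.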

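PROPOSED METHOD

Reduce linear swap regret minimization to an approachability problem defined through best responses, a convex hull K, and a target set S.
Apply the response-based approachability algorithm of Bernstein and Shimkin (2015), after a preprocessing step that places the strategy set in John's position.
Use this geometric preconditioning (John's position) to bound the Frobenius norm of the endomorphisms and of the approachability space.
Under preconditioning, prove upper bounds on linear swap regret of $O(d^{3/2} \sqrt{T})$ for general convex sets and $O(d \sqrt{T})$ for centrally symmetric sets (Algorithm 2: preconditioned response-based approachability).
Extend the framework to swap deviations of polynomial dimension using mixed strategies (Algorithm 3).
Establish a matching lower bound showing that $\Omega(d \sqrt{T})$ linear swap regret is unavoidable in the worst case (Theorem 5.1).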
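To make the quantity being minimized concrete, the following minimal sketch computes linear swap regret in the special case where the strategy set is the probability simplex. This is a simplification for illustration, not the paper's general convex-set setting or its algorithm. On the simplex, the linear maps that send the set into itself are the column-stochastic matrices, so the best deviation can be chosen separately for each action. The function name `linear_swap_regret_simplex` and the loss-minimization convention are assumptions made here.

```python
from typing import List, Sequence


def linear_swap_regret_simplex(
    plays: Sequence[Sequence[float]],
    losses: Sequence[Sequence[float]],
) -> float:
    """Linear swap regret on the simplex (illustrative special case).

    plays[t]  is the learner's mixed strategy x_t (a probability vector).
    losses[t] is the loss vector l_t revealed at round t.
    Returns max over column-stochastic Q of sum_t <x_t - Q x_t, l_t>.
    """
    n = len(plays[0])
    # gain[i][j]: total loss saved by rerouting the mass on action i to action j.
    gain: List[List[float]] = [[0.0] * n for _ in range(n)]
    for x, loss in zip(plays, losses):
        for i in range(n):
            for j in range(n):
                gain[i][j] += x[i] * (loss[i] - loss[j])
    # The objective is linear in each column of Q, so the maximum is attained
    # by sending each action i to its single best replacement j.
    return sum(max(row) for row in gain)


# Always playing action 0 while action 1 is always cheaper by 1 per round.
example_regret = linear_swap_regret_simplex(
    plays=[[1.0, 0.0], [1.0, 0.0]],
    losses=[[1.0, 0.0], [1.0, 0.0]],
)
```

Here `example_regret` evaluates to 2.0: swapping action 0 for action 1 saves one unit of loss in each of the two rounds. For a general convex set the maximization over linear endomorphisms no longer decomposes this way, which is part of what makes the general problem harder.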

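RESULTS

Research Questions

RQ1: Can linear swap regret be minimized efficiently in online optimization when the strategy set is not restricted to simple shapes?
RQ2: What is the tightest bound achievable for linear swap regret with preconditioning, and is it information-theoretically optimal?
RQ3: Can the method be extended to swap deviations of polynomial dimension while remaining computationally efficient?
RQ4: How does minimizing linear and profile swap regret relate to non-manipulability against strategic adversaries?
RQ5: What are the limitations of existing methods (for example, those relying on the ellipsoid algorithm), and how does the proposed method compare?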

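Key Findings

After John's preconditioning, the algorithm achieves $O(d^{3/2} \sqrt{T})$ linear swap regret for general convex sets and $O(d \sqrt{T})$ for centrally symmetric sets.
For large enough $T$, an information-theoretic lower bound of $\Omega(d \sqrt{T})$ holds even when the set P is centrally symmetric, so the centrally symmetric upper bound is optimal up to constant factors.
The classic algorithm of Gordon, Greenwald, and Marks is existentially optimal for linear swap regret but computationally inefficient; the new method is computationally efficient and avoids per-iteration calls to the ellipsoid algorithm.
The method also minimizes profile swap regret, which was recently shown to guarantee non-manipulability.
The framework extends to swap deviations of polynomial dimension, giving better upper bounds on PolyDimSwapReg_T than prior work.
The lower-bound construction uses the product set $P = B_1 \times B_\infty$ to show that this regret growth is unavoidable under adversarial losses.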
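As a quick check on how tight these guarantees are, the bounds stated in the abstract can be placed side by side with constants suppressed. The layout below is only a restatement of those rates, not a result taken from the paper's body.

```latex
\begin{align*}
\text{general convex set:} \quad
  & \frac{d^{3/2}\sqrt{T}}{d\sqrt{T}} = \sqrt{d}
  && \text{(efficient upper bound vs. lower bound)} \\
\text{centrally symmetric set:} \quad
  & \frac{d\sqrt{T}}{d\sqrt{T}} = 1
  && \text{(tight up to constants)}
\end{align*}
```

In other words, the efficient algorithm is within a $\sqrt{d}$ factor of the lower bound for general convex sets and matches it for centrally symmetric sets.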


This review was generated by AI and reviewed by a human editor.