QUICK REVIEW

[Paper Digest] Zeroth-Order Stackelberg Control in Combinatorial Congestion Games

Saeed Masiha, Sepehr Elahi|arXiv (Cornell University)|Feb 26, 2026

Game Theory and Applications|Cited by 0

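One-Sentence Summary

Proposes ZO-Stackelberg, a zeroth-order bilevel optimization method that couples a Frank–Wolfe equilibrium solver with a zeroth-order outer update to optimize leader parameters in combinatorial congestion games without differentiating through equilibria.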

ABSTRACT

We study Stackelberg (leader--follower) tuning of network parameters (tolls, capacities, incentives) in combinatorial congestion games, where selfish users choose discrete routes (or other combinatorial strategies) and settle at a congestion equilibrium. The leader minimizes a system-level objective (e.g., total travel time) evaluated at equilibrium, but this objective is typically nonsmooth because the set of used strategies can change abruptly. We propose ZO-Stackelberg, which couples a projection-free Frank--Wolfe equilibrium solver with a zeroth-order outer update, avoiding differentiation through equilibria. We prove convergence to generalized Goldstein stationary points of the true equilibrium objective, with explicit dependence on the equilibrium approximation error, and analyze subsampled oracles: if an exact minimizer is sampled with probability $κ_m$, then the Frank--Wolfe error decays as $\mathcal{O}(1/(κ_m T))$. We also propose stratified sampling as a practical way to avoid a vanishing $κ_m$ when the strategies that matter most for the Wardrop equilibrium concentrate in a few dominant combinatorial classes (e.g., short paths). Experiments on real-world networks demonstrate that our method achieves orders-of-magnitude speedups over a differentiation-based baseline while converging to follower equilibria.

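Research Motivation and Goals

Tune leader parameters (e.g., tolls, capacities, incentives) to steer followers' discrete route choices in combinatorial congestion games.
Evaluate the system-level objective at the Wardrop equilibrium, even though this objective can be nonsmooth when the active set of used strategies changes.
Provide convergence guarantees to generalized Goldstein stationary points of the true equilibrium objective.
Develop a practical, scalable algorithm that avoids differentiating through equilibria while preserving accuracy.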
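
For orientation, the bilevel problem behind these goals can be written as follows. This is a sketch in assumed notation, not the paper's exact statement: the symbols Phi, F, y*(theta), and C follow this digest, while the follower potential f, the leader's feasible set Theta, and the ball B_delta are introduced here for illustration. The last two lines give the standard Goldstein delta-subdifferential and the associated (delta, epsilon)-stationarity notion; the paper's "generalized" variant may differ in its details.

```latex
\begin{align*}
\min_{\theta \in \Theta} \quad & \Phi(\theta) := F\bigl(\theta, y^*(\theta)\bigr),
  \qquad y^*(\theta) \in \arg\min_{y \in C} f(\theta, y), \\
\partial_\delta \Phi(\theta) \;:=\; & \operatorname{conv}\Bigl(\,\bigcup_{\theta' \in B_\delta(\theta)} \partial \Phi(\theta')\Bigr), \\
\theta \text{ is } (\delta,\epsilon)\text{-stationary} \;\iff\; & \operatorname{dist}\bigl(0,\, \partial_\delta \Phi(\theta)\bigr) \le \epsilon .
\end{align*}
```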

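Proposed Method

Couple a projection-free Frank–Wolfe (FW) equilibrium solver with a zeroth-order outer update of the leader parameters theta.
Run FW as the inner loop, using a linear minimization oracle (LMO) over the feasible load polytope C to approximate y*(theta), optionally with a subsampled oracle LMO_m.
Update theta with a two-point finite-difference zeroth-order scheme that only evaluates Phi_hat_T(theta) = F(theta, y_T(theta)).
Use stratified or structure-aware sampling to raise kappa_m, the probability that the sampled LMO contains an exact minimizer, which speeds up convergence.
Prove convergence to generalized Goldstein stationary points of the true, possibly nonsmooth hyper-objective Phi, with explicit dependence on the inner equilibrium error.
Provide an efficient Python implementation with exact and subsampled LMOs for several strategy families (e.g., s–t paths, Hamiltonian paths, Steiner cycles).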
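
The two loops described above can be sketched on a toy instance. The code below is a minimal illustration under stated assumptions, not the authors' implementation: the two-link network (A, B, DEMAND), the affine link costs, the box projection on theta (theta_max), and all function names are hypothetical choices made here. The inner loop is a Frank–Wolfe solver on the Beckmann potential, whose gradient equals the tolled link costs; the outer loop is a two-point zeroth-order update that only evaluates Phi_hat_T(theta).

```python
import numpy as np

# Hypothetical toy instance: two parallel links, unit demand.
# Tolled link cost: c_e(y_e; theta) = A_e + B_e * y_e + theta_e.
A = np.array([1.0, 0.0])
B = np.array([0.0, 1.0])
DEMAND = 1.0


def link_costs(theta, y):
    """Costs seen by followers, including the leader's tolls theta."""
    return A + B * y + theta


def exact_lmo(costs):
    """Exact LMO over the load polytope C: all demand on a cheapest link."""
    s = np.zeros_like(costs)
    s[np.argmin(costs)] = DEMAND
    return s


def frank_wolfe_equilibrium(theta, T=200):
    """Projection-free inner loop returning y_T(theta)."""
    y = np.full(A.shape, DEMAND / A.size)
    for t in range(T):
        # The gradient of the Beckmann potential is the vector of link costs.
        s = exact_lmo(link_costs(theta, y))
        gamma = 2.0 / (t + 2.0)
        y = y + gamma * (s - y)
    return y


def social_cost(y):
    """Leader objective F: total travel time, tolls excluded."""
    return float(np.sum(y * (A + B * y)))


def phi_hat(theta, T=200):
    """Phi_hat_T(theta) = F(theta, y_T(theta))."""
    return social_cost(frank_wolfe_equilibrium(theta, T))


def zo_stackelberg(theta0, outer_iters=200, eta=0.05, delta=0.1,
                   T=200, theta_max=2.0, seed=0):
    """Two-point zeroth-order outer loop (box projection is an assumption)."""
    rng = np.random.default_rng(seed)
    theta = np.array(theta0, dtype=float)
    d = theta.size
    for _ in range(outer_iters):
        u = rng.standard_normal(d)
        u /= np.linalg.norm(u)  # random direction on the unit sphere
        diff = phi_hat(theta + delta * u, T) - phi_hat(theta - delta * u, T)
        g = (d / (2.0 * delta)) * diff * u  # two-point gradient estimate
        theta = np.clip(theta - eta * g, 0.0, theta_max)
    return theta


theta_final = zo_stackelberg([0.0, 0.0])
print("tolls:", theta_final)
print("social cost:", phi_hat(np.zeros(2)), "->", phi_hat(theta_final))
```

In this Pigou-style instance the untolled equilibrium sends all traffic over the congestible link, and the best achievable total travel time is 0.75, so the outer loop should learn a toll on that link without ever differentiating through the equilibrium map.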

Figure 1: Leader objective vs. outer iterations for Scenarios 1–3. For subsampled LMOs (US/UL/HL), lighter shades denote smaller sampling budgets $m$ (we use $m\in\{10,100,1000\}$ in Scenarios 2 and 3); bands are 99% CIs over 10 runs, while Diff is deterministic.

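Experimental Results

Research Questions

RQ1: Can a zeroth-order bilevel method converge to meaningful stationary points when the equilibrium map is nonsmooth due to active-set changes?
RQ2: How does subsampling the LMO affect the convergence rate, and can stratified sampling mitigate the difficulties posed by huge combinatorial strategy spaces?
RQ3: Does optimizing the true equilibrium objective without differentiating through equilibria yield substantial speedups while reaching accuracy comparable to a differentiation-based baseline?
RQ4: What convergence guarantees does ZO-Stackelberg have in combinatorial congestion games, and how do its rates depend on the inner approximation error?
RQ5: How do different combinatorial strategy families affect the LMO implementation and overall performance on real-world networks?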

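Key Findings

ZO-Stackelberg matches the differentiation-based baseline in accuracy while achieving orders-of-magnitude speedups and lower memory consumption.
The inner FW loop, with an exact or subsampled LMO, has a convergence guarantee under a mild optimizer-hit assumption (the sample contains an exact minimizer with probability kappa_m); the FW error decays as O(1/(kappa_m T)).
The outer loop converges to generalized Goldstein stationary points of the true equilibrium objective Phi, with explicit dependence on the inner equilibrium error.
Stratified sampling (e.g., length-debiased) keeps kappa_m non-vanishing in large strategy spaces, which improves practical performance.
For some strategy families (e.g., s–t paths, Hamiltonian paths), exact LMOs can be implemented with ZDD-based dynamic programming; where that is infeasible, subsampling can replace exact minimization.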
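
The role of kappa_m can be illustrated with a small simulation. This is a sketch under assumptions made here, not the paper's sampler: the strategy pool (CLASS_SIZES), the cost model, and the equal per-class quota are hypothetical. The pool has few short strategies and many long ones, with the cheapest strategy among the short ones. Uniform subsampling rarely includes it, so kappa_m is small and the O(1/(kappa_m T)) bound is loose, whereas stratifying by length covers the short class and keeps kappa_m large.

```python
import numpy as np

# Hypothetical strategy pool: length class -> number of strategies.
CLASS_SIZES = {1: 5, 2: 50, 3: 5000}
lengths = np.concatenate([np.full(n, L) for L, n in CLASS_SIZES.items()])

# Assumed cost model: cost grows with length, plus a small perturbation,
# so the exact minimizer lies in the shortest class.
costs = lengths + 0.1 * np.random.default_rng(0).random(lengths.size)
best = int(np.argmin(costs))


def uniform_sample(m, rng):
    """Draw m distinct strategies uniformly from the whole pool."""
    return rng.choice(lengths.size, size=m, replace=False)


def stratified_sample(m, rng):
    """Split the budget m equally across length classes."""
    classes = np.unique(lengths)
    quota = m // classes.size
    picks = []
    for L in classes:
        idx = np.flatnonzero(lengths == L)
        picks.append(rng.choice(idx, size=min(quota, idx.size), replace=False))
    return np.concatenate(picks)


def subsampled_lmo(sample_idx):
    """LMO_m: the cheapest strategy among the sampled ones."""
    return int(sample_idx[np.argmin(costs[sample_idx])])


def estimate_kappa(sampler, m, trials=2000, seed=1):
    """Empirical probability that the sample contains the exact minimizer."""
    rng = np.random.default_rng(seed)
    hits = sum(subsampled_lmo(sampler(m, rng)) == best for _ in range(trials))
    return hits / trials


print("uniform    kappa_m:", estimate_kappa(uniform_sample, 21))
print("stratified kappa_m:", estimate_kappa(stratified_sample, 21))
```

With a budget of 21, the equal quota of 7 per class exceeds the 5 strategies in the shortest class, so the stratified sampler enumerates that class completely, while uniform sampling includes the minimizer with probability 21/5055.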

Figure 2: Final-iterate diagnostics: speedup vs. Diff, peak RSS, FW gap, and social cost, for Scenarios 1–3. For subsampling-based variants, lighter shades denote smaller $m$ (same $m$ as in Figure 1); points are means and bars are 99% CIs over 10 runs.

This digest was generated by AI and reviewed by human editors.