[Paper Summary] A Unifying Framework for Online Optimization with Long-Term Constraints
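This paper proposes a unifying framework for online optimization with long-term constraints, built on a two-phase primal-dual method for a Lagrangian game. It is the first framework to provide best-of-both-worlds guarantees, holding whether the reward and constraint sequences are stochastic or adversarial: sublinear regret and constraint violation and, in the adversarial case, a ρ/(1+ρ) fraction of the optimal reward whenever a strictly feasible solution exists.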
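As a quick numerical reading of the ρ/(1+ρ) guarantee, the minimal Python sketch below evaluates the fraction for a few feasibility margins. The function name `reward_fraction` and the sample values of ρ are illustrative choices, not taken from the paper.

```python
def reward_fraction(rho: float) -> float:
    """Fraction rho / (1 + rho) of the optimal reward guaranteed in the adversarial case."""
    if rho <= 0:
        # rho measures the margin of a strictly feasible solution, so it must be positive.
        raise ValueError("rho must be positive (a strictly feasible solution must exist)")
    return rho / (1.0 + rho)


if __name__ == "__main__":
    for rho in (0.1, 1.0, 9.0, 99.0):
        print(f"rho = {rho:>5}: fraction = {reward_fraction(rho):.2f}")
```

A margin of ρ = 1 yields half of the optimal reward and ρ = 9 yields 90%; the fraction tends to 1 as ρ grows and to 0 as the strictly feasible margin vanishes.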
Many companies rely on advertising platforms such as Google, Facebook, or Instagram to recruit a large and diverse applicant pool for job openings. Prior work has shown that equitable bidding may not result in equitable outcomes due to heterogeneous levels of competition for different types of individuals. Suggestions have been made to address this problem via revisions to the advertising platform. However, it may be challenging to convince platforms to undergo a costly revamp of their system, and such revisions might not offer the flexibility necessary to capture the many types of fairness notions and other constraints that advertisers would like to ensure. Instead, we consider alterations that leave the platform mechanism unchanged and modify only the bidding strategies used by advertisers. We compare two natural fairness objectives: one in which the advertisers must treat groups equally when bidding in order to achieve a yield with group-parity guarantees, and another in which the bids are not constrained and only the yield must satisfy parity constraints. We show that requiring parity with respect to both bids and yield can result in an arbitrarily large decrease in efficiency compared to requiring equal yield proportions alone. We find that autobidding is a natural way to realize this latter objective and show how existing work in this area can be extended to provide efficient bidding strategies that achieve high utility while satisfying group parity constraints, along with deterministic and randomized rounding techniques that uphold these guarantees. Finally, we demonstrate the effectiveness of our proposed solutions on data adapted from a real-world employment dataset.
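Motivation and Goals
- Address the challenge of online decision-making under arbitrary, time-varying long-term constraints while maximizing cumulative reward.
- Provide the first no-regret algorithm for the adversarial setting that also achieves sublinear cumulative constraint violation, with the best fixed strategy as the baseline.
- Unify and extend existing online convex optimization frameworks to handle non-convex rewards and constraints.
- Handle both full-feedback and bandit-feedback settings seamlessly by plugging in regret minimizers as modular components.
- Extend applicability to complex real-world constraints, such as ROI constraints in repeated auctions and fairness-based distributional constraints.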
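Proposed Method
- Model the problem as a Lagrangian game between a primal player and a dual player: the primal player trades off reward maximization against constraint violation, while the dual player chooses the Lagrange multipliers that penalize violations.
- Implement a two-phase algorithm: (1) a play phase that optimizes the reward-constraint trade-off; (2) a recovery phase that enforces safe decisions to prevent constraint violation.
- Use standard regret minimizers as black-box components, making the framework compatible with both full-feedback and bandit-feedback settings.
- Introduce a feasibility parameter ρ that quantifies the margin of a strictly feasible solution, yielding ρ-dependent performance guarantees.
- Update the dual variables with online mirror descent (OMD) using negative-entropy regularization; in the bidding setting, use EXP3.P to handle bandit feedback.
- Support non-convex objectives and constraints by plugging in regret minimizers suited to non-convex losses.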
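To make the play/recovery structure concrete, here is a minimal Python sketch under several assumptions that do not come from the summary: a finite action set with full feedback, Hedge (exponential weights) as the primal regret minimizer, Lagrange multipliers restricted to {λ ≥ 0 : ‖λ‖₁ ≤ 1/ρ} and updated by negative-entropy OMD (which on a simplex reduces to exponential weights), and a hypothetical threshold rule that switches to a known safe action during recovery. The names `hedge_probs`, `two_phase_primal_dual`, `threshold`, and `safe_action`, and the random toy instance, are all illustrative; this is not the paper's exact algorithm.

```python
import numpy as np


def hedge_probs(cum_gain: np.ndarray, eta: float) -> np.ndarray:
    """Exponential-weights distribution for cumulative gains (max-shifted for stability)."""
    w = np.exp(eta * (cum_gain - cum_gain.max()))
    return w / w.sum()


def two_phase_primal_dual(f, g, safe_action, rho, eta_primal, eta_dual,
                          threshold, seed=0):
    """Toy play/recovery primal-dual loop; an illustration, not the paper's algorithm.

    f: (T, K) rewards in [0, 1]. g: (T, m, K) constraint costs in [-1, 1];
    the long-term goal is sum_t g_t(a_t) <= 0 for every constraint.
    safe_action: hypothetical action with cost <= -rho on every constraint.
    """
    rng = np.random.default_rng(seed)
    T, K = f.shape
    m = g.shape[1]
    primal_gain = np.zeros(K)    # cumulative Lagrangian gain of each action
    dual_gain = np.zeros(m + 1)  # extra dummy coordinate always has gain 0
    violation = np.zeros(m)      # cumulative constraint cost of played actions
    total_reward = 0.0
    lambdas = []
    for t in range(T):
        # Dual player: on the simplex, negative-entropy OMD with a fixed step is the
        # exponential-weights update; dropping the dummy coordinate and scaling by
        # 1/rho gives a multiplier in {lam >= 0 : sum(lam) <= 1/rho}.
        lam = hedge_probs(dual_gain, eta_dual)[:m] / rho
        lambdas.append(lam)
        if violation.max() > threshold:
            a = safe_action  # recovery phase (hypothetical threshold rule)
        else:
            a = rng.choice(K, p=hedge_probs(primal_gain, eta_primal))  # play phase
        total_reward += f[t, a]
        violation += g[t, :, a]
        # Full feedback: every action's Lagrangian f_t(a) - lam . g_t(a) is observed.
        primal_gain += f[t] - lam @ g[t]
        # The dual gain is the realized cost, so violated constraints get more weight.
        dual_gain[:m] += g[t, :, a]
    return total_reward, violation, np.array(lambdas)


if __name__ == "__main__":
    T, K, m, rho = 2000, 3, 2, 0.2
    rng = np.random.default_rng(1)
    f = rng.uniform(0.0, 1.0, size=(T, K))
    g = rng.uniform(-1.0, 1.0, size=(T, m, K))
    f[:, 0] = 0.0       # action 0 is a hypothetical zero-reward safe action ...
    g[:, :, 0] = -rho   # ... that satisfies every constraint with margin rho
    reward, viol, lams = two_phase_primal_dual(
        f, g, safe_action=0, rho=rho, eta_primal=0.05, eta_dual=0.05,
        threshold=np.sqrt(T))
    print(f"reward = {reward:.1f}, final cumulative violation = {viol}")
```

Because the recovery check runs before each draw, the cumulative cost of any constraint can exceed `threshold` by at most one round's cost (at most 1 here). Handling bandit feedback, as with EXP3.P in the bidding setting, would mean replacing the full-feedback primal update with an estimator built from the played action alone; that variant is not shown.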
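Results
Research Questions
- RQ1: Can a single algorithm achieve sublinear regret and constraint violation in both the adversarial and stochastic settings of online optimization with long-term constraints?
- RQ2: What performance guarantees are achievable in the adversarial setting when the baseline is the best fixed strategy and the constraints are general and time-varying?
- RQ3: How can the framework be extended to non-convex rewards and constraints while preserving its theoretical guarantees?
- RQ4: Can the framework be applied to practical auction mechanisms with complex constraints, such as ROI and fairness-based distributional constraints?
- RQ5: In the adversarial case, under what conditions does the existence of a strictly feasible solution yield improved performance guarantees?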
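Key Findings
- In the adversarial setting, the proposed algorithm achieves a ρ/(1+ρ) fraction of the optimal reward, where ρ is the feasibility margin, with sublinear regret and constraint violation.
- In the stochastic setting with constant ρ, the algorithm matches the best known Õ(T^{1/2}) bounds on both regret and cumulative constraint violation.
- For arbitrary ρ that may depend on T (e.g., ρ ≤ T^{-1/4}), the algorithm guarantees Õ(T^{3/4}) regret and violation, which is still sublinear.
- The framework ensures that in the stochastic setting the recovery phase is never triggered, which is crucial for budget-pacing mechanisms to avoid overly conservative bidding.
- The framework can be instantiated to handle ROI constraints in first-price auctions, achieving Õ(T^{1/2}) cumulative violation for both budget and ROI constraints when a strictly feasible solution exists.
- The framework can explicitly handle fairness constraints in repeated auctions, guaranteeing that the average impression distribution across categories converges to the target values with error at most Õ(T^{-1/2}).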
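One way to see how auction constraints fit a framework whose long-term requirement is that cumulative constraint costs stay at or below zero: the hypothetical helpers below map one round's outcome to per-round costs for a budget, an ROI constraint (assumed here to require that the value of won impressions covers spend, i.e., a target ROI of 1), and a two-sided fairness target on the impression distribution (assuming each round delivers an impression to exactly one group). The names `budget_roi_costs` and `fairness_costs` and these encodings are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np


def budget_roi_costs(won: bool, price: float, value: float,
                     budget_per_round: float) -> np.ndarray:
    """Per-round costs g_t; the long-term requirement is sum_t g_t <= 0 componentwise.

    Budget: total spend <= T * budget_per_round.
    ROI (assumed target 1): total spend <= total value of won impressions.
    """
    spend = price if won else 0.0
    gained = value if won else 0.0
    return np.array([spend - budget_per_round, spend - gained])


def fairness_costs(shown_group: int, target: np.ndarray) -> np.ndarray:
    """Two-sided costs pushing the empirical impression distribution toward `target`.

    With e = one-hot(shown_group), requiring both sum_t (e - target) <= 0 and
    sum_t (target - e) <= 0 means that small cumulative violation keeps the
    average of e over the rounds close to `target`.
    """
    e = np.zeros_like(target, dtype=float)
    e[shown_group] = 1.0
    return np.concatenate([e - target, target - e])


if __name__ == "__main__":
    # Won at price 0.3 with value 0.5 under a per-round budget of 0.2:
    # the budget cost is positive (spending ahead of pace), the ROI cost is negative (slack).
    print(budget_roi_costs(True, 0.3, 0.5, 0.2))
    # Alternating groups 0 and 1 against a 50/50 target cancels out exactly.
    half = np.array([0.5, 0.5])
    print(fairness_costs(0, half) + fairness_costs(1, half))
```

Stacking these vectors as the constraint costs of each round lets the same primal-dual machinery handle budget, ROI, and fairness requirements together, since each is expressed as a cumulative cost that must stay non-positive.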
This summary was generated by AI and reviewed by human editors.