QUICK REVIEW

[Paper Review] A Unifying Framework for Online Optimization with Long-Term Constraints

Matteo Castiglioni, Andrea Celli|arXiv (Cornell University)|Sep 15, 2022

Advanced Bandit Algorithms Research|Cited by 5

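One-Sentence Summary

This paper presents a unified framework for online optimization with long-term constraints, built on a two-phase primal-dual approach derived from a Lagrangian game. It achieves the first "best-of-both-worlds" guarantees (sublinear regret and sublinear constraint violation) when the reward and constraint sequences are either stochastic or adversarial, and it attains a ρ/(1+ρ) fraction of the optimal reward when a strictly feasible solution exists.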

ABSTRACT

Many companies rely on advertising platforms such as Google, Facebook, or Instagram to recruit a large and diverse applicant pool for job openings. Prior works have shown that equitable bidding may not result in equitable outcomes due to heterogeneous levels of competition for different types of individuals. Suggestions have been made to address this problem via revisions to the advertising platform. However, it may be challenging to convince platforms to undergo a costly re-vamp of their system, and in addition it might not offer the flexibility necessary to capture the many types of fairness notions and other constraints that advertisers would like to ensure. Instead, we consider alterations that make no change to the platform mechanism and instead change the bidding strategies used by advertisers. We compare two natural fairness objectives: one in which the advertisers must treat groups equally when bidding in order to achieve a yield with group-parity guarantees, and another in which the bids are not constrained and only the yield must satisfy parity constraints. We show that requiring parity with respect to both bids and yield can result in an arbitrarily large decrease in efficiency compared to requiring equal yield proportions alone. We find that autobidding is a natural way to realize this latter objective and show how existing work in this area can be extended to provide efficient bidding strategies that provide high utility while satisfying group parity constraints as well as deterministic and randomized rounding techniques to uphold these guarantees. Finally, we demonstrate the effectiveness of our proposed solutions on data adapted from a real-world employment dataset.

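Motivation and Objectives

Address the problem of online decision-making that maximizes cumulative reward under arbitrary, time-varying long-term constraints.
Provide the first no-regret algorithm for the adversarial setting that uses the best fixed strategy as its baseline, while achieving sublinear cumulative constraint violation.
Unify and extend existing online convex optimization frameworks so that they can handle non-convex rewards and constraints.
Handle both full-feedback and bandit-feedback settings seamlessly through modular integration of regret minimizers.
Enable application to complex real-world constraints, such as ROI constraints and fairness-based distributional constraints in repeated auctions.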

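Proposed Method

Formulate the problem as a Lagrangian game between a primal player and a dual player, in which the primal player balances reward maximization against constraint violation.
Implement a two-phase algorithm: (1) a "play phase" that optimizes the trade-off between reward and constraints, and (2) a "recovery phase" that enforces safe decisions to prevent constraint violation.
Use standard regret minimizers as black-box components, which keeps the method compatible with both full-feedback and bandit-feedback settings.
Introduce a feasibility parameter ρ that quantifies the margin of a strictly feasible solution, enabling performance guarantees that depend on ρ.
Apply online mirror descent (OMD) with a negative-entropy regularizer for the dual updates, and use EXP3.P for bandit feedback in the bidding setting.
Handle non-convex objectives and constraints by plugging in regret minimizers suited to non-convex losses.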
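The Lagrangian-game idea above can be illustrated with a minimal sketch. It is not the paper's algorithm: it assumes a finite action set, full feedback, and a single constraint of the form "average cost ≤ 0"; it uses Hedge (exponential weights) as the primal regret minimizer and, as a swapped-in simplification, projected gradient ascent on the dual variable instead of entropic OMD. The recovery phase is omitted. The function name `primal_dual_sketch`, the step sizes, and the toy instance are all hypothetical.

```python
import numpy as np

def primal_dual_sketch(rewards, costs, rho, eta_primal=0.1, eta_dual=0.1):
    """Toy primal-dual loop on the Lagrangian payoff r - lam * c.

    rewards, costs: arrays of shape (T, K); the long-term constraint is
    that the average cost stays at or below zero. rho is the assumed
    feasibility margin; the dual variable is kept in [0, 1/rho].
    """
    rewards = np.asarray(rewards, dtype=float)
    costs = np.asarray(costs, dtype=float)
    T, K = rewards.shape
    log_w = np.zeros(K)          # Hedge log-weights of the primal player
    lam = 0.0                    # dual variable (Lagrange multiplier)
    lam_max = 1.0 / rho
    total_reward = 0.0
    total_cost = 0.0
    for t in range(T):
        # Primal player's mixed strategy (softmax of the log-weights).
        p = np.exp(log_w - log_w.max())
        p /= p.sum()
        total_reward += p @ rewards[t]
        total_cost += p @ costs[t]
        # Primal update: Hedge step on the Lagrangian payoff of each action.
        log_w += eta_primal * (rewards[t] - lam * costs[t])
        # Dual update: raise lam when the constraint is violated, then
        # project back onto [0, 1/rho].
        lam = float(np.clip(lam + eta_dual * (p @ costs[t]), 0.0, lam_max))
    return total_reward / T, total_cost / T, lam

# Hypothetical instance: action 0 pays more but violates the constraint,
# action 1 pays less and is strictly feasible with margin 0.5.
T = 2000
rewards = np.tile([1.0, 0.5], (T, 1))
costs = np.tile([0.5, -0.5], (T, 1))
avg_reward, avg_cost, lam = primal_dual_sketch(rewards, costs, rho=0.5)
print(avg_reward, avg_cost, lam)
```

On this toy instance the dual variable acts as a price on the constraint: when the primal player leans on the violating action, the price rises and shifts weight back toward the safe action. Capping the dual variable at 1/ρ reflects the role the feasibility parameter plays in the method as described above.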

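Experimental Results

Research Questions

RQ1: Can a single algorithm be designed that achieves sublinear regret and constraint violation in both adversarial and stochastic settings for online optimization with long-term constraints?
RQ2: Under general time-varying constraints, what performance guarantees are achievable in the adversarial setting when the best fixed strategy is used as the baseline?
RQ3: How can the framework be extended to handle non-convex rewards and constraints while preserving its theoretical guarantees?
RQ4: Can the framework be applied to real-world auction mechanisms with complex constraints, such as ROI constraints and fairness-based distributional constraints?
RQ5: Under what conditions does the existence of a strictly feasible solution enable improved performance guarantees in the adversarial case?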

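Key Findings

In the adversarial setting, the proposed algorithm achieves sublinear regret and constraint violation and attains a ρ/(1+ρ) fraction of the optimal reward, where ρ is the margin of a strictly feasible solution.
In the stochastic setting with constant ρ, it achieves the best known bound of Õ(T^{1/2}) for both regret and cumulative constraint violation.
For arbitrary ρ that may depend on T (e.g., ρ ≤ T^{-1/4}), the algorithm guarantees Õ(T^{3/4}) regret and violation, which is still sublinear.
In the stochastic case the recovery phase is guaranteed never to be triggered, which is essential for budget-pacing mechanisms to avoid overly cautious bidding.
The framework can be instantiated to handle ROI constraints in first-price auctions and, when a strictly feasible solution exists, achieves Õ(T^{1/2}) cumulative violation for both the budget and the ROI constraints.
The framework handles fairness constraints in repeated auctions explicitly, guaranteeing that the average impression allocation for each category converges to its target within an error of Õ(T^{-1/2}).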
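To give a feel for the quantities quoted above, the following short sketch evaluates the ρ/(1+ρ) reward fraction for a few margins and compares the per-round size of a T^{1/2} bound with a T^{3/4} bound. Constants and logarithmic factors hidden by the Õ notation are ignored, and the helper names `reward_fraction` and `per_round` are hypothetical.

```python
def reward_fraction(rho):
    """Fraction of the optimal reward guaranteed for feasibility margin rho."""
    return rho / (1.0 + rho)

def per_round(T, exponent):
    """Per-round size of a cumulative bound T**exponent (constants ignored)."""
    return T ** exponent / T

for rho in (0.1, 0.5, 1.0):
    print(f"rho={rho}: fraction={reward_fraction(rho):.3f}")

T = 10**4
print(per_round(T, 0.5))   # per-round size of a T^{1/2} bound
print(per_round(T, 0.75))  # per-round size of a T^{3/4} bound
```

For example, a margin of ρ = 1 corresponds to half of the optimal reward, and both per-round quantities shrink as T grows, which is what "sublinear" means here.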


This review was generated by AI and checked by a human editor.