QUICK REVIEW

[Paper Review] Fairness in Learning: Classic and Contextual Bandits

Matthew Joseph, Michael Kearns|arXiv (Cornell University)|May 23, 2016

Advanced Bandit Algorithms Research|References: 16|Citations: 192

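One-Line Summary

This paper defines individual fairness for classic and contextual bandits and shows a fundamental trade-off between fairness and learning performance, including tight regret bounds and a two-way connection between fairness and KWIK learning.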

ABSTRACT

We introduce the study of fairness in multi-armed bandit problems. Our fairness definition can be interpreted as demanding that given a pool of applicants (say, for college admission or mortgages), a worse applicant is never favored over a better one, despite a learning algorithm's uncertainty over the true payoffs. We prove results of two types. First, in the important special case of the classic stochastic bandits problem (i.e., in which there are no contexts), we provide a provably fair algorithm based on "chained" confidence intervals, and provide a cumulative regret bound with a cubic dependence on the number of arms. We further show that any fair algorithm must have such a dependence. When combined with regret bounds for standard non-fair algorithms such as UCB, this proves a strong separation between fair and unfair learning, which extends to the general contextual case. In the general contextual case, we prove a tight connection between fairness and the KWIK (Knows What It Knows) learning model: a KWIK algorithm for a class of functions can be transformed into a provably fair contextual bandit algorithm, and conversely any fair contextual bandit algorithm can be transformed into a KWIK learning algorithm. This tight connection allows us to provide a provably fair algorithm for the linear contextual bandit problem with a polynomial dependence on the dimension, and to show (for a different class of functions) a worst-case exponential gap in regret between fair and non-fair learning algorithms.

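Motivation and Objectives

Motivate the study of fairness in sequential decision problems where each decision affects an individual.
Define a precise individual fairness criterion for bandit and contextual bandit learning.
Characterize the cost, in learning performance, of enforcing fairness in both the classic and contextual settings.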

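Proposed Method

Introduces delta-fairness for bandit algorithms: with probability at least 1 - delta, the algorithm never favors a lower-mean arm over a higher-mean arm.
Proposes FairBandits, a fair variant of UCB that uses chained confidence intervals to enforce pairwise fairness.
Derives a regret bound for FairBandits in the classic setting with a cubic dependence on the number of arms k.
Establishes a matching lower bound: on some instances, any fair algorithm incurs constant per-round regret for Omega(k^3) rounds.
Shows the relationship between KWIK learning and fairness, giving reductions in both directions between KWIK learners and fair contextual bandit algorithms.
Shows that fair learning is possible for linear contextual bandits with a polynomial dependence on the dimension, and that an exponential gap exists for a different function class.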
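
The chained-confidence-interval idea behind FairBandits can be illustrated with a minimal sketch. This is not the authors' reference implementation: the function names `chained_set` and `fair_bandits_sketch`, the `pull` callback, and the Hoeffding-style interval width are assumptions made for illustration, and rewards are assumed to lie in [0, 1]. The sketch keeps an active set of arms whose intervals are linked, through a chain of overlaps, to the arm with the highest upper bound, and plays uniformly at random among them so that no arm is favored until the data separates it from the rest.

```python
import math
import random


def chained_set(intervals, active):
    """Arms in `active` linked by overlapping intervals to the arm
    with the highest upper confidence bound."""
    top = max(active, key=lambda i: intervals[i][1])
    chain = {top}
    frontier = [top]
    while frontier:
        i = frontier.pop()
        lo_i, hi_i = intervals[i]
        for j in active:
            if j in chain:
                continue
            lo_j, hi_j = intervals[j]
            # Two intervals overlap when neither lies strictly above the other.
            if lo_i <= hi_j and lo_j <= hi_i:
                chain.add(j)
                frontier.append(j)
    return chain


def fair_bandits_sketch(pull, k, horizon, delta=0.05, seed=0):
    """Play uniformly among chained arms; `pull(arm)` returns a reward in [0, 1].

    Returns the final active set and the list of arms played.
    """
    rng = random.Random(seed)
    counts = [0] * k
    sums = [0.0] * k
    intervals = [(0.0, 1.0)] * k  # trivial intervals before any data
    active = set(range(k))
    history = []
    for t in range(1, horizon + 1):
        # Arms that fall out of the chain are dropped and never revisited.
        active = chained_set(intervals, active)
        arm = rng.choice(sorted(active))  # uniform play within the chain
        reward = pull(arm)
        counts[arm] += 1
        sums[arm] += reward
        mean = sums[arm] / counts[arm]
        # Assumed Hoeffding-style width with a union bound over rounds.
        width = math.sqrt(
            math.log((math.pi * (t + 1)) ** 2 / (3 * delta)) / (2 * counts[arm])
        )
        intervals[arm] = (mean - width, mean + width)
        history.append(arm)
    return active, history
```

A call such as `fair_bandits_sketch(lambda arm: [0.1, 0.9][arm], k=2, horizon=2000)` runs the sketch on two hypothetical arms with fixed rewards. Playing uniformly over the whole chain, rather than only the arm with the highest upper bound as UCB does, is what slows exploration down and is where the extra dependence on the number of arms comes from.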

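Results

Research Questions

RQ1: How can a fairness constraint be formalized for sequential contextual bandit problems?
RQ2: What is the cost, in regret, of enforcing fairness in classic stochastic bandits?
RQ3: How does fairness in the contextual bandit setting relate to KWIK learning?
RQ4: Can fair contextual bandits be learned efficiently in the linear case, and how does the cost depend on the dimension?
RQ5: Are there problem settings where fairness imposes an exponential penalty relative to non-fair learning?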

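Key Findings

FairBandits is delta-fair: it maintains confidence intervals that guarantee, with probability at least 1 - delta, that an arm with a lower mean is never played with higher probability than an arm with a higher mean.
In the classic (non-contextual) setting, FairBandits achieves cumulative regret over T rounds on the order of sqrt(k^3 T), up to logarithmic factors, and this cubic dependence on the number of arms k cannot be avoided by any fair algorithm.
There is a fundamental separation between fair and non-fair learning: on some instances, a fair algorithm needs Omega(k^3) rounds before it can move beyond uniform exploration.
The KWIK learning framework characterizes fair learning rates in the contextual setting through reductions between KWIK bounds and regret.
For linear contextual bandits, the paper gives a provably fair algorithm whose regret depends polynomially on the dimension d.
For some function classes (for example, Boolean conjunctions), fair learning has a lower bound that is exponential in the dimension d, which shows the potential worst-case penalty of fairness.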
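
The delta-fairness guarantee in the first finding can be written out formally. The following is a hedged restatement for the classic setting; the symbols are notation assumed here, with \pi^{t}_{j \mid h} denoting the probability that the algorithm plays arm j in round t given history h, and \mu_j denoting the true mean reward of arm j.

```latex
% delta-fairness (classic setting): with probability at least 1 - \delta
% over the history h, for every round t and every pair of arms j, j',
\pi^{t}_{j \mid h} > \pi^{t}_{j' \mid h}
\quad \text{only if} \quad
\mu_j > \mu_{j'} .
```

Under this condition, two arms whose means the algorithm cannot yet distinguish must be played with equal probability, which is why an algorithm like FairBandits plays uniformly over arms with linked confidence intervals.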

This review was generated by AI and checked by a human editor.