QUICK REVIEW

[Paper Review] Evaluating Model Performance Under Worst-case Subpopulations

Mike Li, Mittal, Daksh|arXiv (Cornell University)|Jul 1, 2024

Bayesian Modeling and Causal Inference | Cited by 6

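One-line summary

This paper defines and estimates a model's worst-case performance over all subpopulations of a given size, and provides a scalable two-stage evaluation procedure with finite-sample guarantees.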

ABSTRACT

The performance of ML models degrades when the training population is different from that seen under operation. Towards assessing distributional robustness, we study the worst-case performance of a model over all subpopulations of a given size, defined with respect to core attributes Z. This notion of robustness can consider arbitrary (continuous) attributes Z, and automatically accounts for complex intersectionality in disadvantaged groups. We develop a scalable yet principled two-stage estimation procedure that can evaluate the robustness of state-of-the-art models. We prove that our procedure enjoys several finite-sample convergence guarantees, including dimension-free convergence. Instead of overly conservative notions based on Rademacher complexities, our evaluation error depends on the dimension of Z only through the out-of-sample error in estimating the performance conditional on Z. On real datasets, we demonstrate that our method certifies the robustness of a model and prevents deployment of unreliable models.

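Motivation and objectives

Motivates robust evaluation of ML models under distribution shift across subpopulations defined by arbitrary attributes Z.
Defines the worst-case subpopulation performance W_alpha* and its certificate alpha* for deployment safety.
Develops a scalable two-stage estimation procedure that approximates the conditional risk and the tail-risk objective.
Provides finite-sample and asymptotic guarantees that are dimension-free with respect to Z, which allows deep networks to be used for estimating mu.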
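
For reference, the quantities named above can be written out as follows. This is a sketch in notation assumed for illustration: the per-example loss ℓ(θ(X); Y) of a model θ, the set Q_α of subpopulations that make up at least a fraction α of the population P_Z, and the acceptable loss level ℓ̄ are labels introduced here and do not appear in the review text.

```latex
\begin{align*}
\mu^\star(Z) &:= \mathbb{E}\left[\ell(\theta(X); Y) \mid Z\right], \\
\mathcal{Q}_\alpha &:= \left\{ Q_Z : P_Z = a\, Q_Z + (1 - a)\, Q'_Z
  \text{ for some } a \in [\alpha, 1] \text{ and some probability } Q'_Z \right\}, \\
W_\alpha^\star &:= \sup_{Q_Z \in \mathcal{Q}_\alpha} \mathbb{E}_{Z \sim Q_Z}\left[\mu^\star(Z)\right], \\
\alpha^\star &:= \inf\left\{ \alpha \in (0, 1] : W_\alpha^\star \le \bar{\ell} \right\}.
\end{align*}
```

Under this reading, W_alpha* can only decrease as alpha grows, so alpha* marks the smallest subpopulation size at which the worst case is still acceptable.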

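Proposed method

Formulates the worst-case subpopulation performance W_alpha* as the tail average (CVaR) of the conditional risk mu*(Z).
Applies a dual reformulation that expresses W_alpha* as a minimization over a scalar η of η plus a scaled positive-part term; the minimizing η corresponds to the (1-α)-quantile of mu*(Z).
Estimates the conditional risk mu*(Z) by solving a regression-type problem over a model class H on auxiliary data.
Uses a debiased (augmented) estimator that corrects the first-order error that estimating mu*(Z) introduces into the final computation of W_alpha*.
Uses cross-fitting to combine multiple folds, yielding a robust, data-efficient estimator and a central limit theorem.
Provides a procedure for estimating the robustness certificate alpha* and its confidence interval.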
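
The steps above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, a linear least-squares fit stands in for the model class H, the dual objective is taken to be eta + E[(mu(Z) - eta)_+] / alpha, and the debiasing term uses a standard augmented form (tail indicator times the regression residual, scaled by 1/alpha) that the review does not spell out.

```python
import numpy as np


def cvar_dual(values, alpha):
    """Minimize eta + mean((values - eta)_+) / alpha over eta.

    The objective is convex and piecewise linear in eta, so a minimizer
    lies at one of the sample values; we evaluate it at every one of them.
    """
    v = np.sort(np.asarray(values, dtype=float))[::-1]  # descending order
    n = v.size
    k = np.arange(n)  # k[j] = number of samples ranked above v[j]
    # excess[j] = sum over i < j of (v[i] - v[j])
    excess = np.concatenate(([0.0], np.cumsum(v)[:-1])) - k * v
    objective = v + excess / (n * alpha)
    best = int(np.argmin(objective))
    return float(objective[best]), float(v[best])


def fit_linear_risk_model(z, losses):
    """Stage 1 stand-in (assumption): least-squares regression of losses on Z."""
    design = np.column_stack([np.ones(len(z)), z])
    coef, *_ = np.linalg.lstsq(design, losses, rcond=None)
    return lambda z_new: np.column_stack([np.ones(len(z_new)), z_new]) @ coef


def worst_case_estimate(z, losses, alpha, n_folds=2, seed=0):
    """Cross-fitted, debiased estimate of the worst-case tail average."""
    z = np.asarray(z, dtype=float)
    losses = np.asarray(losses, dtype=float)
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(losses)), n_folds)
    estimates = []
    for k, eval_idx in enumerate(folds):
        # Fit the conditional-risk model on the other folds only.
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != k])
        mu_hat = fit_linear_risk_model(z[train_idx], losses[train_idx])
        mu_eval = mu_hat(z[eval_idx])
        # Stage 2: plug-in tail average of the fitted conditional risk.
        plug_in, eta = cvar_dual(mu_eval, alpha)
        # Assumed augmentation: residuals of points in the estimated tail.
        in_tail = mu_eval >= eta
        correction = np.mean(in_tail * (losses[eval_idx] - mu_eval)) / alpha
        estimates.append(plug_in + correction)
    return float(np.mean(estimates))
```

As a sanity check on the design, when Z is uniform on [0, 1] and the loss is Z plus small noise, the worst 20% subpopulation is Z >= 0.8, so the estimate should land near 0.9.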

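Experimental results

Research questions

RQ1: How can worst-case subpopulation performance be quantified and certified for an arbitrary subpopulation-defining attribute Z?
RQ2: Can a scalable, debiased estimator of the tail-risk objective W_alpha* be constructed that achieves good finite-sample guarantees?
RQ3: What is the convergence rate, and how does it depend on the complexity of the conditional-risk model class H?
RQ4: How can robustness be certified through the threshold alpha*, the smallest subpopulation size at which performance is acceptable?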

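Key findings

The debiased two-stage estimator achieves a convergence rate of O_p(sqrt(Comp_n(H)/n)) for the worst-case subpopulation performance.
The method admits dimension-free concentration bounds that depend on the out-of-sample error in estimating mu*(Z).
A central limit theorem shows that the debiased estimator attains the sqrt(n) rate even when the estimate mu-hat of mu*(Z) converges slowly.
The approach connects worst-case subpopulation performance to conditional value-at-risk and coherent risk measures.
The method supports a practical robustness certificate alpha* and enables model evaluation in which deep networks estimate mu.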
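
To make the certificate concrete, the sketch below finds the smallest alpha whose worst-case tail average stays at or below an acceptable loss level. It is illustrative only: the function names and the `threshold` parameter are hypothetical, the input is assumed to be an array of already-estimated conditional risks, and bisection is one simple way to exploit the fact that the tail average does not increase as alpha grows. It omits the confidence interval mentioned in the review.

```python
import numpy as np


def worst_case_tail_average(mu_values, alpha):
    """Average of the worst alpha-fraction of mu_values (empirical CVaR)."""
    v = np.sort(np.asarray(mu_values, dtype=float))[::-1]  # largest first
    n = v.size
    mass = alpha * n                 # number of samples in the tail
    full = int(np.floor(mass))       # samples counted with full weight
    total = v[:full].sum()
    if full < n:
        total += (mass - full) * v[full]  # fractional weight on the next one
    return total / mass


def robustness_certificate(mu_values, threshold, tol=1e-6):
    """Smallest alpha with tail average <= threshold, or None if none exists."""
    if worst_case_tail_average(mu_values, 1.0) > threshold:
        return None  # even the full population exceeds the threshold
    lo, hi = 0.0, 1.0  # invariant: the tail average at hi is <= threshold
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if worst_case_tail_average(mu_values, mid) <= threshold:
            hi = mid
        else:
            lo = mid
    return hi
```

For conditional risks 1, 2, ..., 10 and a threshold of 9.5, the worst 20% average exactly 9.5, so the certificate is about 0.2; a smaller value indicates a model that stays acceptable on smaller subpopulations.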

This review was generated by AI and checked by a human editor.