QUICK REVIEW

[Paper Review] Predictive Inference Is Free with the Jackknife+-after-Bootstrap

Byol Kim, Xu Chen|arXiv (Cornell University)|Feb 20, 2020

Machine Learning and Data Classification|References: 37|Cited by: 32

One-line summary

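Introduces the jackknife+-after-bootstrap (J+aB), a computationally efficient wrapper for ensemble predictors. It yields distribution-free predictive intervals with guaranteed coverage of at least 1−2α, at a cost close to that of a single ensemble prediction.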

ABSTRACT

Ensemble learning is widely used in applications to make predictions in complex decision problems---for example, averaging models fitted to a sequence of samples bootstrapped from the available training data. While such methods offer more accurate, stable, and robust predictions and model estimates, much less is known about how to perform valid, assumption-lean inference on the output of these types of procedures. In this paper, we propose the jackknife+-after-bootstrap (J+aB), a procedure for constructing a predictive interval, which uses only the available bootstrapped samples and their corresponding fitted models, and is therefore "free" in terms of the cost of model fitting. The J+aB offers a predictive coverage guarantee that holds with no assumptions on the distribution of the data, the nature of the fitted model, or the way in which the ensemble of models are aggregated---at worst, the failure rate of the predictive interval is inflated by a factor of 2. Our numerical experiments verify the coverage and accuracy of the resulting predictive intervals on real data.

Motivation and objectives

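Motivate valid, assumption-lean predictive inference on the output of ensemble learning.
Wrap ensemble methods with predictive intervals that require no distributional assumptions.
Reuse the already fitted base models through out-of-bag observations to stay computationally efficient.
Provide a finite-sample coverage guarantee and an empirical demonstration on real data.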

Proposed method

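Proposes the jackknife+-after-bootstrap (J+aB) as a wrapper around any base regression algorithm and aggregation function.
Obtains leave-one-out predictions by reusing out-of-bag models, with no additional calls to the base algorithm (Algorithm 2 in the paper).
Computes the predictive interval with a quantile-based scheme: at each x, $C_{\alpha,n,B}^{\mathrm{J+aB}}(x) = \left[\, q_{\alpha,n}^{-}\{\mu_{\varphi\setminus i}(x) - R_i\},\; q_{\alpha,n}^{+}\{\mu_{\varphi\setminus i}(x) + R_i\} \,\right]$, where $R_i = |Y_i - \mu_{\varphi\setminus i}(X_i)|$ and the quantiles are taken over i = 1, ..., n.
Keeps the computational cost at O(B) calls to the base algorithm, comparable to the cost of a single ensemble prediction.
Provides a symmetry-corrected version, in which B is drawn from a Binomial distribution, that restores exchangeability and enables the distribution-free guarantee.
Theoretical guarantee: under i.i.d. data and a symmetry assumption, $P\left(Y_{n+1} \in C_{\alpha,n,B}^{\mathrm{J+aB}}(X_{n+1})\right) \ge 1 - 2\alpha$ (finite-sample, non-asymptotic).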
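
The steps above can be sketched in a few lines of standard-library Python. This is a minimal illustration under stated assumptions, not the authors' implementation. The base learner `fit_line` (1-D least squares) and the mean aggregation `mean_prediction` are hypothetical stand-ins for the base algorithm and for φ. Each bootstrap sample has size n and B is fixed, so the symmetry correction with a random B is not included. The quantile ranks follow the usual jackknife+ convention, which is assumed here: q^- is the ⌊α(n+1)⌋-th smallest value and q^+ is the ⌈(1−α)(n+1)⌉-th smallest value.

```python
import math
import random


def fit_line(xs, ys):
    """Hypothetical base algorithm R: 1-D least squares; returns a predictor."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx > 0 else 0.0  # constant fit if all x coincide
    return lambda x: my + slope * (x - mx)


def mean_prediction(models, x):
    """Aggregation phi, assumed here to be the plain mean of model outputs."""
    return sum(f(x) for f in models) / len(models)


def jab_interval(xs, ys, x_new, alpha=0.1, B=50, seed=0):
    """Sketch of a J+aB interval at x_new (fixed B, no symmetry correction)."""
    rng = random.Random(seed)
    n = len(xs)
    # Fit B base models on bootstrap samples, remembering which points each saw.
    fitted = []
    for _ in range(B):
        idx = [rng.randrange(n) for _ in range(n)]
        model = fit_line([xs[j] for j in idx], [ys[j] for j in idx])
        fitted.append((set(idx), model))

    lower_vals, upper_vals = [], []
    for i in range(n):
        # Leave-one-out ensemble for point i: only models that never saw point i.
        oob = [model for seen, model in fitted if i not in seen]
        if not oob:
            # Simplification of this sketch: require at least one out-of-bag model.
            raise ValueError("no out-of-bag model for point %d; increase B" % i)
        r_i = abs(ys[i] - mean_prediction(oob, xs[i]))  # leave-one-out residual R_i
        pred = mean_prediction(oob, x_new)
        lower_vals.append(pred - r_i)
        upper_vals.append(pred + r_i)

    lower_vals.sort()
    upper_vals.sort()
    k_lo = math.floor(alpha * (n + 1))       # rank used by q^-
    k_hi = math.ceil((1 - alpha) * (n + 1))  # rank used by q^+
    lo = lower_vals[k_lo - 1] if k_lo >= 1 else -math.inf
    hi = upper_vals[k_hi - 1] if k_hi <= n else math.inf
    return lo, hi
```

A call such as `jab_interval(xs, ys, x_new=2.0, alpha=0.1)` returns a `(lower, upper)` pair. The only model fits are the B bootstrap fits, and every leave-one-out quantity is assembled from those same models.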

Experimental results

Research questions

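RQ1: Can a wrapper for ensemble predictors provide valid, distribution-free predictive intervals without strong modeling assumptions?
RQ2: What finite-sample coverage guarantee does J+aB offer, and how tight is it?
RQ3: Can it deliver user-friendly predictive intervals while keeping computational efficiency comparable to a single ensemble pass?
RQ4: How well does J+aB perform in practice on real datasets with different base learners?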

Key findings

Method	Calls to R	Evaluations	Calls to φ
Ensemble	B	B n_test	n_test
J+ with Ensemble	B n	B n (1 + n_test)	n (1 + n_test)
J+aB	B	B (n + n_test)	n (1 + n_test)
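
To make the table concrete, the counts can be evaluated for specific sizes. The helper below is a small sketch that transcribes the three rows; the function name `fit_costs` and the example sizes are illustrative assumptions, not taken from the paper.

```python
def fit_costs(n, B, n_test):
    """Return (calls to R, evaluations, calls to phi) for each method in the table."""
    return {
        "Ensemble": (B, B * n_test, n_test),
        "J+ with Ensemble": (B * n, B * n * (1 + n_test), n * (1 + n_test)),
        "J+aB": (B, B * (n + n_test), n * (1 + n_test)),
    }


costs = fit_costs(n=100, B=50, n_test=10)
for method, (calls_r, evals, calls_phi) in costs.items():
    print(method, calls_r, evals, calls_phi)
# Ensemble 50 500 10
# J+ with Ensemble 5000 55000 1100
# J+aB 50 5500 1100
```

With these sizes, J+aB needs the same 50 model fits as the plain ensemble, whereas J+ with Ensemble needs 5000.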

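In experiments on three real datasets, J+aB intervals achieve coverage close to the nominal level 1−α.
The theoretical guarantee is a distribution-free, non-asymptotic lower bound of 1−2α in the worst case, valid for any n and any data distribution.
The computational cost of J+aB is comparable to producing a single ensemble prediction: no model fits beyond the ensemble's own are required, so the inference is effectively "free".
Empirical results suggest that J+aB intervals remain meaningful, and tend to be narrow, even for unstable base learners (e.g., Random Forest).
J+aB compares favorably on cost with the alternative, J+ with Ensemble, while delivering comparable coverage and interval quality.
The method is well suited to being combined with a variety of conformity scores (residuals, quantile regression, weighted residuals) and with different aggregation functions.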
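
The two quantities these findings refer to, empirical coverage and interval width, can be computed from a list of test intervals as follows. This is a generic sketch; the helper names are hypothetical and the numbers in the usage lines are made up for illustration, not results from the paper.

```python
def empirical_coverage(intervals, ys):
    """Fraction of test responses y that fall inside their interval (lo, hi)."""
    hits = sum(1 for (lo, hi), y in zip(intervals, ys) if lo <= y <= hi)
    return hits / len(ys)


def mean_width(intervals):
    """Average length hi - lo of the intervals."""
    return sum(hi - lo for lo, hi in intervals) / len(intervals)


# Illustrative values only.
intervals = [(0.0, 1.0), (0.0, 1.0), (2.0, 3.0), (2.0, 3.0)]
ys = [0.5, 1.5, 2.0, 3.5]
print(empirical_coverage(intervals, ys))  # 0.5
print(mean_width(intervals))              # 1.0
```

Comparing `empirical_coverage` against 1−α and 1−2α on held-out data is one way to check how a given run relates to the nominal level and to the worst-case bound.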

This review was generated by AI and checked by a human editor.