QUICK REVIEW

[Paper Review] Model Evaluation, Model Selection, and Algorithm Selection in Machine Learning

Sebastian Raschka | arXiv (Cornell University) | Nov 13, 2018

Machine Learning and Data Classification | Cited by 574

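One-sentence summary

A comprehensive review of techniques for model evaluation, model selection, and algorithm selection, covering their strengths and weaknesses, practical guidance, and recommendations for small datasets and multiple comparisons.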

ABSTRACT

The correct use of model evaluation, model selection, and algorithm selection techniques is vital in academic machine learning research as well as in many industrial settings. This article reviews different techniques that can be used for each of these three subtasks and discusses the main advantages and disadvantages of each technique with references to theoretical and empirical studies. Further, recommendations are given to encourage best yet feasible practices in research and applications of machine learning. Common methods such as the holdout method for model evaluation and selection are covered, which are not recommended when working with small datasets. Different flavors of the bootstrap technique are introduced for estimating the uncertainty of performance estimates, as an alternative to confidence intervals via normal approximation if bootstrapping is computationally feasible. Common cross-validation techniques such as leave-one-out cross-validation and k-fold cross-validation are reviewed, the bias-variance trade-off for choosing k is discussed, and practical tips for the optimal choice of k are given based on empirical evidence. Different statistical tests for algorithm comparisons are presented, and strategies for dealing with multiple comparisons such as omnibus tests and multiple-comparison corrections are discussed. Finally, alternative methods for algorithm selection, such as the combined F-test 5x2 cross-validation and nested cross-validation, are recommended for comparing machine learning algorithms when datasets are small.
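The abstract contrasts confidence intervals from the normal approximation with bootstrap-based intervals. The following is a minimal sketch of both in standard-library Python, not the paper's own code. It assumes a fixed list of hypothetical 0/1 test outcomes (1 = correct prediction) and, for simplicity, resamples those outcomes directly; the bootstrap variants that retrain a model on each bootstrap sample and score it on the left-out examples are more involved and are not shown here. The function names and the 85-out-of-100 data are illustrative assumptions.

```python
import math
import random


def normal_approx_ci(correct, z=1.96):
    """Normal-approximation interval for accuracy from 0/1 outcomes.

    z=1.96 corresponds to a two-sided 95% interval.
    """
    n = len(correct)
    acc = sum(correct) / n
    half_width = z * math.sqrt(acc * (1.0 - acc) / n)
    return acc - half_width, acc + half_width


def bootstrap_percentile_ci(correct, n_rounds=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval obtained by resampling test outcomes.

    Simplification (assumption): the model is NOT retrained per round;
    only the fixed test outcomes are resampled with replacement.
    """
    rng = random.Random(seed)
    n = len(correct)
    accs = []
    for _ in range(n_rounds):
        sample = [correct[rng.randrange(n)] for _ in range(n)]
        accs.append(sum(sample) / n)
    accs.sort()
    lower = accs[int((alpha / 2) * n_rounds)]
    upper = accs[int((1 - alpha / 2) * n_rounds) - 1]
    return lower, upper


# Hypothetical test set: 85 correct predictions out of 100.
outcomes = [1] * 85 + [0] * 15
normal_ci = normal_approx_ci(outcomes)
boot_ci = bootstrap_percentile_ci(outcomes)
print("normal approximation:", normal_ci)
print("percentile bootstrap:", boot_ci)
```

With only 100 test examples both intervals are wide, which is one way to see why a single point estimate from a small test set should be read with caution.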

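Motivation and objectives

Clarify the roles and goals of generalization performance estimation, model selection, and algorithm selection.
Survey common techniques such as holdout, cross-validation, and the bootstrap, and organize their biases, trade-offs, and applicability.
Provide practical recommendations that improve research rigor and real-world ML practice.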

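Methods

Explains holdout validation and its biases, including stratification and the pessimistic bias.
Introduces bootstrap methods and how they estimate the uncertainty of a performance estimate.
Covers cross-validation, including k-fold, and how hyperparameter tuning relates to model selection.
Presents statistical tests for comparing algorithms (e.g., McNemar's test, the F-test, Cochran's Q test) and multiple-comparison corrections.
Describes advanced strategies for algorithm comparison (e.g., 5x2 cross-validation, nested cross-validation).
Highlights considerations for small datasets and the law of parsimony.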
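As an illustration of the 5x2 cross-validation strategy listed above, the sketch below computes the combined F statistic as it is commonly defined (Alpaydin's combined 5x2cv F-test): five replications of 2-fold cross-validation give ten performance differences between two algorithms, and the statistic is the sum of their squares divided by twice the sum of the per-replication variances. The function name and the difference values are hypothetical, and the training loop that would produce the differences is omitted.

```python
def combined_5x2cv_f(diffs):
    """Combined 5x2cv F statistic from five (fold 1, fold 2) difference pairs.

    Each pair holds the difference in error rate (algorithm A minus
    algorithm B) on the two folds of one 2-fold replication.
    Under the null hypothesis the statistic is approximately
    F-distributed with (10, 5) degrees of freedom.
    """
    if len(diffs) != 5 or any(len(pair) != 2 for pair in diffs):
        raise ValueError("expected five pairs of differences")

    # Numerator: sum of all ten squared differences.
    numerator = sum(p ** 2 for pair in diffs for p in pair)

    # Per-replication variance estimate s_i^2.
    variances = []
    for p1, p2 in diffs:
        mean = (p1 + p2) / 2
        variances.append((p1 - mean) ** 2 + (p2 - mean) ** 2)

    denominator = 2 * sum(variances)
    if denominator == 0:
        raise ValueError("all replications have zero variance")
    return numerator / denominator


# Hypothetical differences in error rate, for illustration only.
diffs = [(0.05, 0.03), (0.04, 0.06), (0.05, 0.05), (0.02, 0.04), (0.06, 0.04)]
f_stat = combined_5x2cv_f(diffs)
print(f"F = {f_stat:.2f}")
```

The result would be compared with the upper 5% critical value of the F distribution with (10, 5) degrees of freedom, which standard tables give as roughly 4.74; a larger statistic suggests the two algorithms differ.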

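Results

Research questions

RQ1: How do different evaluation strategies (holdout, cross-validation, bootstrap) estimate generalization performance, and how do they support model and algorithm selection?
RQ2: What bias and variance does each technique introduce, and how do stratification and sample size affect them?
RQ3: Which tests and correction procedures are best suited to comparing multiple algorithms under dataset-size constraints?
RQ4: What practical recommendations optimize the reliability and validity of model evaluation in research and applications?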

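Key findings

The holdout method can introduce a pessimistic bias, because withholding data for testing shrinks the training set, and it may be unsuitable for small datasets. Stratification mitigates class-distribution problems.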
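The bootstrap and repeated resampling provide ways to estimate the uncertainty of performance estimates.
k-fold cross-validation involves a bias-variance trade-off governed by k; guidance is given on choosing k and on using cross-validation for model selection.
Statistical tests (e.g., McNemar's test, the F-test, Cochran's Q test) and multiple-comparison corrections are discussed for comparing classifiers.
For small datasets, methods such as the combined F-test 5x2 cross-validation and nested cross-validation are recommended for algorithm comparison.
The paper stresses the difference between estimating generalization performance and ranking models or algorithms by relative performance.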
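To make the classifier-comparison finding concrete, here is a minimal sketch of McNemar's test with continuity correction, written in standard-library Python. It assumes two classifiers evaluated on the same test set; the function name, the labels, and the predictions are hypothetical. The p-value uses the identity that the chi-squared survival function with one degree of freedom equals erfc(sqrt(x / 2)).

```python
import math


def mcnemar_test(y_true, pred_a, pred_b):
    """McNemar's test (continuity-corrected) for two classifiers.

    Only the discordant cases matter:
      b = A correct, B wrong
      c = A wrong,  B correct
    Returns (chi-squared statistic, p-value).
    """
    b = c = 0
    for truth, a, other in zip(y_true, pred_a, pred_b):
        a_ok = a == truth
        b_ok = other == truth
        if a_ok and not b_ok:
            b += 1
        elif b_ok and not a_ok:
            c += 1

    if b + c == 0:
        # The classifiers never disagree in correctness: no evidence of a difference.
        return 0.0, 1.0

    chi2 = (abs(b - c) - 1) ** 2 / (b + c)
    # Survival function of chi-squared with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical test set of 100 examples, all with true label 1:
# 50 both correct, 30 only A correct, 12 only B correct, 8 both wrong.
y_true = [1] * 100
pred_a = [1] * 50 + [1] * 30 + [0] * 12 + [0] * 8
pred_b = [1] * 50 + [0] * 30 + [1] * 12 + [0] * 8

chi2, p_value = mcnemar_test(y_true, pred_a, pred_b)
print(f"chi2 = {chi2:.3f}, p = {p_value:.4f}")
```

When the number of discordant cases is small, the chi-squared approximation becomes unreliable and an exact binomial test is generally preferred; that variant is not shown here.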

This review was generated by AI and checked by a human editor.