QUICK REVIEW

[Paper Review] A Fast SVM-based Feature Selection Method, Combining MFE (Margin-Maximizing Feature Elimination) and Upper Bound on Misclassification Risk

Yaman Aksu|arXiv (Cornell University)|Oct 16, 2012

Imbalanced Data Classification Techniques|Citations: 2

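One-line summary

This paper proposes a fast SVM-based feature selection method that combines margin-maximizing feature elimination (MFE) with the use of data radius to lower generalization error. By introducing a new lightweight soft-margin retraining approach (QP1) and radius-aware elimination criteria, it outperforms MFE-LO and attains lower test error rates on datasets with many features and few samples.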

ABSTRACT

Margin maximization in the hard-margin sense, proposed as feature elimination criterion by the MFE-LO method, is combined here with data radius utilization to further aim to lower generalization error, as several published bounds and bound-related formulations pertaining to lowering misclassification risk (or error) pertain to radius e.g. product of squared radius and weight vector squared norm. Additionally, we propose additional novel feature elimination criteria that, while instead being in the soft-margin sense, too can utilize data radius, utilizing previously published bound-related formulations for approaching radius for the soft-margin sense, whereby e.g. a focus was on the principle stated therein as finding a bound whose minima are in a region with small leave-one-out values may be more important than its tightness. These additional criteria we propose combine radius utilization with a novel and computationally low-cost soft-margin light classifier retraining approach we devise named QP1; QP1 is the soft-margin alternative to the hard-margin LO. We correct an error in the MFE-LO description, find MFE-LO achieves the highest generalization accuracy among the previously published margin-based feature elimination (MFE) methods, discuss some limitations of MFE-LO, and find our novel methods herein outperform MFE-LO, attain lower test set classification error rate. On several datasets that each both have a large number of features and fall into the `large features few samples' dataset category, and on datasets with lower (low-to-intermediate) number of features, our novel methods give promising results. Especially, among our methods the tunable ones, that do not employ (the non-tunable) LO approach, can be tuned more aggressively in the future than herein, to aim to demonstrate for them even higher performance than herein.

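Motivation and objectives

Improve the generalization performance of feature selection on datasets with many features and few samples, and lower misclassification risk.
Address the limitations of existing margin-based feature selection methods, MFE-LO in particular, by incorporating data radius into the selection criteria.
Propose QP1, a computationally low-cost soft-margin retraining approach, as the soft-margin alternative to the hard-margin LO retraining approach.
Propose new feature elimination criteria in the soft-margin setting that use bound-related formulations together with data radius.
Demonstrate a lower classification error than earlier MFE methods, especially on datasets with many features and few samples.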
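
For orientation, the radius-based quantity that the abstract refers to ("product of squared radius and weight vector squared norm") can be written out as follows. This is the standard identity for a canonical hard-margin linear SVM, not a formula quoted from the paper; the symbols R, w, b and gamma are notation assumed here for illustration.

```latex
% Canonical hard-margin constraints: y_i (w^\top x_i + b) \ge 1 for all training points.
% Geometric margin of that classifier:
\gamma = \frac{1}{\lVert w \rVert}
% With R the radius of the smallest sphere enclosing the training points,
% the product of squared radius and squared weight norm is the squared
% radius-to-margin ratio:
R^2 \lVert w \rVert^2 = \frac{R^2}{\gamma^2}
```

Read this way, a criterion that only maximizes the margin ignores that removing a feature can also change R, which is the motivation for making the elimination criteria radius-aware.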

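Proposed method

Combine hard-margin margin maximization, the feature elimination criterion of MFE-LO, with the use of data radius in order to lower generalization error.
Propose new soft-margin feature elimination criteria that bring data radius into bound-related formulations, favoring bounds whose minima lie in regions with small leave-one-out values over bounds that are merely tight.
Develop QP1, a lightweight and computationally low-cost retraining approach for soft-margin SVMs, as the soft-margin alternative to the hard-margin LO.
Correct an error in the published description of MFE-LO, re-evaluate it, and find that it achieves the highest generalization accuracy among the previously published MFE methods.
Combine the radius-aware criteria with QP1 to build tunable feature selection methods that outperform MFE-LO on several benchmark datasets.
Adopt bound-related formulations that favor regions with small leave-one-out values, in line with the goal of minimizing generalization error.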
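
The idea of a radius-aware elimination step can be sketched in a few lines. The following is a minimal illustration under stated assumptions, not the paper's MFE-LO or QP1 procedure: the function names, the toy data, and the fixed weights `w` and `b` are all hypothetical; the radius is approximated by the largest distance to the centroid rather than the smallest enclosing sphere; and no retraining (LO or QP1) is performed after a feature is removed.

```python
import math

def radius_sq(X, keep):
    """Squared-radius proxy: largest squared distance to the centroid,
    measured over the kept features only (an assumed approximation)."""
    n = len(X)
    centroid = [sum(x[j] for x in X) / n for j in keep]
    return max(sum((x[j] - c) ** 2 for j, c in zip(keep, centroid)) for x in X)

def margin(X, y, w, b, keep):
    """Smallest signed distance of any sample to the hyperplane
    defined by w restricted to the kept features."""
    norm = math.sqrt(sum(w[j] ** 2 for j in keep))
    if norm == 0.0:
        return float("-inf")
    return min(yi * (sum(w[j] * x[j] for j in keep) + b) / norm
               for x, yi in zip(X, y))

def radius_margin_score(X, y, w, b, keep):
    """R^2 / gamma^2 for the reduced classifier; lower is better.
    Infinite when the reduced classifier no longer separates the data."""
    g = margin(X, y, w, b, keep)
    if g <= 0.0:
        return float("inf")
    return radius_sq(X, keep) / (g * g)

def eliminate_one(X, y, w, b, keep):
    """Return (feature, score) for the feature whose removal gives the
    lowest radius-margin score, without retraining the classifier."""
    best = None
    for m in keep:
        reduced = [j for j in keep if j != m]
        if not reduced:
            continue
        s = radius_margin_score(X, y, w, b, reduced)
        if best is None or s < best[1]:
            best = (m, s)
    return best

# Toy data (hypothetical): features 0 and 1 carry the class signal,
# feature 2 is wide-ranging noise that the classifier gives zero weight.
X = [[2, 1, 5], [3, 2, -5], [-2, -1, 4], [-3, -2, -4]]
y = [1, 1, -1, -1]
w = [1.0, 1.0, 0.0]
b = 0.0

feature, score = eliminate_one(X, y, w, b, keep=[0, 1, 2])
print(feature, score)
```

On this toy data the noise feature is the one selected for removal: dropping it leaves the margin unchanged while shrinking the radius, which a margin-only criterion would not reward.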

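Experimental results

Research questions

RQ1: Does incorporating data radius into margin-based feature selection lower generalization error more effectively than earlier MFE methods?
RQ2: How do the proposed soft-margin, radius-aware feature selection methods compare with MFE-LO in test error rate?
RQ3: To what extent does the QP1 retraining approach reduce computational cost while maintaining classification accuracy for soft-margin SVMs?
RQ4: Can the proposed methods outperform MFE-LO on high-dimensional datasets with few samples?
RQ5: How tunable are the proposed methods, and can more aggressive tuning yield performance beyond the reported results?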

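Key findings

With the error in its description corrected, MFE-LO achieved the highest generalization accuracy among the previously published margin-based feature elimination (MFE) methods.
The proposed methods outperformed MFE-LO, attaining a lower test set classification error rate on several datasets.
On datasets with many features and few samples, as well as on datasets with a low-to-intermediate number of features, the proposed methods gave promising results.
The tunable variants of the proposed methods, which do not rely on the non-tunable LO approach, could be tuned more aggressively in future work, with the aim of demonstrating even higher performance.
Integrating data radius into the soft-margin feature elimination criteria improved generalization performance, particularly when guided by bound-related formulations whose minima lie in regions with small leave-one-out values.
QP1 makes soft-margin retraining computationally inexpensive, which keeps the proposed methods feasible on high-dimensional datasets.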

This review was generated by AI and checked by a human editor.