QUICK REVIEW

[Paper Review] A Study on Feature Selection Techniques in Educational Data Mining

M. Ramaswami, R. Bhaskaran | ArXiv.org | Dec 19, 2009

Machine Learning and Data Classification | References: 10 | Citations: 123

Quick Summary

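This study evaluates six filter-based feature selection techniques to identify the optimal subset of predictors of student academic performance in educational data mining. With Naive Bayes as the baseline classifier, it shows that reducing feature dimensionality improves predictive accuracy, F-measure, and ROC values while lowering computational cost. The best-performing method was then confirmed by comparative benchmarking with multiple classifiers.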
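A minimal sketch of the filter idea follows. It assumes categorical features and uses information gain as the scoring criterion. The review does not name the six algorithms the paper compares, so information gain is only a stand-in, and the feature names and toy records below are hypothetical. Each feature is scored on its own, without training any classifier, and the ranking is then cut at a chosen cardinality k.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy H(C), in bits, of a sequence of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(column, labels):
    """IG(C; X) = H(C) - sum_v P(X = v) * H(C | X = v) for one categorical feature X."""
    n = len(labels)
    conditional = 0.0
    for value, count in Counter(column).items():
        subset = [y for x, y in zip(column, labels) if x == value]
        conditional += (count / n) * entropy(subset)
    return entropy(labels) - conditional


def rank_features(rows, labels, names):
    """Filter step: score every feature independently of any classifier, best first."""
    scores = {
        name: information_gain([row[j] for row in rows], labels)
        for j, name in enumerate(names)
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical toy records: (attendance, prior_grade, hostel) -> result
names = ["attendance", "prior_grade", "hostel"]
rows = [
    ("high", "A", "yes"), ("high", "A", "no"), ("low", "C", "yes"),
    ("low", "C", "no"), ("high", "B", "yes"), ("low", "B", "no"),
]
labels = ["pass", "pass", "fail", "fail", "pass", "fail"]

ranked = rank_features(rows, labels, names)
k = 2  # cardinality of the retained subset
selected = [name for name, _ in ranked[:k]]
print(ranked)
print(selected)  # ['attendance', 'prior_grade']
```

Cutting the same ranking at different values of k yields the subsets of different cardinalities that the study compares.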

ABSTRACT

Educational data mining (EDM) is a new and growing research area in which the essence of data mining concepts is applied in the educational field to extract useful information on the behavior of students in the learning process. In EDM, feature selection is performed to generate a subset of candidate variables. As feature selection influences the predictive accuracy of any performance model, it is essential to study in detail the effectiveness of the student performance model in connection with feature selection techniques. In this connection, the present study is devoted not only to investigating the most relevant subset of features with minimum cardinality for achieving high predictive performance by adopting various filtered feature selection techniques in data mining, but also to evaluating the goodness of subsets with different cardinalities and the quality of six filtered feature selection algorithms in terms of the F-measure value and the Receiver Operating Characteristics (ROC) value, generated by the NaiveBayes algorithm as the baseline classifier. Our comparative study of six filter feature selection algorithms reveals the best method, as well as the optimal dimensionality of the feature subset. Benchmarking of the filter feature selection method is subsequently carried out by deploying different classifier models. The results of the present study support the well-known fact that predictive accuracy increases when a minimal number of features is used. The expected outcomes show a reduction in computational time and construction cost in both the training and classification phases of the student performance model.
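For reference, the two measures named in the abstract have standard definitions. The sketch below treats one class as positive (an assumption, since the review does not say how the paper averages over classes). TP, FP and FN are the true-positive, false-positive and false-negative counts, P and N are the sets of positive and negative test instances, and s is the classifier's score for the positive class:

```latex
\mathrm{Precision} = \frac{TP}{TP + FP}, \qquad
\mathrm{Recall} = \frac{TP}{TP + FN}, \qquad
F = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}

\mathrm{AUC} = \frac{1}{|P|\,|N|} \sum_{p \in P} \sum_{n \in N}
  \left( \mathbf{1}[s_p > s_n] + \tfrac{1}{2}\,\mathbf{1}[s_p = s_n] \right)
```

The AUC form is the Mann-Whitney statistic: the probability that a randomly chosen positive instance is scored above a randomly chosen negative one. For example, TP = 1, FP = 1 and FN = 1 give Precision = Recall = 0.5 and F = 0.5.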

Motivation and Objectives

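Identify the most relevant feature subset, with minimum cardinality, for predicting student academic performance.
Assess the effectiveness of six filter-based feature selection algorithms and their impact on model performance.
Assess how the cardinality of the feature subset affects the F-measure and ROC value.
Identify the best feature selection method by benchmarking it with multiple classifier models.
Reduce computational time and model construction cost in student performance modeling through optimal feature selection.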

Proposed Method

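Filter-based feature selection was applied to extract relevant features from the educational dataset using statistical measures.
Six specific filter algorithms were examined for their ability to select high-quality feature subsets.
Naive Bayes was used as the baseline classifier to compute the F-measure and ROC value of each selected feature subset.
Classification quality was measured with the F-measure and the area under the receiver operating characteristic curve (AUC).
Feature subsets of different cardinalities were tested to identify the optimal dimensionality.
The best feature selection method was further validated with multiple classifier models for benchmarking.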
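The evaluation loop described above can be sketched as follows. The sketch rests on several assumptions that the review does not state: categorical features, a binary pass/fail target, a Laplace-smoothed categorical Naive Bayes written from scratch (not the paper's actual tool), and leave-one-out validation. The ranking `[0, 1, 2]` stands in for the output of any filter method, and the toy records are hypothetical. For each cardinality k, only the top-k features are kept, and the held-out posterior scores yield an F-measure (at a 0.5 threshold) and an AUC.

```python
import math
from collections import Counter, defaultdict


def train_nb(rows, labels, alpha=1.0):
    """Fit a categorical Naive Bayes model with Laplace smoothing alpha."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (feature index, class) -> counts of values
    seen_values = defaultdict(set)       # feature index -> values observed in training
    for row, y in zip(rows, labels):
        for j, v in enumerate(row):
            value_counts[(j, y)][v] += 1
            seen_values[j].add(v)
    return class_counts, value_counts, seen_values, alpha, len(labels)


def posterior(model, row, positive):
    """P(positive | row) under the naive conditional-independence assumption."""
    class_counts, value_counts, seen_values, alpha, n = model
    log_post = {}
    for c, nc in class_counts.items():
        lp = math.log(nc / n)
        for j, v in enumerate(row):
            lp += math.log((value_counts[(j, c)][v] + alpha)
                           / (nc + alpha * len(seen_values[j])))
        log_post[c] = lp
    top = max(log_post.values())  # subtract the max for numerical stability
    total = sum(math.exp(lp - top) for lp in log_post.values())
    return math.exp(log_post[positive] - top) / total


def f_measure(y_true, y_pred):
    """Harmonic mean of precision and recall for boolean labels (True = positive)."""
    tp = sum(t and p for t, p in zip(y_true, y_pred))
    fp = sum((not t) and p for t, p in zip(y_true, y_pred))
    fn = sum(t and (not p) for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def roc_auc(y_true, scores):
    """AUC as the probability that a positive outranks a negative (ties count 1/2)."""
    pos = [s for t, s in zip(y_true, scores) if t]
    neg = [s for t, s in zip(y_true, scores) if not t]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def loo_scores(rows, labels, features, positive):
    """Leave-one-out posterior scores using only the given feature indices."""
    proj = [tuple(row[j] for j in features) for row in rows]
    scores = []
    for i in range(len(proj)):
        model = train_nb(proj[:i] + proj[i + 1:], labels[:i] + labels[i + 1:])
        scores.append(posterior(model, proj[i], positive))
    return scores


# Hypothetical toy records (attendance, prior_grade, hostel) and a hypothetical filter ranking.
rows = [
    ("high", "A", "yes"), ("high", "A", "no"), ("low", "C", "yes"),
    ("low", "C", "no"), ("high", "B", "yes"), ("low", "B", "no"),
]
labels = ["pass", "pass", "fail", "fail", "pass", "fail"]
ranking = [0, 1, 2]  # feature indices, best first

y_true = [y == "pass" for y in labels]
results = {}
for k in range(1, len(ranking) + 1):
    scores = loo_scores(rows, labels, ranking[:k], "pass")
    y_pred = [s >= 0.5 for s in scores]
    results[k] = (f_measure(y_true, y_pred), roc_auc(y_true, scores))
    print(k, results[k])
```

Comparing `results[k]` across k is one way to look for the cardinality at which the F-measure and AUC peak. With six toy records, the numbers are illustrative only. Benchmarking with other classifiers would replace `train_nb` and `posterior` while keeping the same feature subsets.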

Experimental Results

Research Questions

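RQ1: Which filter feature selection method achieves the highest predictive accuracy in a student performance prediction model?
RQ2: How does the cardinality of the selected feature subset affect the F-measure and ROC value?
RQ3: What feature subset dimensionality maximizes model performance?
RQ4: How does feature selection reduce computational cost in the training and classification phases?
RQ5: Which feature selection method performs consistently well across multiple classifier models?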

Key Findings

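The study identified the most effective filter feature selection method on the basis of F-measure and ROC performance.
Reducing feature dimensionality improved predictive accuracy, supporting the benefit of a minimal, highly relevant feature set.
The optimal feature subset size improved model performance while minimizing computational overhead.
The best feature selection method showed a consistent advantage across multiple classifier models.
Optimizing the feature subset markedly reduced computational time and model construction cost.
The results support the known principle in educational data mining that fewer but higher-quality features yield better predictive models.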
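The link between fewer features and lower construction cost can be made concrete for the baseline classifier. The review reports no cost formula, so the following is an illustration rather than the paper's analysis. Assume a categorical Naive Bayes model with |C| classes and d features, where feature j takes |V_j| values. The number of free parameters is then:

```latex
\#\text{params} = (|C| - 1) + |C| \sum_{j=1}^{d} \left( |V_j| - 1 \right)
```

Training is one counting pass over n records, costing O(nd), and scoring one record costs O(|C| d), so both scale linearly in d. For hypothetical values |C| = 2 and |V_j| = 3, d = 10 gives 1 + 2 · 20 = 41 parameters, while d = 4 gives 1 + 2 · 8 = 17.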

This review was generated by AI and checked by a human editor.