QUICK REVIEW

[Paper Review] A Comparative Study for Predicting Heart Diseases Using Data Mining Classification Methods

Israa Ahmed Zriqat, Ahmad Mousa Altamimi|arXiv (Cornell University)|Apr 10, 2017

Artificial Intelligence in Healthcare | References: 19 | Citations: 49

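One-line summary

This study evaluated five data mining classification algorithms for heart disease prediction (Naïve Bayes, Decision Tree, Discriminant, Random Forest, and Support Vector Machine), implemented in MATLAB and applied to large datasets. The decision tree achieved the highest accuracy, 99.0%, outperforming even Random Forest, its ensemble version. This suggests that, on these particular datasets, a single tree model can be more effective than an ensemble method.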

ABSTRACT

Improving the precision of heart diseases detection has been investigated by many researchers in the literature. Such improvement induced by the overwhelming health care expenditures and erroneous diagnosis. As a result, various methodologies have been proposed to analyze the disease factors aiming to decrease the physicians practice variation and reduce medical costs and errors. In this paper, our main motivation is to develop an effective intelligent medical decision support system based on data mining techniques. In this context, five data mining classifying algorithms, with large datasets, have been utilized to assess and analyze the risk factors statistically related to heart diseases in order to compare the performance of the implemented classifiers (e.g., Naïve Bayes, Decision Tree, Discriminant, Random Forest, and Support Vector Machine). To underscore the practical viability of our approach, the selected classifiers have been implemented using MATLAB tool with two datasets. Results of the conducted experiments showed that all classification algorithms are predictive and can give relatively correct answer. However, the decision tree outperforms other classifiers with an accuracy rate of 99.0% followed by Random forest. That is the case because both of them have relatively same mechanism but the Random forest can build ensemble of decision tree. Although ensemble learning has been proved to produce superior results, but in our case the decision tree has outperformed its ensemble version.

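Motivation and objectives

Develop an intelligent medical decision support system based on data mining techniques, and thereby improve heart disease prediction.
Evaluate and compare the performance of five classification algorithms on large heart disease datasets.
Identify the most accurate and reliable classifier for early detection of heart disease.
Reduce diagnostic errors and medical costs through data-driven analysis of risk factors.
Provide a practical, high-accuracy solution for machine-learning-based clinical decision support.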

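Proposed method

Five data mining classification algorithms were implemented: Naïve Bayes, Decision Tree, Discriminant analysis, Random Forest, and Support Vector Machine.
Experiments were run in MATLAB on two real-world heart disease datasets.
The classifiers were trained and tested on standardized data, with accuracy as the main performance metric.
Feature selection and statistical analysis were applied to identify risk factors.
For Random Forest, ensemble learning was used: multiple decision trees are combined with the aim of improving generalization.
The performance comparison is based on accuracy, with results analyzed across all five algorithms.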
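
The comparison procedure described above can be sketched as follows. This is a minimal illustration, not the authors' code: the paper used MATLAB, whereas this sketch assumes Python with scikit-learn, a synthetic stand-in dataset from `make_classification`, a 70/30 stratified split, and default hyperparameters. None of these choices come from the paper, and the printed accuracies say nothing about the paper's results.

```python
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Assumption: synthetic binary-classification data standing in for a
# heart disease dataset (13 features, label = disease / no disease).
X, y = make_classification(
    n_samples=600, n_features=13, n_informative=6, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# The five classifier families named in the review, with default settings
# (hypothetical configuration; the paper's settings are not given here).
classifiers = {
    "Naive Bayes": GaussianNB(),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Discriminant analysis": LinearDiscriminantAnalysis(),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "Support Vector Machine": SVC(),
}


def compare(classifiers, X_train, y_train, X_test, y_test):
    """Fit each classifier on standardized data and return test accuracy."""
    results = {}
    for name, clf in classifiers.items():
        # The scaler is fitted on the training split only, inside the pipeline.
        model = make_pipeline(StandardScaler(), clf)
        model.fit(X_train, y_train)
        results[name] = accuracy_score(y_test, model.predict(X_test))
    return results


results = compare(classifiers, X_train, y_train, X_test, y_test)
for name, acc in sorted(results.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:<24}{acc:.3f}")
```

Wrapping the scaler and the classifier in one pipeline keeps the standardization statistics from leaking test data into training, which matters when small accuracy gaps between classifiers are being compared.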

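Experimental results

Research questions

RQ1: Which data mining classification algorithm achieves the highest accuracy for heart disease prediction?
RQ2: In this context, how do individual tree-based models compare with ensemble methods such as Random Forest?
RQ3: To what extent can data mining techniques reduce diagnostic errors and support clinical decision-making?
RQ4: Which statistically significant risk factors identified by the classifiers are most strongly associated with heart disease?
RQ5: Does the use of large datasets noticeably improve the predictive performance of the different algorithms?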

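Key findings

The decision tree classifier achieved the highest accuracy for heart disease prediction, 99.0%.
Although Random Forest is an ensemble of decision trees, it performed slightly worse than the individual decision tree model.
All five classifiers showed strong predictive performance, with accuracy of 95% or higher.
The study concludes that data mining techniques can markedly improve diagnostic accuracy and reduce medical errors.
On this dataset, the simplicity and interpretability of a single decision tree may outweigh the benefits of ensemble learning.
Discriminant analysis and Naïve Bayes showed moderate performance, ranking below Decision Tree and Random Forest.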
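
To make the tree-versus-ensemble finding concrete, the sketch below shows how an ensemble that aggregates by majority vote can score below its best member on a given test set. The labels and per-tree predictions are hypothetical toy values chosen for illustration, not data from the paper, and majority voting is only one common aggregation rule.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def majority_vote(predictions):
    """Combine per-tree predictions by taking the most common label per sample."""
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*predictions)]


# Hypothetical toy labels and per-tree predictions (not from the paper).
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
tree_a = [1, 0, 1, 1, 0, 1, 0, 1]  # wrong on the last sample only
tree_b = [0, 0, 0, 1, 0, 1, 0, 0]  # wrong on samples 0 and 2
tree_c = [0, 0, 0, 1, 1, 1, 0, 0]  # wrong on samples 0, 2 and 4

forest_pred = majority_vote([tree_a, tree_b, tree_c])

print("best single tree:", accuracy(y_true, tree_a))       # 0.875
print("majority vote:   ", accuracy(y_true, forest_pred))  # 0.75
```

Here two weaker trees make the same mistakes, so they outvote the stronger tree on those samples. Ensembles tend to help when member errors are weakly correlated, so a single accuracy comparison on one dataset does not by itself show that single trees are better in general.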


This review was generated by AI and checked by a human editor.