QUICK REVIEW

[Paper Review] Dengue disease prediction using weka data mining tool

Kashish Ara Shakil, Shadma Anis|arXiv (Cornell University)|Feb 18, 2015

Artificial Intelligence in Healthcare|References: 11|Citations: 44

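One-line summary

This study evaluates several data mining algorithms in WEKA for dengue prediction, using a dataset of 108 instances that was reduced to 99 rows with 18 attributes. Naive Bayes and J48 achieved 100% classification accuracy, an AUC of 1, the lowest mean absolute error, and the shortest model build time, which makes them the most effective algorithms for dengue prediction in this study.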

ABSTRACT

Dengue is a life-threatening disease prevalent in several developed as well as developing countries like India. In this paper we discuss various algorithm approaches of data mining that have been utilized for dengue disease prediction. Data mining is a well-known technique used by health organizations for classification of diseases such as dengue, diabetes and cancer in bioinformatics research. In the proposed approach we have used WEKA with 10-fold cross validation to evaluate data and compare results. WEKA has an extensive collection of different machine learning and data mining algorithms. In this paper we have first classified the dengue data set and then compared the different data mining techniques in WEKA through the Explorer, Knowledge Flow and Experimenter interfaces. Furthermore, in order to validate our approach we have used a dengue dataset with 108 instances, of which WEKA used 99 rows and 18 attributes, to determine the prediction of disease and the accuracy of different classification algorithms and to find the best performer. The main objective of this paper is to classify data, assist users in extracting useful information from it, and easily identify a suitable algorithm for an accurate predictive model. From the findings of this paper it can be concluded that Naïve Bayes and J48 are the best-performing algorithms for classification accuracy because they achieved maximum accuracy = 100% with 99 correctly classified instances, maximum ROC = 1, had the least mean absolute error and took the minimum time for building the model in the Explorer and Knowledge Flow results.

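Motivation and objectives

Identify the most accurate data mining algorithm for predicting dengue from real-world health data.
Evaluate and compare the performance of various machine learning algorithms in WEKA.
Determine the best algorithm based on accuracy, AUC, mean absolute error, and training time.
Provide a practical framework that helps health organizations select an effective predictive model for dengue outbreaks.
Verify that WEKA's Explorer, Knowledge Flow, and Experimenter interfaces are useful for model selection and evaluation.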

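Proposed method

The study used a dengue dataset of 108 instances, reduced to 99 rows and 18 attributes for analysis.
10-fold cross-validation was applied to evaluate every algorithm.
Twelve different classification algorithms were trained and compared using WEKA's Explorer, Knowledge Flow, and Experimenter interfaces.
The performance metrics were classification accuracy, area under the receiver operating characteristic curve (AUC), mean absolute error, and model training time.
The best-performing algorithms were selected on the basis of highest accuracy and AUC and lowest error and training time.
Naive Bayes and J48 were identified as the top performers in the comparative evaluation across the interfaces.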
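
To make the 10-fold cross-validation step concrete, the following is a minimal sketch in plain Python of how 99 instances can be partitioned into 10 folds, with each fold serving as the test set exactly once. It is an illustration only, not WEKA's implementation: the function names and the seed are hypothetical, and the split here is not stratified by class, whereas WEKA's own procedure may differ.

```python
import random


def k_fold_indices(n_instances, k=10, seed=1):
    """Split instance indices 0..n_instances-1 into k nearly equal folds."""
    indices = list(range(n_instances))
    random.Random(seed).shuffle(indices)  # seed is an arbitrary assumption
    # Fold i takes every k-th shuffled index, starting at position i.
    return [indices[i::k] for i in range(k)]


def train_test_splits(n_instances, k=10, seed=1):
    """Yield (train, test) index lists; each fold is the test set once."""
    folds = k_fold_indices(n_instances, k, seed)
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


if __name__ == "__main__":
    splits = list(train_test_splits(99, k=10))
    # 99 does not divide evenly by 10: nine folds of 10 and one fold of 9.
    print(sorted(len(test) for _, test in splits))
    # prints [9, 10, 10, 10, 10, 10, 10, 10, 10, 10]
```

Every instance is used for testing exactly once and for training nine times, so the reported accuracy is an aggregate over all 99 held-out predictions.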

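Experimental results

Research questions

RQ1: Which data mining algorithm in WEKA achieves the highest classification accuracy for dengue prediction?
RQ2: How do the algorithms compare in terms of AUC, mean absolute error, and training time?
RQ3: Can WEKA's Explorer, Knowledge Flow, and Experimenter interfaces effectively support model selection for dengue prediction?
RQ4: What is the best balance between accuracy, computational efficiency, and model reliability?
RQ5: Does combining 10-fold cross-validation with multiple WEKA interfaces improve the robustness of model evaluation?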

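Key findings

Naive Bayes and J48 classified all 99 instances correctly, for 100% classification accuracy.
Both algorithms reached the maximum area under the ROC curve (AUC) of 1.0, indicating perfect discrimination on this dataset.
These two algorithms recorded the lowest mean absolute error of all models evaluated.
Naive Bayes and J48 also required the shortest training time, making them the most efficient models to build.
Results from WEKA's Explorer and Knowledge Flow interfaces were consistent and supported the superior performance of Naive Bayes and J48.
The study concludes that Naive Bayes and J48 are the most suitable algorithms for accurate and efficient dengue prediction on the given dataset.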
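
The three quality metrics cited in these findings can be sketched for a two-class problem as follows. This is a simplified illustration under stated assumptions: the function names are hypothetical, the toy labels and probabilities are invented for the example and are not data from the paper, and WEKA's exact computation of each metric may differ in detail.

```python
def accuracy(y_true, y_pred):
    """Fraction of instances whose predicted label equals the true label."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def mean_absolute_error(y_true, p_pos):
    """Mean gap between the 0/1 label and the predicted positive-class probability."""
    return sum(abs(t - p) for t, p in zip(y_true, p_pos)) / len(y_true)


def auc(y_true, p_pos):
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [p for t, p in zip(y_true, p_pos) if t == 1]
    neg = [p for t, p in zip(y_true, p_pos) if t == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical toy data (assumption, not from the paper).
    y_true = [1, 1, 0, 0, 1, 0]
    p_pos = [0.9, 0.8, 0.2, 0.1, 0.7, 0.3]
    y_pred = [1 if p >= 0.5 else 0 for p in p_pos]

    print(accuracy(y_true, y_pred))                      # prints 1.0
    print(auc(y_true, p_pos))                            # prints 1.0
    print(round(mean_absolute_error(y_true, p_pos), 3))  # prints 0.2
```

In this toy case accuracy and AUC are both perfect while the mean absolute error is still above zero, which is why the error can separate models that are tied on accuracy and AUC.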


This review was generated by AI and checked by a human editor.