QUICK REVIEW

[Paper Review] IoT Security: Botnet detection in IoT using Machine learning

Satish Pokhrel, Hassan Abbas|arXiv (Cornell University)|Apr 6, 2021

Network Security and Intrusion Detection | References: 21 | Citations: 81

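One-line summary

This paper proposes a botnet DDoS detection model for IoT that combines supervised machine learning (KNN, Naive Bayes, MLP-ANN) with feature engineering and SMOTE, trained on the BoT-IoT dataset. KNN performed best, and because the data are imbalanced, SMOTE and cross-validation are needed for a reliable evaluation.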

ABSTRACT

The acceptance of Internet of Things (IoT) applications and services has seen an enormous rise of interest in IoT. Organizations have begun to create various IoT based gadgets ranging from small personal devices such as a smart watch to a whole network of smart grid, smart mining, smart manufacturing, and autonomous driver-less vehicles. The overwhelming amount and ubiquitous presence have attracted potential hackers for cyber-attacks and data theft. Security is considered as one of the prominent challenges in IoT. The key scope of this research work is to propose an innovative model using machine learning algorithm to detect and mitigate botnet-based distributed denial of service (DDoS) attack in IoT network. Our proposed model tackles the security issue concerning the threats from bots. Different machine learning algorithms such as K-Nearest Neighbour (KNN), Naive Bayes model and Multi-layer Perceptron Artificial Neural Network (MLP ANN) were used to develop a model where data are trained by BoT-IoT dataset. The best algorithm was selected by a reference point based on accuracy percentage and area under the receiver operating characteristics curve (ROC AUC) score. Feature engineering and Synthetic minority oversampling technique (SMOTE) were combined with machine learning algorithms (MLAs). Performance comparison of three algorithms used was done in class imbalance dataset and on the class balanced dataset.

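Motivation and objectives

Motivate IoT security work by addressing the threat of botnet-based DDoS attacks in IoT networks.
Develop a machine-learning-based detector trained on BoT-IoT traffic data.
Mitigate the class imbalance problem using SMOTE and feature engineering.
Evaluate and compare several supervised ML algorithms on real IoT botnet data.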

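Proposed method

Use the BoT-IoT dataset, which contains botnet and normal IoT traffic, to train and evaluate the models.
Clean and normalize the data, and convert features to numeric form.
Apply chi-square (F-score) feature scoring and select the top 8 features.
Balance the dataset with SMOTE to produce a class-balanced set.
Train and evaluate Gaussian Naive Bayes, KNN, and MLP-ANN classifiers using an 80/20 train/test split and 5-fold cross-validation.
Evaluate performance with accuracy, precision, recall, F1 score, and ROC AUC, with emphasis on ROC AUC because the class imbalance is severe.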
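
The SMOTE balancing step can be illustrated with a minimal sketch. This is not the authors' code: it is a simplified, standard-library-only version of the core SMOTE idea (a synthetic sample is placed on the line segment between a minority sample and one of its nearest minority-class neighbours). The function name `smote_oversample`, the toy feature values, and the choice of `k` are assumptions made for illustration.

```python
import random

def smote_oversample(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority-class neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by squared Euclidean distance.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic

# Hypothetical toy data: 3 minority-class flows vs 9 majority-class flows,
# each described by two numeric features.
minority = [[1.0, 2.0], [1.5, 1.8], [1.2, 2.4]]
majority = [[8.0 + i, 9.0 - 0.1 * i] for i in range(9)]

new_minority = smote_oversample(minority, n_new=len(majority) - len(minority), k=2)
balanced_minority = minority + new_minority  # now the same size as `majority`
```

Because every synthetic point is a convex combination of two real minority samples, oversampling fills in the minority region rather than duplicating rows. In practice a maintained implementation (for example the one in the imbalanced-learn package) would normally be used instead of a hand-written version.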

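Experimental results

Research questions

RQ1: Which supervised ML algorithm (Gaussian NB, KNN, or MLP-ANN) gives the best botnet detection performance on the BoT-IoT data?
RQ2: How does class imbalance affect model performance, and how does SMOTE balancing change the results?
RQ3: Which of the top 8 chi-square features most effectively discriminate botnet traffic from normal IoT traffic?
RQ4: In terms of model reliability on unseen data, how do cross-validation results compare with a simple train/test split?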
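
For RQ4, the difference between a single split and k-fold cross-validation can be sketched as follows. The review only states that 5-fold cross-validation was used. Stratifying the folds by class is an assumption added here because it is a common choice for imbalanced data, and `stratified_kfold` is a hypothetical helper, not something taken from the paper.

```python
import random
from collections import defaultdict

def stratified_kfold(labels, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs in which every fold keeps roughly
    the class proportions of the full label list."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)  # deal each class round-robin
    for i in range(k):
        test_idx = sorted(folds[i])
        train_idx = sorted(j for f in folds[:i] + folds[i + 1:] for j in f)
        yield train_idx, test_idx

# Hypothetical toy labels with a 95:5 class imbalance.
labels = [1] * 95 + [0] * 5
splits = list(stratified_kfold(labels, k=5))
```

Each sample is used for testing exactly once across the five folds, so the averaged score depends less on one lucky or unlucky 80/20 split.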

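Key findings

On the real (imbalanced) BoT-IoT dataset, Gaussian NB reached about 100% accuracy, but its ROC AUC was only about 0.51 and its recall and F1 were low, indicating poor discriminative power on imbalanced data.
KNN performed well on both datasets: 99.6% accuracy and 99.2% ROC AUC on the imbalanced data, and 92.1% accuracy and 92.2% ROC AUC on the SMOTE-balanced data.
MLP-ANN reached 87.4% accuracy, but its precision, recall, F1, and ROC AUC were comparatively low, and it underperformed KNN on this task.
SMOTE balanced the dataset to 1,989,656 samples (equal numbers of botnet and normal traffic), enabling a more reliable evaluation of model performance.
Chi-square feature scoring identified eight top features (bytes, sbytes, dbytes, rate, pkts, spkts, srate, drate) as the most discriminative.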
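
The Gaussian NB result (near-perfect accuracy with ROC AUC close to 0.5) is what one would expect from a model that mostly predicts the majority class. The sketch below reproduces that pattern on made-up labels. The 990:10 class ratio and the constant predictor are assumptions for illustration, not figures from the paper.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def roc_auc(y_true, scores):
    """Probability that a random positive is scored above a random negative
    (ties count as one half); this equals the area under the ROC curve."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical imbalanced labels: 990 majority-class samples, 10 minority.
y_true = [1] * 990 + [0] * 10

# A "classifier" that gives every sample the same score and the same label.
constant_scores = [1.0] * len(y_true)
constant_pred = [1] * len(y_true)

acc = accuracy(y_true, constant_pred)    # 0.99
auc = roc_auc(y_true, constant_scores)   # 0.5
```

Accuracy rewards agreement with the dominant class, while ROC AUC measures whether the two classes are ranked apart at all, which is why the review emphasizes ROC AUC for the imbalanced set.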


This review was generated by AI and checked by a human editor.