QUICK REVIEW

[Paper Review] IoT Security: Botnet detection in IoT using Machine learning

Satish Pokhrel, Hassan Abbas | arXiv (Cornell University) | Apr 6, 2021
Network Security and Intrusion Detection | 21 references | 81 citations
TL;DR

The paper proposes a botnet-DDoS detection model for IoT using supervised ML (KNN, Naive Bayes, MLP-ANN) with feature engineering and SMOTE on BoT-IoT data; KNN yields the best performance, and imbalanced data requires SMOTE and cross-validation for reliable evaluation.

ABSTRACT

Internet of Things (IoT) applications and services have seen an enormous rise in acceptance and interest. Organizations have begun to create various IoT-based gadgets, ranging from small personal devices such as smart watches to whole networks for smart grids, smart mining, smart manufacturing, and autonomous driverless vehicles. The overwhelming number and ubiquitous presence of these devices have attracted potential hackers seeking to launch cyber-attacks and steal data. Security is considered one of the prominent challenges in IoT. The key scope of this research work is to propose an innovative model using machine learning algorithms to detect and mitigate botnet-based distributed denial of service (DDoS) attacks in IoT networks. Our proposed model tackles the security issue concerning threats from bots. Different machine learning algorithms, namely K-Nearest Neighbour (KNN), Naive Bayes, and Multi-layer Perceptron Artificial Neural Network (MLP ANN), were used to develop a model trained on the BoT-IoT dataset. The best algorithm was selected using a reference point based on accuracy percentage and the area under the receiver operating characteristic curve (ROC AUC) score. Feature engineering and the synthetic minority oversampling technique (SMOTE) were combined with the machine learning algorithms (MLAs). The performance of the three algorithms was compared on the class-imbalanced dataset and on the class-balanced dataset.

Motivation & Objective

  • Motivate IoT security by addressing botnet-based DDoS threats in IoT networks.
  • Develop a machine learning-based detector trained on BoT-IoT traffic data.
  • Mitigate class-imbalance issues using SMOTE and feature engineering.
  • Evaluate and compare multiple supervised ML algorithms on real IoT botnet data.

Proposed method

  • Use BoT-IoT dataset comprising botnet and normal IoT traffic for model training and evaluation.
  • Perform data cleansing, normalization, and transformation to numeric features.
  • Apply feature engineering via chi-square (F-score) to select top features (eight features).
  • Balance the dataset with SMOTE to create a class-balanced set.
  • Train and evaluate Gaussian Naive Bayes, KNN, and MLP-ANN classifiers with 80/20 train/test split and 5-fold cross-validation.
  • Assess performance using accuracy, precision, recall, F1-score, and ROC AUC, emphasizing ROC AUC due to severe class imbalance.
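
The SMOTE balancing step listed above can be illustrated with a minimal sketch. The review does not say which SMOTE implementation the authors used, so the pure-Python function below only illustrates the core idea: each synthetic point is an interpolation between a minority sample and one of its nearest minority neighbours. The function name `smote_oversample`, its parameters, and the toy data are hypothetical and are not taken from the paper.

```python
import random


def smote_oversample(minority, n_new, k=5, seed=0):
    """Return n_new synthetic points built from the minority samples.

    Each synthetic point lies on the line segment between a randomly
    chosen minority sample and one of its k nearest minority neighbours.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by squared Euclidean distance.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


# Hypothetical toy data: 4 minority samples with 2 features each.
minority = [[1.0, 2.0], [1.5, 1.8], [2.0, 2.2], [1.2, 2.1]]
majority_count = 10

# Generate enough synthetic points to match the majority class size.
synthetic = smote_oversample(minority, n_new=majority_count - len(minority), k=3)
balanced_minority = minority + synthetic
```

In practice, oversampling of this kind is usually applied to the training portion only, so that synthetic points do not leak into the held-out test data; whether the paper did so is not stated in this review.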

Experimental results

Research questions

  • RQ1: Which supervised ML algorithm (Gaussian NB, KNN, MLP-ANN) provides the best botnet detection performance on BoT-IoT data?
  • RQ2: What is the impact of class imbalance on model performance, and how does SMOTE balancing influence the results?
  • RQ3: Which features (top eight by chi-square) most effectively discriminate botnet from normal IoT traffic?
  • RQ4: How do cross-validation results compare to simple train/test splits in terms of model reliability on unseen data?

Key findings

  • On the real (imbalanced) BoT-IoT dataset, Gaussian NB achieved ~100% accuracy but a ROC AUC of only ~0.51 with low recall and F1, indicating near-chance discrimination despite the high accuracy.
  • KNN achieved high performance on both datasets: 99.6% accuracy and 0.992 ROC AUC on the imbalanced data, and 92.1% accuracy and 0.922 ROC AUC on the SMOTE-balanced data.
  • MLP-ANN showed 87.4% accuracy with relatively low precision/recall/F1/ROC AUC, indicating it underperformed compared to KNN in this task.
  • SMOTE balanced the dataset to 1,989,656 samples (equal botnet and normal traffic), enabling more reliable evaluation of model performance.
  • Eight top features (bytes, sbytes, dbytes, rate, pkts, spkts, srate, drate) were identified as most discriminative via chi-square feature scoring.
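
The gap between accuracy and ROC AUC reported for Gaussian NB can be reproduced on toy data. The sketch below uses hypothetical labels and scores, not BoT-IoT data, and the helper names `accuracy` and `roc_auc` are illustrative. It shows that a model which scores every sample identically and always predicts the majority class reaches very high accuracy on an imbalanced set while its ROC AUC stays at chance level.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def roc_auc(y_true, scores):
    """Probability that a random positive outranks a random negative.

    Ties between a positive and a negative score count as half a win,
    which matches the area under the ROC curve.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical imbalanced labels: 995 majority-class samples (label 1)
# and 5 minority-class samples (label 0).
y_true = [1] * 995 + [0] * 5

# A degenerate model: the same score for every sample, so every
# sample is predicted as the majority class.
scores = [0.9] * len(y_true)
y_pred = [1 if s >= 0.5 else 0 for s in scores]

acc = accuracy(y_true, y_pred)  # 0.995
auc = roc_auc(y_true, scores)   # 0.5
print(f"accuracy={acc:.3f}, ROC AUC={auc:.3f}")
```

This is why the review emphasizes ROC AUC alongside accuracy when the classes are severely imbalanced.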

This review was created by AI and reviewed by human editors.