
Implementation/Tutorial of using Automated Machine Learning (AutoML) methods for static/batch and online/continual learning

License: MIT License



AutoML-Implementation-for-Static-and-Dynamic-Data-Analytics

This repository provides an Automated Machine Learning (AutoML) implementation for static and dynamic data analytics problems. It presents a case study of IoT anomaly detection using several ML algorithms together with optimization/AutoML methods that automate and tune them. It automates the main procedures in the machine learning/data analytics pipeline: data pre-processing, feature engineering, model selection, Hyper-Parameter Optimization (HPO), and model updating (model drift adaptation). It can also serve as a tutorial that helps machine learning researchers automatically obtain optimized models for a specific task.

  • Batch/Static Learning: Batch learning is the traditional machine learning and data analytics process. Batch learning methods analyze static IoT data in batches and often need access to the entire dataset before model training.
  • Online/Continual Learning: Online (or continual) learning techniques train models on continuously incoming data streams in dynamic IoT environments and address concept drift, i.e., changes in the data distribution over time.
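The difference between the two modes can be illustrated with a minimal sketch. It assumes a toy one-feature nearest-centroid classifier; the class `NearestCentroid`, the helper names, and the sample data are hypothetical and are not taken from the notebooks in this repository. The batch path sees the whole dataset before training, while the online path tests on each sample first and then learns from it (prequential evaluation).

```python
from collections import defaultdict


class NearestCentroid:
    """Toy classifier: predicts the class whose feature mean is closest."""

    def __init__(self):
        self.count = defaultdict(int)
        self.mean = defaultdict(float)

    def learn_one(self, x, y):
        # Incremental mean update, so past samples never need to be stored.
        self.count[y] += 1
        self.mean[y] += (x - self.mean[y]) / self.count[y]

    def predict_one(self, x):
        if not self.count:
            return None  # nothing has been learned yet
        return min(self.count, key=lambda c: abs(x - self.mean[c]))


def batch_fit(data):
    """Batch learning: the entire dataset is available before training."""
    model = NearestCentroid()
    for x, y in data:
        model.learn_one(x, y)
    return model


def prequential_accuracy(stream):
    """Online learning: test on each incoming sample, then train on it."""
    model = NearestCentroid()
    correct = total = 0
    for x, y in stream:
        correct += int(model.predict_one(x) == y)
        total += 1
        model.learn_one(x, y)
    return correct / total, model


# Hypothetical (feature, label) pairs standing in for IoT traffic records.
data = [(0.1, "normal"), (0.9, "attack"), (0.2, "normal"),
        (0.8, "attack"), (0.15, "normal"), (0.95, "attack")]

batch_model = batch_fit(data)
online_acc, online_model = prequential_accuracy(data)
```

The online accuracy is lower than a held-out batch score would suggest because the first predictions are made before the model has seen both classes; that cold-start cost is the usual price of learning from a stream.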

This code also serves as the implementation accompanying a review paper published in Engineering Applications of Artificial Intelligence (IF: 7.8):
L. Yang and A. Shami, “IoT Data Analytics in Dynamic Environments: From An Automated Machine Learning Perspective,” Engineering Applications of Artificial Intelligence, vol. 116, pp. 1-33, 2022, doi: https://doi.org/10.1016/j.engappai.2022.105366.

The paper and code are intended to help industrial users, data analysts, and researchers develop machine learning models more effectively using automation technology.

  • Comprehensive tutorial code for hyperparameter optimization (automatically tuning the hyperparameters of machine learning algorithms to achieve optimal performance) can be found in: Hyperparameter-Optimization-of-Machine-Learning-Algorithms
    • 1,200+ GitHub stars
    • 1,500+ citations by journal & conference papers

Paper Link

IoT Data Analytics in Dynamic Environments: From An Automated Machine Learning Perspective
One-column version: arXiv
Two-column version: Elsevier

AutoML Pipeline and Procedures

  1. Automated Data Pre-Processing
  2. Automated Feature Engineering
  3. Automated Model Selection
  4. Hyper-Parameter Optimization
  5. Automated Model Updating (for addressing concept drift, and only for online learning and data stream analytics)
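Step 5 can be sketched as follows: monitor the prediction errors on the stream and rebuild the model when the recent error rate rises well above the earlier one. This is a simplified window comparison written for illustration, not one of the drift detection methods used in the paper or notebooks; `WindowDriftDetector`, `MajorityModel`, `run`, the window size, and the threshold are all hypothetical choices.

```python
from collections import deque


class WindowDriftDetector:
    """Flags drift when the recent error rate exceeds a reference rate."""

    def __init__(self, window=20, threshold=0.3):
        self.window = window
        self.threshold = threshold
        self.reference = deque(maxlen=window)  # errors from the stable period
        self.recent = deque(maxlen=window)     # sliding window of latest errors

    def update(self, error):
        """Record a 0/1 error and return True if drift is flagged."""
        if len(self.reference) < self.window:
            self.reference.append(error)
            return False
        self.recent.append(error)
        if len(self.recent) < self.window:
            return False
        gap = sum(self.recent) / self.window - sum(self.reference) / self.window
        if gap > self.threshold:
            # Start collecting fresh statistics for the new concept.
            self.reference.clear()
            self.recent.clear()
            return True
        return False


class MajorityModel:
    """Toy binary model that predicts the most frequent label seen so far."""

    def __init__(self):
        self.counts = [0, 0]

    def predict(self):
        return 1 if self.counts[1] > self.counts[0] else 0

    def learn(self, y):
        self.counts[y] += 1


def run(stream, adapt=True):
    """Test-then-train loop; resets the model whenever drift is flagged."""
    model, detector = MajorityModel(), WindowDriftDetector()
    errors, drift_points = 0, []
    for i, y in enumerate(stream):
        error = int(model.predict() != y)
        errors += error
        if adapt and detector.update(error):
            drift_points.append(i)
            model = MajorityModel()  # discard the outdated model
        model.learn(y)
    return errors, drift_points


# Synthetic label stream whose concept flips at index 60.
stream = [0] * 60 + [1] * 60
adaptive_errors, drift_points = run(stream)
static_errors, _ = run(stream, adapt=False)
```

On this synthetic stream the adaptive run recovers shortly after the change, whereas the static run keeps predicting the old majority label for the rest of the stream.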

Quick Navigation of The Paper

Section 3: IoT data analytics overview
Section 3: Model learning (introduces common machine learning algorithms)
Section 4: AutoML overview & optimization techniques (introduces AutoML and its techniques)
Section 5: Automated data pre-processing
Section 6: Automated feature engineering
Section 7: Automated model updating by handling concept drift
Section 8: Selection of evaluation metrics and validation methods
Section 9: AutoML Tools and libraries
Section 10: Case study (Experimental results, sample code in "AutoML_Batch_Learning_CIC.ipynb")
Section 11: Open challenges and future research directions
Summary tables for Section 3: Tables 1 & 2: A comprehensive overview of common ML models, their hyperparameters, their advantages and limitations, and suitable IoT tasks
Summary table for Section 4: Table 3: A comparison of common optimization methods for CASH and HPO problems
Summary table for Section 7: Table 5: A comparison of concept drift methods for automated model updating
Summary table for Section 10: Table 6: The specifications of the proposed AutoML pipeline
Summary table for Section 11: Table 12: The challenges and research directions of applying AutoML to IoT data analytics

Implementation

Static Machine Learning & Deep Learning Algorithms

  • Random forest (RF)
  • LightGBM
  • K-nearest neighbor (KNN)
  • Naive Bayes (NB)
  • Artificial Neural Networks (ANN)

Dynamic/Online Learning Algorithms

  • Hoeffding Tree (HT)
  • Leveraging Bagging (LB)
  • Adaptive Random Forest (ARF)
  • Streaming Random Patches (SRP)
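As background for the Hoeffding Tree, which is also commonly used as the base learner in the ensembles listed above, the sketch below shows the Hoeffding bound that decides when enough stream samples have been seen to commit to a split. The function names and the example gains are hypothetical; only the bound itself, epsilon = sqrt(R^2 ln(1/delta) / (2n)), is the standard formula.

```python
import math


def hoeffding_bound(value_range, delta, n):
    """Epsilon such that the true mean lies within epsilon of the observed
    mean with probability 1 - delta, after n observations of a variable
    whose values span `value_range`."""
    return math.sqrt(value_range ** 2 * math.log(1 / delta) / (2 * n))


def should_split(best_gain, second_gain, value_range, delta, n):
    """Split once the gap between the two best split candidates exceeds
    the Hoeffding bound, so the ranking is unlikely to change with more data."""
    return best_gain - second_gain > hoeffding_bound(value_range, delta, n)


# Hypothetical information gains for the two best candidate attributes.
early = should_split(0.30, 0.20, value_range=1.0, delta=1e-7, n=200)
later = should_split(0.30, 0.20, value_range=1.0, delta=1e-7, n=2000)
```

With the same observed gains, the decision changes from waiting to splitting as `n` grows, because the bound shrinks in proportion to 1/sqrt(n).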

Optimization/AutoML Algorithms

  • Grid search
  • Bayesian Optimization with Tree-structured Parzen Estimator (BO-TPE)
  • Particle Swarm Optimization (PSO)
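To show how two of these optimizers differ in practice, here is a minimal pure-Python sketch of grid search and PSO over two hyperparameters. The objective `validation_loss` is a hypothetical stand-in for a cross-validated model score, and the search ranges, swarm size, and coefficients are illustrative assumptions rather than the settings used in the notebooks. BO-TPE is omitted because it needs a surrogate model that does not fit in a short example.

```python
import itertools
import math
import random


def validation_loss(learning_rate, max_depth):
    """Hypothetical stand-in for a validation loss; minimum at (0.1, 6)."""
    return (math.log10(learning_rate) + 1) ** 2 + ((max_depth - 6) / 5) ** 2


def grid_search(grid):
    """Evaluate every combination and return (best_params, best_loss)."""
    candidates = itertools.product(grid["learning_rate"], grid["max_depth"])
    best = min(candidates, key=lambda p: validation_loss(*p))
    return best, validation_loss(*best)


def pso(bounds, n_particles=10, n_iters=30, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Basic global-best PSO; max_depth is treated as continuous here and
    would be rounded before training a real model."""
    rng = random.Random(seed)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * len(bounds) for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_loss = [validation_loss(*p) for p in pos]
    g = min(range(n_particles), key=pbest_loss.__getitem__)
    gbest, gbest_loss = pbest[g][:], pbest_loss[g]
    history = [gbest_loss]
    for _ in range(n_iters):
        for i in range(n_particles):
            for d, (lo, hi) in enumerate(bounds):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull toward personal best + pull toward global best.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Clip to the search bounds.
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            loss = validation_loss(*pos[i])
            if loss < pbest_loss[i]:
                pbest[i], pbest_loss[i] = pos[i][:], loss
                if loss < gbest_loss:
                    gbest, gbest_loss = pos[i][:], loss
        history.append(gbest_loss)
    return gbest, gbest_loss, history


grid = {"learning_rate": [0.01, 0.1, 1.0], "max_depth": [2, 4, 6, 8]}
grid_best, grid_loss = grid_search(grid)

bounds = [(0.01, 1.0), (1, 10)]
pso_best, pso_loss, history = pso(bounds)
```

Grid search only evaluates the listed combinations, so its cost grows multiplicatively with each added hyperparameter; PSO samples the continuous ranges and moves candidates toward the best positions found so far.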

Datasets

  1. CICIDS2017 dataset, a popular network traffic dataset for intrusion detection problems

  2. IoTID20 dataset, a novel IoT botnet dataset

Requirements

Contact-Info

Please feel free to contact me with any questions or cooperation opportunities. I'd be happy to help.

Citation

If you find this repository useful in your research, please cite this article as:

L. Yang and A. Shami, “IoT Data Analytics in Dynamic Environments: From An Automated Machine Learning Perspective,” Engineering Applications of Artificial Intelligence, vol. 116, pp. 1-33, 2022, doi: https://doi.org/10.1016/j.engappai.2022.105366.

@article{YANG2022105366,
title = "IoT data analytics in dynamic environments: From an automated machine learning perspective",
author = "Li Yang and Abdallah Shami",
journal = "Engineering Applications of Artificial Intelligence",
volume = {116},
pages = {1-33},
year = "2022",
doi = "10.1016/j.engappai.2022.105366",
url = "https://www.sciencedirect.com/science/article/pii/S0952197622003803"
}
