
ml-for-nids's Introduction

Exploring Explainable AI techniques for Detecting Contamination Features for improved Machine Learning-based Intrusion Detection

This repository contains the code and resources for my master's thesis on machine learning for network intrusion detection at Ghent University. The goal of this thesis is to develop a methodology that detects contaminating features in intrusion detection system (IDS) datasets using explainable artificial intelligence (XAI) techniques such as SHAP. Additionally, the inter-dataset generalization technique is employed to assess the impact of these features on the generalization ability of machine learning (ML) models.
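As a rough illustration of inter-dataset generalization testing, the sketch below trains a model on each dataset and scores it on every dataset, yielding a train-by-test matrix like the one a heatmap would show. It is not the thesis code: the toy data generator, the function names, the random forest classifier, and the ROC AUC metric are all assumptions chosen to keep the example self-contained.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score


def make_toy_dataset(seed, shift):
    """Hypothetical stand-in for one IDS dataset: 2 features, binary label."""
    rng = np.random.default_rng(seed)
    y = rng.integers(0, 2, size=500)
    # Feature 0 carries a signal that is shared by all toy datasets.
    f0 = rng.normal(loc=y * 2.0, scale=1.0)
    # Feature 1 is a dataset-specific artifact (a stand-in for a contaminant).
    f1 = rng.normal(loc=y * shift, scale=1.0)
    return np.column_stack([f0, f1]), y


def generalization_matrix(datasets):
    """Train on each dataset (rows) and score on every dataset (columns)."""
    names = list(datasets)
    scores = np.zeros((len(names), len(names)))
    for i, train_name in enumerate(names):
        X_train, y_train = datasets[train_name]
        model = RandomForestClassifier(n_estimators=50, random_state=0)
        model.fit(X_train, y_train)
        for j, test_name in enumerate(names):
            X_test, y_test = datasets[test_name]
            proba = model.predict_proba(X_test)[:, 1]
            scores[i, j] = roc_auc_score(y_test, proba)
    return names, scores


if __name__ == "__main__":
    toy = {
        "A": make_toy_dataset(seed=0, shift=3.0),
        "B": make_toy_dataset(seed=1, shift=-3.0),
    }
    names, scores = generalization_matrix(toy)
    print(names)
    print(scores.round(3))
```

The diagonal entries here are in-sample scores, so they are optimistic; the off-diagonal entries are the ones that reflect generalization to another dataset. A model that leans on the dataset-specific feature would be expected to score well on its own dataset and poorly elsewhere.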

Repository Structure

The repository is organized as follows:

  • contaminant_discovery: Contains the code for the training and analysis phase of our methodology to detect contaminants.
    • *-heatmaps: Contains the heatmaps generated in every training/analysis cycle.
  • contaminant_validation_phase: Includes the code for the validation phase of the methodology.
    • boxplots: Contains boxplots for comparing feature distribution.
  • generalization: Contains bash scripts and Python scripts to run the inter-dataset generalization testing.
    • results: Contains the results of the generalization experiment, both as tables and as heatmaps.
      • generalization_heatmap: Stores the heatmaps generated by running the inter-dataset generalization testing code.
      • bar_chart: Contains the grouped bar plot of the results.
  • ks-test: Contains the code for the KS-test and the analysis of the KS-test results.
    • ks_test_scatter: Includes scatterplots of the results of the KS-test code.
    • ks_test_violin: Contains violin plots of the results of the KS-test code.
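For readers unfamiliar with the KS-test mentioned in the list above, the following is a minimal sketch of a per-feature two-sample Kolmogorov-Smirnov comparison using `scipy.stats.ks_2samp`. What exactly the repository compares (for example two datasets, or two traffic classes) is not stated here, so the two samples, the feature names, and the helper function are assumptions for illustration only.

```python
import numpy as np
from scipy.stats import ks_2samp


def ks_per_feature(sample_a, sample_b, feature_names):
    """Run a two-sample KS test per feature column.

    Returns a dict mapping feature name to (statistic, p-value).
    """
    results = {}
    for idx, name in enumerate(feature_names):
        res = ks_2samp(sample_a[:, idx], sample_b[:, idx])
        results[name] = (float(res.statistic), float(res.pvalue))
    return results


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    # Hypothetical data: column 0 has the same distribution in both samples,
    # column 1 is shifted in the second sample.
    a = np.column_stack([rng.normal(0, 1, 1000), rng.normal(0, 1, 1000)])
    b = np.column_stack([rng.normal(0, 1, 1000), rng.normal(3, 1, 1000)])
    for name, (stat, p) in ks_per_feature(a, b, ["same", "shifted"]).items():
        print(f"{name}: statistic={stat:.3f}, p-value={p:.3g}")
```

A large KS statistic indicates that a feature is distributed very differently in the two samples, which is the kind of signal that scatter and violin plots can then make visible.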

Getting Started

The following packages are needed to run the code in this repository:

  • numpy
  • pandas
  • fastai
  • seaborn
  • scikit-learn (imported as sklearn)
  • scipy
  • shap
  • matplotlib
  • plotly (for plotly.express)
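A quick way to check whether these packages are available in the current environment is sketched below. The helper name and the module list are assumptions based on the list above; the list uses import names, which can differ from the names used when installing.

```python
import importlib.util

# Import names for the packages listed above (assumed; plotly.express is
# part of the plotly package, so only the top-level name is checked).
REQUIRED_MODULES = [
    "numpy",
    "pandas",
    "fastai",
    "seaborn",
    "sklearn",
    "scipy",
    "shap",
    "matplotlib",
    "plotly",
]


def missing_modules(names):
    """Return the subset of top-level module names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules are importable.")
```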

Source Datasets

The datasets used are part of the NFV2 collection by the University of Queensland, which aims to standardize network-security datasets to enable interoperability and larger-scale analyses. The cleaned versions of these datasets were used and are available on Kaggle.

The UNSW-NB15 dataset with metadata, used for measuring the effectiveness of our methodology, is also available on Kaggle.

Acknowledgements

I would like to express my deepest gratitude to everyone who helped me with guidance and advice in finishing this thesis.

First and foremost, I thank my supervisors, Laurens D’hooge and Miel Verkerken, for their invaluable guidance and support throughout this research. Their research and discoveries have played a pivotal role in shaping the direction and quality of this thesis. Their persistence in encouraging me to take full ownership of the research and pursue my own ideas also made it into something I can really call my own.

I am also thankful to the members of my thesis committee, prof. dr. Bruno Volckaert, dr. ir. Tim Wauters and prof. dr. ir. Filip De Turck, for providing me with the opportunity to conduct research in this field within the research group.

I would like to extend my appreciation to imec for providing the necessary computing resources that enabled the successful completion of this research.

I also thank NYCU for their willingness to collaborate. I am particularly grateful to Didik Sudyana and Fietyata Yudha. Although we were unable to pursue the collaboration I had initially envisioned because of data-quality challenges and time limitations, both of them were ready to assist me and provided valuable explanations regarding the CREMEv2 data.

A special thanks to all lecturers from the Information Engineering Technology Department at Ghent University for the high-quality education and these four interesting years. Without them, I would not be where I am right now. Their expertise, passion, and dedication have been instrumental in shaping my understanding of the subject matter and laying a strong foundation for my research.

My heartfelt thanks go to my family for giving me this opportunity to study and get this degree.

Additionally, I should not forget my friends, who have made the past four years simply fly by. It is through their companionship that I have created unforgettable memories throughout my journey that will forever be cherished.

Lastly, I express my gratitude to all the individuals who, directly or indirectly, have contributed to the completion of this thesis. Their contributions may not be explicitly mentioned, but their impact has been significant and deeply appreciated.

Thank you for your interest in my research project! 😊
