Creating an intrusion detection system using supervised machine learning

Project overview

This project is a year-long dissertation for a BSc in Computer Science at Swansea University. It models an intrusion detection system using supervised machine learning.

Three distinct models are used for the final comparison: a random forest, a support vector machine and an artificial neural network. Other models that were trained, such as LSTMs, were not used for final testing.

This project also performed cyber attacks in an air-gapped, safe environment; however, the code (or tutorial) for those attacks is not in this repository.

Overall project result: 94%

Dataset

The dataset used for this project comes from a collaboration between researchers that produced "a realistic cyber defense dataset". It contains labelled, statistical, bi-directional flow data for multiple cyber attacks, such as denial of service. The original PCAP files are also available.

The dataset: https://registry.opendata.aws/cse-cic-ids2018/

The research paper: https://www.scitepress.org/Papers/2018/66398/66398.pdf

Code structure

The code is split into folders according to its purpose.

It should be noted that the import paths stopped working after the code was moved into these folders. Therefore, if you want to run the scripts, you need to update the import paths of the project's own modules, e.g. process_data.
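One possible way to repair the imports without editing every import statement is to add the folder that holds the shared module to sys.path before importing it. This is a minimal sketch, not the repository's own fix: the helper name add_to_import_path is hypothetical, and the folder names in the commented usage (Classifiers and Utility) are assumed from the section names in this README.

```python
import sys
from pathlib import Path


def add_to_import_path(folder):
    """Prepend a folder to sys.path so modules inside it can be imported.

    Hypothetical helper for illustration; it is not part of the repository.
    """
    resolved = str(Path(folder).resolve())
    if resolved not in sys.path:
        sys.path.insert(0, resolved)
    return resolved


# Assumed layout: this script sits in Classifiers/ and process_data.py in Utility/.
# add_to_import_path(Path(__file__).resolve().parent.parent / "Utility")
# import process_data
```

Calling the helper twice with the same folder adds it only once, so it is safe to place at the top of each script.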

Classifiers

All code here is related to creating classifiers, covering both classical machine learning algorithms, such as random forest, and neural networks.

classifier.py - this code is purely for the random forest and support vector machine. It trains on the data using either a basic split or a time series split.

Example code usage:

python classifier.py -f dataset_location -o folder_to_save_data
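The difference between the two splitting modes can be sketched in plain Python. This is an illustration under assumptions, not the code in classifier.py: the function names, the 20% test fraction and the expanding-window scheme are all hypothetical choices, and the script itself may rely on a library implementation instead.

```python
import random


def basic_split(n_samples, test_fraction=0.2, seed=0):
    """Shuffle the sample indices and hold out a fraction for testing."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_test = int(n_samples * test_fraction)
    return indices[n_test:], indices[:n_test]


def time_series_splits(n_samples, n_splits=3):
    """Yield expanding-window folds that keep the samples in time order.

    Each fold trains on every sample before a cut-off and tests on the
    block that follows it, so no future sample leaks into training.
    """
    block = n_samples // (n_splits + 1)
    for k in range(1, n_splits + 1):
        train = list(range(0, k * block))
        test = list(range(k * block, (k + 1) * block))
        yield train, test


if __name__ == "__main__":
    print(basic_split(10))
    for train, test in time_series_splits(8, n_splits=3):
        print(train, test)
```

A time-ordered split is usually the stricter test for flow data captured over several days, because a shuffled split lets the model see traffic from the same period it is later tested on.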

ensemble.py - code for creating ensemble models. It is used by nn_classifier.py.

nn.py - code for creating neural network models. Supports creating an artificial neural network, using a pretrained model or creating an LSTM. It is used by nn_classifier.py.

nn_classifier.py - code for training neural networks. Trains using stratified cross validation.

Example code usage:

Train ANN: python nn_classifier.py -f dataset_location -o folder_to_save_data

Train singular ensemble (uses logistic regression): python nn_classifier.py -f dataset_location -o folder_to_save_data -p pretrained_models_location -s

Train integrated ensemble: python nn_classifier.py -f dataset_location -o folder_to_save_data -p pretrained_models_location -i
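Stratified cross validation keeps the class proportions roughly equal in every fold, which matters when attack flows are much rarer than benign ones. The sketch below shows the idea in plain Python; the function names and the round-robin assignment are assumptions made for illustration, and nn_classifier.py may well use a library routine instead.

```python
from collections import defaultdict


def stratified_folds(labels, n_folds):
    """Assign sample indices to folds so each fold mirrors the class balance."""
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    folds = [[] for _ in range(n_folds)]
    for indices in by_class.values():
        # Deal the samples of each class out to the folds in turn.
        for position, index in enumerate(indices):
            folds[position % n_folds].append(index)
    return folds


def train_test_pairs(labels, n_folds):
    """Yield (train, test) index lists, using each fold once as the test set."""
    folds = stratified_folds(labels, n_folds)
    for k, test in enumerate(folds):
        train = [i for j, fold in enumerate(folds) if j != k for i in fold]
        yield sorted(train), sorted(test)


if __name__ == "__main__":
    example_labels = ["benign"] * 8 + ["dos"] * 4
    for train, test in train_test_pairs(example_labels, n_folds=4):
        print(test)
```

With eight benign and four denial-of-service samples split into four folds, every test fold holds two benign samples and one attack sample.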

predict.py - code for predicting data using trained models

Example code usage:

Predict random forest: python predict.py -d dataset_location -m saved_rf_model_location -c

Predict singular ensemble: python predict.py -d dataset_location -p location_of_pretrained_model -g location_of_ensemble_model -s

Utility

All code here is used to handle the dataset. There are various one-off scripts that were used as the situation required. The main script is process_data.py.

process_data.py - reads the dataset in so that other Python scripts can use the data. Also performs transformations on the data if required, such as normalising.
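As an illustration of what normalising can mean here, the sketch below rescales every feature column to the range 0 to 1 (min-max scaling). This README does not say which method process_data.py applies, so both the technique and the function name min_max_normalise are assumptions.

```python
def min_max_normalise(rows):
    """Rescale each column of a list of numeric rows to the range [0, 1].

    A column whose values are all equal is mapped to 0.0 to avoid
    dividing by zero.
    """
    columns = list(zip(*rows))
    lows = [min(column) for column in columns]
    highs = [max(column) for column in columns]

    normalised = []
    for row in rows:
        normalised.append([
            (value - low) / (high - low) if high > low else 0.0
            for value, low, high in zip(row, lows, highs)
        ])
    return normalised


if __name__ == "__main__":
    flows = [[0.0, 10.0], [5.0, 10.0], [10.0, 10.0]]
    print(min_max_normalise(flows))
```

In a real pipeline the minimum and maximum would normally be computed on the training data only and then reused for the test data, so that no information from the test set influences the scaling.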

Preliminary work

All code here belongs to preliminary work required to choose a dataset. Different datasets were compared using a subset of each.

GANN

The original aim of the dissertation was to use generative adversarial neural networks to test against false positives; however, this was changed. All code here belongs to work completed before the aim changed.
