Git Product home page Git Product logo

network-log-and-traffic-analysis's Introduction

Network-Log-and-Traffic-Analysis

Identify malicious behavior and attacks using Machine Learning with Python

LAB A

We'll be using IPython and panads functionality in this part.

Our first goal is to get the information from the log files off of disk and into a dataframe.

Since we're working with limited resources we'll use samples of the larger files.

Requirements

IPython
Pandas
Matplotlib
Seaborn
datetime
warnings

Tip

To access keyboard shortcuts click on a (non-code) cell or the text "In []" to the left of the cell, and press the H key. Or select Help from the menu above, and then Keyboard Shortcuts. **Very useful saved us a lot of time during editing.

Business Understanding

Overview

The dataset that we've selected is from the field of Network Analysis and Security. We are using log files generated by BRO Network Security Monitor as our dataset. The dataset we've choosen has about 20 million records ( about 2 GB in size) and has 22 features with a number of sub-features explained in the feature description sections that follow.

We'll be analyzing the log file, finding the correlation between attack behavioud and the features to come up with probable conclusions and results that helped us in identifying malicious behavior and potential threats and attacks in the network of our dataset.

The plan is to understand the dataset, the features, attack behaviours, and their descriptions in-detail as they are stated by Bro.

We will do a lot of preprocessing including elimination, grouping, standardization, and imputation to try and make the dataset more convenient to work on.

After getting the dataset ready to be processed for extracting valuable statistical information, we then visualized those statistical information using the most appropriate plots (in our case, box plot was used extensively). Then we grouped some of the features (use them to visualize relationships) and then use correlation matrix to represent all relationships between the different features that are important in our analysis (for example,the services and packets generated as well as received have a high corelation).

Purpose

We selected this dataset because it is a complex as well as a technical dataset that is used on live data retaining value depending on its freshness. We are interested in learning more about security, its attacks, and their patterns.

The amount of real-time processing that can be done by analyzing the data collected can reduce a lot of manual work and catch patterns in attacks that occur over a large period of time that a human cannot identify.

These logs also allow us to see the amount of data being transferred and allowing organizations to allocate bandwidth depending based on the future scope of usage patterns.

Data

Full dataset available here. This is the conn.log.

network-log-and-traffic-analysis's People

Contributors

alexamanpreet avatar sarah-noor-12232 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.