Group 1: Daan van der Valk and Sandesh Manganahalli Jayaprakash
Lab assignment 3 for Cyber Data Analytics, the TU Delft course.
All included scripts should be run with Python 3. We used Python 3.6.4 specifically, but any Python 3 version should suffice.
The following packages should be installed, which can be done using pip (pip install <package>) or Conda (conda install <package>), whichever you prefer.
matplotlib
scipy
sklearn (the package to install is named scikit-learn)
pydotplus
graphviz (depending on the environment, also python-graphviz)
imblearn
joblib
seaborn
pandas
hashlib (part of the Python standard library, so no installation is needed)
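To check quickly which of the packages listed above are already available, something like the following can be run first. This is only a minimal sketch: the helper name missing_packages is hypothetical, and the install names in the mapping (for example scikit-learn for sklearn and imbalanced-learn for imblearn) are assumptions that may differ per environment.

```python
import importlib.util

# Import names come from the package list above. The install names on the
# right are assumptions about what pip/Conda expect in a typical setup.
REQUIRED = {
    "matplotlib": "matplotlib",
    "scipy": "scipy",
    "sklearn": "scikit-learn",
    "pydotplus": "pydotplus",
    "graphviz": "graphviz",
    "imblearn": "imbalanced-learn",
    "joblib": "joblib",
    "seaborn": "seaborn",
    "pandas": "pandas",
    "hashlib": None,  # standard library, nothing to install
}


def missing_packages(required=REQUIRED):
    """Return the install names of packages that cannot be found for import."""
    missing = []
    for import_name, install_name in required.items():
        # find_spec returns None when a top-level module cannot be located.
        if importlib.util.find_spec(import_name) is None:
            missing.append(install_name or import_name)
    return missing


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All listed packages are importable.")
```

Note that this only checks the Python packages; the graphviz Python package typically also needs the Graphviz binaries installed on the system.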
Note: the datasets are not included in our repository. To run the code, follow these steps:
- Clone this repository.
- Download the following datasets:
  - For tasks 1 and 2: the labeled netflows from scenario 6: capture20110816.pcap.netflow.labeled (245 MB)
  - For tasks 3 and 4: the labeled netflows from scenario 10: capture20110818.pcap.netflow.labeled (489 MB)
- Execute the code - for example, the scripts highlighted below :)
- The technique used and explained: min-sampling.py
- The included results were generated using min-sampling-tests.py, which produced the output in min-sampling-test-results.md
- The technique explained and iterated over multiple parameters: count_min.py
- The results for our (relatively) best-performing dimensions: count_min_with_best_params.py
- Demonstration of the technique, including visualizations: discretization.py
- The included heatmaps are found in Heatmap_legitimate.svg and Heatmap_botnet.svg.
- The included plots of the flows:
- Discretization as used to profile the infected host and select 30% of the legitimate traffic: discretization_for_ngrams.py
- Profiling of the infected host and applying the detection system: botnet_profiling.py
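For readers unfamiliar with the data structure that count_min.py is named after, the general idea of a Count-Min sketch can be outlined as follows. This is a minimal sketch under stated assumptions, not the code in count_min.py: the class name CountMinSketch, the salted SHA-256 hashing, the width and depth values, and the example IP addresses are all illustrative choices.

```python
import hashlib


class CountMinSketch:
    """Approximate frequency counter: `depth` hash rows of `width` counters each."""

    def __init__(self, width, depth):
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]

    def _index(self, item, row):
        # Derive one hash function per row by salting the item with the row number.
        digest = hashlib.sha256("{}:{}".format(row, item).encode("utf-8")).hexdigest()
        return int(digest, 16) % self.width

    def add(self, item, count=1):
        # Increment one counter in every row.
        for row in range(self.depth):
            self.table[row][self._index(item, row)] += count

    def estimate(self, item):
        # Collisions can only inflate counters, so the minimum over all rows
        # is the tightest available estimate (it never undercounts).
        return min(
            self.table[row][self._index(item, row)] for row in range(self.depth)
        )


# Hypothetical IP addresses, for illustration only.
sketch = CountMinSketch(width=64, depth=4)
stream = ["10.0.0.1"] * 5 + ["10.0.0.2"] * 2 + ["10.0.0.3"]
for ip in stream:
    sketch.add(ip)

print(sketch.estimate("10.0.0.1"))
```

The dimensions trade memory for accuracy: a larger width reduces collisions within a row, while a larger depth lowers the chance that every row collides for the same item.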