ggjay9 / application-flow-identification

Classify applications using flow features with Random Forest and K-Nearest Neighbor classifiers. Explore augmentation techniques like oversampling, SMOTE, BorderlineSMOTE, and ADASYN for better handling of underrepresented classes. Measure classifier effectiveness for different sampling techniques using accuracy, precision, recall, and F1-score.

Jupyter Notebook 100.00%

Application Flow Identification

The project provides a comprehensive analysis of data preprocessing, machine learning model optimization, and performance testing, with an advanced task focusing on Explainable Artificial Intelligence (XAI). Here's a summary of the key points:

  • Raw Data Visualization/Analysis:

    • Removal of irrelevant columns, handling of missing values, and removal of columns containing only zeroes to refine the dataset.
    • Analysis of the dataset identified 22 different application flows, with a maximum of 1000 samples per application.
    • Visualization techniques were used to understand data correspondence and class imbalance, revealing that 7 applications had fewer samples than others.
  • Data Preprocessing:

    • Evaluation of over-sampling techniques to address dataset imbalance, including Random Over-sampling, SMOTE, Borderline-SMOTE, and ADASYN, each aiming to balance the dataset effectively.
  • ML Models Optimization and Training:

    • Optimization of Random Forest and K-Nearest Neighbors (KNN) algorithms through hyperparameter tuning and model evaluation.
    • The process involved data preprocessing, train-test split, hyperparameter optimization using the hyperopt library, model training, and evaluation.
  • Testing the Performance:

    • Performance metrics and confusion matrices were used to evaluate the effectiveness of classifiers and oversampling techniques.
    • A comparative analysis of per-class classification metrics highlighted the impact of oversampling on reducing false negatives and improving recall and F1-scores.
  • Advanced Task - Explainable AI (XAI):

    • Focus on making machine learning algorithms more transparent and understandable using the SHAP library to analyze the contribution of each feature in predictions.
    • An examination of the importance of features across different oversampling methods showed that certain features consistently contributed to predictions, emphasizing their significance.
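
A minimal pandas sketch of the cleaning and class-count steps described in the raw-data analysis, assuming a flow table with a `label` column. The column names (`flow_id`, `duration`, `bytes`, `unused`, `label`), the hypothetical helper `clean_flows`, and the choice to drop rows with missing values are illustrative assumptions, not the project's actual schema or policy.

```python
import numpy as np
import pandas as pd


def clean_flows(df: pd.DataFrame, label_col: str = "label",
                drop_cols: tuple = ("flow_id", "src_ip", "dst_ip")) -> pd.DataFrame:
    """Hypothetical cleaning pass mirroring the steps listed in the README."""
    # 1. Drop identifier-like columns (these names are assumptions).
    df = df.drop(columns=[c for c in drop_cols if c in df.columns])
    # 2. Handle missing values: here, rows with any NaN are dropped (one possible policy).
    df = df.dropna()
    # 3. Drop numeric feature columns that are zero in every row.
    features = df.drop(columns=[label_col]).select_dtypes(include="number")
    all_zero = features.columns[(features == 0).all()]
    return df.drop(columns=all_zero)


# Tiny synthetic example (not the project's data).
raw = pd.DataFrame({
    "flow_id": [1, 2, 3, 4, 5],
    "duration": [0.5, 1.2, np.nan, 0.7, 2.0],
    "bytes": [300, 1500, 800, 420, 9000],
    "unused": [0, 0, 0, 0, 0],
    "label": ["A", "A", "B", "A", "C"],
})
clean = clean_flows(raw)
counts = clean["label"].value_counts()  # per-class sample counts reveal imbalance
print(list(clean.columns))  # ['duration', 'bytes', 'label']
print(counts.to_dict())     # {'A': 3, 'C': 1}
```

On the real dataset, `value_counts()` on the label column is where an imbalance such as the 7 under-represented applications would show up.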

The project concludes that, while over-sampling techniques and model optimization can improve dataset balance and model performance, their effectiveness varies with the dataset and problem domain. The advanced task highlights the potential of XAI to provide insight into model predictions and the importance of individual features, supporting more informed decision-making in model development and evaluation.

This project was a joint effort with my colleague Filippo Kubler: we worked together on developing the code, analyzing the data, and generating visual aids to support our results.
