sverrenystad / machine-learning-starting-kit

License: MIT License

Python 15.39% Jupyter Notebook 84.61%
ai automl machine-learning-algorithms auto-eda

Machine Learning Starting Kit

Template for machine learning projects, featuring a diverse collection of ML models, AutoML solutions, and simple EDA tools for streamlined project development. Users only need to specify target features and add their data path in the Config to kickstart a wide array of machine learning tasks.

How to Use

To use the project, first give it access to your data. Either place the data in the data folder and set its path in the config script, or, if the dataset is too large to store on the device, create a new DataLoader class in the data_loader.py script and set it as the default data loader in the config script. Then specify the target feature in the config script. After this the project is ready to be used.
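The second option, a custom loader for data that does not fit on the device, could look roughly like the sketch below. It is only an illustration: the class name `ChunkedCsvLoader`, its `iter_chunks` method, and the `price` target column are hypothetical, and the interface that data_loader.py actually expects is not shown here, so adapt the names to the real base class.

```python
import csv
import tempfile
from pathlib import Path
from typing import Dict, Iterator, List


class ChunkedCsvLoader:
    """Hypothetical loader that streams a CSV file in fixed-size chunks of rows."""

    def __init__(self, path: Path, target: str, chunk_size: int = 1000) -> None:
        self.path = Path(path)
        self.target = target
        self.chunk_size = chunk_size

    def iter_chunks(self) -> Iterator[List[Dict[str, str]]]:
        """Yield lists of rows so the full file never has to sit in memory."""
        with self.path.open(newline="") as handle:
            reader = csv.DictReader(handle)
            # Fail early if the configured target feature is missing.
            if self.target not in (reader.fieldnames or []):
                raise KeyError(f"target column {self.target!r} not found")
            chunk: List[Dict[str, str]] = []
            for row in reader:
                chunk.append(row)
                if len(chunk) == self.chunk_size:
                    yield chunk
                    chunk = []
            if chunk:
                # Emit the final, possibly shorter, chunk.
                yield chunk


# Demo with a throwaway file standing in for a real data set.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "train.csv"
    demo_path.write_text("size,rooms,price\n50,2,100\n80,3,160\n120,4,240\n")
    loader = ChunkedCsvLoader(demo_path, target="price", chunk_size=2)
    chunk_sizes = [len(chunk) for chunk in loader.iter_chunks()]

print(chunk_sizes)
```

Checking for the target column when the file is opened means a misconfigured target feature surfaces before any training starts.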

Start by running the notebooks in the eda folder to get a better understanding of the data. After that, run the models in the models folder.

Project Organization

Each folder in the project has a specific purpose and is organized as follows:

├── .github
│   └── workflows                  # Github actions for CI/CD
│
├── data
│   ├── external                   # Data from third party sources.
│   ├── processed                  # The final, feature-engineered data sets for modeling.
│   └── raw                        # The original, immutable data set.
│
├── docs                           # Design documents (or other project documentation)
│   └── sphinx_docs                # A default Sphinx project; see sphinx-doc.org for details
│
├── eda                            # Notebooks for exploratory data analysis and data visualization
│
├── models                         # Training and prediction scripts for different models, including AutoML solutions.
│
├── results
│   ├── figures                    # Generated graphics and figures to be used in reporting
│   ├── predictions                # Model predictions as CSV files 
│   └── reports                    # Generated analysis as HTML, PDF, LaTeX, etc.
│
├── src                            
│   ├── config.py                  # Configuration file for the project
│   ├── ml_service.py              # A class that contains all the functions needed to train and save predictions of the models
│   ├── data                       # Scripts to fetch training and testing data
│   │   └── data_loader.py         
│   ├── features                   # Scripts to preprocess raw data into better features for modeling
│   │   ├── feature_engineering.py 
│   │   └── post_processing.py     # Script to use domain knowledge to post process the predictions
│   └── visualization              # Scripts to create exploratory and results oriented visualizations
│       └── visualize.py           
│
├── test                           # Scripts to test the project
└── requirements.txt               # The requirements file for reproducing the analysis environment, e.g.,
                                   # generated with `pip freeze > requirements.txt`

Resources

Feature Selection resources

machine-learning-starting-kit's Issues

Add XAI models

It is extremely important to understand why a model gives certain predictions. XAI (explainable AI) methods show how the different features affect the final predictions. Such methods should be added.

Add the following models:

  • SHAP
  • LIME
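To make the idea behind SHAP concrete, the sketch below computes exact Shapley values for a tiny model by enumerating every feature subset. It does not use the SHAP library, which relies on faster approximations; the function `shapley_values`, the `toy_model`, and the baseline values are all hypothetical and chosen only for illustration.

```python
from itertools import combinations
from math import factorial
from typing import Callable, List, Sequence


def shapley_values(
    predict: Callable[[List[float]], float],
    instance: Sequence[float],
    baseline: Sequence[float],
) -> List[float]:
    """Exact Shapley value of each feature for a single prediction.

    Features outside a coalition are replaced by their baseline value.
    The cost grows exponentially with the number of features.
    """
    n = len(instance)

    def value(coalition: set) -> float:
        mixed = [instance[j] if j in coalition else baseline[j] for j in range(n)]
        return predict(mixed)

    contributions = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        phi = 0.0
        for size in range(n):
            # Weight of a coalition of this size in the Shapley formula.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                coalition = set(subset)
                phi += weight * (value(coalition | {i}) - value(coalition))
        contributions.append(phi)
    return contributions


def toy_model(features: List[float]) -> float:
    """Hypothetical linear model standing in for a trained regressor."""
    return 2.0 * features[0] - 1.0 * features[1] + 0.5 * features[2] + 3.0


instance = [4.0, 2.0, 6.0]
baseline = [1.0, 1.0, 2.0]
contributions = shapley_values(toy_model, instance, baseline)
print(contributions)
```

The contributions sum to the difference between the prediction for the instance and the prediction for the baseline, which is the additivity property that makes this kind of explanation easy to read.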

Model evaluation and analysis.

For regression tasks:

  • Residuals Plot: plot the difference between the expected and actual values
  • Prediction Error Plot: plot the expected vs. actual values in model space
  • Alpha Selection: visual tuning of regularization hyperparameters
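The quantities behind the residuals plot and the prediction error plot can be computed without any plotting library. The sketch below is a minimal illustration that assumes residuals are defined as actual minus predicted; the helper names and the sample values are hypothetical.

```python
from math import sqrt
from typing import List, Sequence, Tuple


def residuals(actual: Sequence[float], predicted: Sequence[float]) -> List[float]:
    """Residual per sample, defined here as actual minus predicted."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    return [a - p for a, p in zip(actual, predicted)]


def prediction_error_points(
    actual: Sequence[float], predicted: Sequence[float]
) -> List[Tuple[float, float]]:
    """(actual, predicted) pairs; a perfect model puts every point on y = x."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    return list(zip(actual, predicted))


def rmse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean squared error, a one-number summary of the residuals."""
    errors = residuals(actual, predicted)
    return sqrt(sum(e * e for e in errors) / len(errors))


# Hypothetical values standing in for real model output.
y_true = [3.0, 5.0, 7.0]
y_pred = [2.5, 5.5, 7.0]

errors = residuals(y_true, y_pred)
points = prediction_error_points(y_true, y_pred)
score = rmse(y_true, y_pred)
print(errors, points, score)
```

Plotting `errors` against `y_pred` gives the residuals plot, and plotting `points` against the line y = x gives the prediction error plot.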

For classification tasks:

  • Confusion Matrix: The confusion matrix is a table that describes the performance of a classification model by showing the counts of true positive, true negative, false positive, and false negative predictions. It provides insight into where the model is making errors and can be used to calculate other metrics like accuracy, precision, recall, and F1-score.
  • ROC Curve: The ROC (Receiver Operating Characteristic) curve plots the true positive rate (recall) against the false positive rate (1-specificity) at various threshold settings. The area under the ROC curve (AUC) is a single scalar value that summarizes the model's ability to distinguish between classes. A higher AUC indicates a better performing model.
  • Precision-Recall Curve: The precision-recall curve plots precision (the ratio of true positive predictions to the total positive predictions) against recall at various threshold settings. It is particularly useful for evaluating models on imbalanced datasets where the positive class is rare. It helps in understanding the trade-off between precision and recall.
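The numbers behind these plots can be sketched in plain Python for a binary problem. This is only an illustration with hypothetical helper names and made-up labels and scores; in practice a library implementation would normally be used.

```python
from typing import Dict, List, Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, int]:
    """Counts of true/false positives and negatives for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    return {
        "tp": sum(1 for t, p in pairs if t == 1 and p == 1),
        "fp": sum(1 for t, p in pairs if t == 0 and p == 1),
        "tn": sum(1 for t, p in pairs if t == 0 and p == 0),
        "fn": sum(1 for t, p in pairs if t == 1 and p == 0),
    }


def precision_recall_f1(counts: Dict[str, int]) -> Tuple[float, float, float]:
    """Precision, recall and F1, returning 0.0 where a ratio is undefined."""
    tp, fp, fn = counts["tp"], counts["fp"], counts["fn"]
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def roc_points(y_true: Sequence[int], scores: Sequence[float]) -> List[Tuple[float, float]]:
    """(false positive rate, true positive rate) at each distinct threshold."""
    positives = sum(y_true)
    negatives = len(y_true) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("both classes must be present")
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for t, s in zip(y_true, scores) if s >= threshold and t == 1)
        fp = sum(1 for t, s in zip(y_true, scores) if s >= threshold and t == 0)
        points.append((fp / negatives, tp / positives))
    return points


def auc(points: Sequence[Tuple[float, float]]) -> float:
    """Area under the curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Made-up labels and model scores for illustration.
y_true = [1, 0, 1, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
y_pred = [1 if s >= 0.5 else 0 for s in scores]

counts = confusion_counts(y_true, y_pred)
precision, recall, f1 = precision_recall_f1(counts)
roc_auc = auc(roc_points(y_true, scores))
print(counts, precision, recall, f1, roc_auc)
```

Sweeping the 0.5 threshold over the score range and recording precision and recall at each step gives the points of the precision-recall curve in the same way `roc_points` does for the ROC curve.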

For both:
