
sverrenystad / machine-learning-starting-kit



License: MIT License

Python 0.23% Jupyter Notebook 99.77%
ai automl machine-learning-algorithms auto-eda


Machine Learning Starting Kit

Template for machine learning projects, featuring a diverse collection of ML models, AutoML solutions, and simple EDA tools for streamlined project development. Users only need to specify target features and add their data path in the Config to kickstart a wide array of machine learning tasks.


How to Use

To use the project, first give it access to your data. There are two ways to do this. Either place the data in the data folder and specify its path in the config script, or, if the dataset is too large to store on the device, create a new DataLoader class in the data_loader.py script and configure the default data loader in the config script to use it. Once the data is accessible, specify the target feature in the config script. The project is then ready to use.
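
The setup described above could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the project's actual code: the `Config` fields, the `load` and `iter_chunks` methods, the `ChunkedDataLoader` class, the file name `train.csv`, and the column name `target` are all hypothetical, and the real `config.py` and `data_loader.py` may be organized differently.

```python
import csv
from dataclasses import dataclass
from pathlib import Path
from typing import Dict, Iterator, List

Row = Dict[str, str]


class DataLoader:
    """Hypothetical base loader: reads every row of a CSV file into memory."""

    def __init__(self, path: Path) -> None:
        self.path = Path(path)

    def load(self) -> List[Row]:
        with self.path.open(newline="") as handle:
            return list(csv.DictReader(handle))


class ChunkedDataLoader(DataLoader):
    """Hypothetical loader for large files: yields rows in fixed-size chunks."""

    def __init__(self, path: Path, chunk_size: int = 10_000) -> None:
        super().__init__(path)
        self.chunk_size = chunk_size

    def iter_chunks(self) -> Iterator[List[Row]]:
        chunk: List[Row] = []
        with self.path.open(newline="") as handle:
            for row in csv.DictReader(handle):
                chunk.append(row)
                if len(chunk) == self.chunk_size:
                    yield chunk
                    chunk = []
        if chunk:  # emit the final, possibly shorter, chunk
            yield chunk


@dataclass
class Config:
    """Hypothetical settings object; the real config script may look different."""

    data_path: Path = Path("data/raw/train.csv")  # assumed file name
    target_feature: str = "target"                # assumed column name
    data_loader_cls: type = DataLoader            # swap in ChunkedDataLoader for large data

    def make_loader(self) -> DataLoader:
        # Build the configured loader for the configured data path.
        return self.data_loader_cls(self.data_path)
```

With a layout like this, switching to the large-dataset path would only require changing `data_loader_cls` in the config, which is the kind of single-point configuration the paragraph above describes.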

Start by running the notebooks in the eda folder to get a better understanding of the data. Then run the models in the models folder.

Project Organization

Each folder in the project has a specific purpose and is organized as follows:

├── .github
│   └── workflows                  # GitHub Actions for CI/CD
│
├── data
│   ├── external                   # Data from third party sources.
│   ├── processed                  # The final, feature-engineered data sets for modeling.
│   └── raw                        # The original, immutable data set.
│
├── docs                           # Design documents (or other project documentation)
│   └── sphinx_docs                # A default Sphinx project; see sphinx-doc.org for details
│
├── eda                            # Notebooks for exploratory data analysis and data visualization
│
├── models                         # Training and prediction scripts for different models, including AutoML solutions.
│
├── results
│   ├── figures                    # Generated graphics and figures to be used in reporting
│   ├── predictions                # Model predictions as CSV files
│   └── reports                    # Generated analysis as HTML, PDF, LaTeX, etc.
│
├── src
│   ├── config.py                  # Configuration file for the project
│   ├── ml_service.py              # A class that contains all the functions needed to train and save predictions of the models
│   ├── data                       # Scripts to fetch training and testing data
│   │   └── data_loader.py
│   ├── features                   # Scripts to preprocess raw data into better features for modeling
│   │   ├── feature_engineering.py
│   │   └── post_processing.py     # Script to use domain knowledge to post-process the predictions
│   └── visualization              # Scripts to create exploratory and results-oriented visualizations
│       └── visualize.py
│
├── test                           # Scripts to test the project
└── requirements.txt               # The requirements file for reproducing the analysis environment, e.g.,
                                   # generated with `pip freeze > requirements.txt`
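
As an illustration of how the post-processing step and the results/predictions folder might fit together, here is a minimal sketch. The function names `post_process` and `save_predictions`, the clipping rule, and the CSV column names are assumptions made for this example; they are not taken from the project's `post_processing.py` or `ml_service.py`.

```python
import csv
from pathlib import Path
from typing import Iterable, List


def post_process(
    predictions: Iterable[float],
    lower: float = 0.0,
    upper: float = float("inf"),
) -> List[float]:
    """Clip each prediction into [lower, upper] (an assumed domain rule,
    e.g. a quantity that can never be negative)."""
    return [min(max(p, lower), upper) for p in predictions]


def save_predictions(predictions: Iterable[float], out_path: Path) -> Path:
    """Write predictions to a CSV file with hypothetical columns id, prediction."""
    out_path = Path(out_path)
    out_path.parent.mkdir(parents=True, exist_ok=True)  # create the folder if missing
    with out_path.open("w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["id", "prediction"])
        for index, value in enumerate(predictions):
            writer.writerow([index, value])
    return out_path
```

A call such as `save_predictions(post_process(raw_predictions), Path("results/predictions/example_model.csv"))` would then leave one CSV file per model in the predictions folder; the file name here is only an example.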

Resources

Feature Selection resources
