AI/ML project template

This is the template for any ML project.

Folder structure and Code Organisation

Code organisation

As far as possible, organise your code in a way that allows easy collaboration and usability. It is recommended to organise the code by the part of the pipeline it implements (data, training, prediction, etc.). The template consists of the following folder structure:

artefacts/
code/
dataset/
  • artefacts/ should contain all the relevant artefacts of the project.
  • code/ should contain all the relevant source code of the project.
  • dataset/ should contain all the datasets of the project.

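The folder layout above can be created with a few lines of Python. This is a minimal sketch using only the standard library. The function name create_project_skeleton and the optional per-pipeline subfolders under code/ (data, training, prediction) are illustrative assumptions, not part of the template itself.

```python
from __future__ import annotations

from pathlib import Path

# Top-level folders of the template.
TEMPLATE_DIRS = ("artefacts", "code", "dataset")


def create_project_skeleton(
    root: str | Path,
    pipeline_parts: tuple[str, ...] = ("data", "training", "prediction"),
) -> list[Path]:
    """Create the template folders under ``root`` and return their paths.

    The ``pipeline_parts`` subfolders under ``code/`` are a hypothetical
    way of grouping source code by pipeline stage.
    """
    root = Path(root)
    folders = [root / name for name in TEMPLATE_DIRS]
    folders += [root / "code" / part for part in pipeline_parts]
    for folder in folders:
        # parents=True creates missing parents; exist_ok=True makes reruns safe.
        folder.mkdir(parents=True, exist_ok=True)
    return folders
```

Because every mkdir call uses exist_ok=True, running the function on an existing project adds only the folders that are missing.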
Naming Convention

This post provides a detailed guideline on how to structure and organise Python code like a pro.
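The linked post is not reproduced here. As a minimal sketch, assuming the project follows PEP 8 naming (snake_case for functions and variables, CapWords for classes, UPPER_CASE for constants), the hypothetical helper naming_style below classifies an identifier. In practice, a linter plugin such as pep8-naming for flake8 performs this kind of check for you.

```python
import re

# PEP 8 naming styles; a single leading underscore marks internal names.
STYLES = (
    ("UPPER_CASE", re.compile(r"_?[A-Z][A-Z0-9]*(?:_[A-Z0-9]+)*")),  # constants
    ("CapWords", re.compile(r"_?[A-Z][a-zA-Z0-9]*")),  # classes
    ("snake_case", re.compile(r"_?[a-z][a-z0-9]*(?:_[a-z0-9]+)*")),  # functions, variables, modules
)


def naming_style(name: str) -> str:
    """Return the PEP 8 style that ``name`` follows, or "other" (e.g. mixedCase)."""
    for style, pattern in STYLES:
        if pattern.fullmatch(name):
            return style
    return "other"
```

For example, naming_style("SensorDataset") gives "CapWords" and naming_style("loadSensorData") gives "other", which flags a name that PEP 8 would write as load_sensor_data.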

Code documentation

Documentation

Documentation is one of the best ways to keep code organised. It facilitates collaboration, code readability and usability. As far as possible, provide the following aspects as part of the documentation:

  • Comments: Terse descriptions of why a piece of code exists.
  • Typing: specification of the data types of a function's inputs and outputs, providing insight into what a function consumes and produces at a glance. For example:
    from typing import Sequence
    import numpy as np

    def sensor_data(seq: Sequence, max_seq_len: int = 0) -> np.ndarray:
        ...  # body elided; computes `data`, a NumPy array
        return data
  • Docstrings: meaningful descriptions of the overall function or module, its arguments, return values, etc.

If you're using Visual Studio Code (highly recommended), get the free Python Docstring Generator extension (autoDocstring). Type """ on the line directly below a function definition and press Enter to generate a template docstring. It autofills parts of the docstring using the type hints, and even the exceptions raised in your code!

  • Documentation: a rendered webpage that summarizes all the functions, classes, API calls, workflows, examples, etc., so we can view and traverse the code base without having to look at the code itself. This documentation can be generated automatically from the codebase using the docstrings provided.
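The aspects above can be combined in a single function. The sketch below uses a hypothetical helper, pad_sequence, that is not taken from the project. Its Google-style docstring is one common convention among several. Documentation generators read the same docstring that inspect.getdoc (or help) returns.

```python
import inspect
from typing import List, Sequence


def pad_sequence(seq: Sequence[float], max_seq_len: int, pad_value: float = 0.0) -> List[float]:
    """Pad or truncate a sequence to a fixed length.

    Args:
        seq: Input values, e.g. readings from one sensor.
        max_seq_len: Length of the returned list; must be non-negative.
        pad_value: Value appended when ``seq`` is shorter than ``max_seq_len``.

    Returns:
        A list containing exactly ``max_seq_len`` values.

    Raises:
        ValueError: If ``max_seq_len`` is negative.
    """
    if max_seq_len < 0:
        raise ValueError("max_seq_len must be non-negative")
    # Fixed-length outputs let many sequences be stacked into one array later.
    values = list(seq[:max_seq_len])
    return values + [pad_value] * (max_seq_len - len(values))


# The first docstring line is what documentation tools show as the summary.
summary = inspect.getdoc(pad_sequence).splitlines()[0]
```

Each element plays a different role. The comment explains why the code pads, the type hints show what goes in and out, and the docstring feeds both help(pad_sequence) and any generated documentation.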

Packaging

It is advised to create an environment for each project that explicitly details all the requirements (Python version, packages, etc.).

Creating environment

There are several options for creating environments in Python, such as virtualenv or an Anaconda (conda) environment.

To use virtualenv, follow these steps:

  1. Install virtualenv (documentation: https://virtualenv.pypa.io/en/stable/)

    pip install virtualenv

    OR

    sudo apt install virtualenv

  2. Create a virtual environment that does not inherit the libraries of your existing Python installation:

    python3 -m venv imr_venv

    or, with virtualenv (e.g. on Windows computers):

    virtualenv imr_venv

  3. Activate the virtual environment:

    source imr_venv/bin/activate    (Linux/macOS)

    .\imr_venv\Scripts\activate     (Windows)

  4. Install packages:

    pip install package_name

  5. Generate requirements.txt, which lists the installed packages and their versions.

    pip freeze > requirements.txt

  6. Install requirements from requirements.txt

    pip install -r requirements.txt
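Once requirements.txt exists, you may want to confirm that the active environment actually matches it before running experiments. The following is a minimal sketch using the standard library's importlib.metadata (Python 3.8+). The helper names are hypothetical. Only simple name==version pins, as written by pip freeze, are handled.

```python
from __future__ import annotations

from importlib import metadata
from typing import Callable, Dict, Optional, Tuple


def parse_pinned_requirements(text: str) -> Dict[str, str]:
    """Map package name to pinned version for each ``name==version`` line."""
    pinned: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and surrounding spaces
        if "==" in line:
            name, version = line.split("==", 1)
            pinned[name.strip()] = version.strip()
    return pinned


def find_mismatches(
    pinned: Dict[str, str],
    get_version: Callable[[str], str] = metadata.version,
) -> Dict[str, Tuple[str, Optional[str]]]:
    """Return {name: (pinned, installed)} for differing packages; None means not installed."""
    mismatches: Dict[str, Tuple[str, Optional[str]]] = {}
    for name, expected in pinned.items():
        try:
            installed: Optional[str] = get_version(name)
        except metadata.PackageNotFoundError:
            installed = None
        if installed != expected:
            mismatches[name] = (expected, installed)
    return mismatches
```

To use it, pass the contents of requirements.txt to parse_pinned_requirements and the result to find_mismatches. An empty dict means every pinned package is installed at the pinned version. The get_version parameter exists only so the check can be exercised without touching the real environment.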

If you prefer to use a conda environment:

  1. Install Anaconda or Miniconda.

  2. Create a conda environment, optionally pinning a specific Python version (e.g. python=3.10):

    conda create -n imr_env python=3

  3. Activate the conda environment:

    conda activate imr_env

  4. Install packages:

    conda install package_name

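  5. Generate environment.yml, which lists the packages and their versions:

    conda env export > environment.yml

    To recreate the environment from this file on another machine:

    conda env create -f environment.yml

  6. To list the packages installed in the active environment, use the command below (conda env list lists the available environments instead).

    conda list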
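With both options available, it is easy to run code in the wrong environment. Below is a minimal sketch of a check you might call at the start of a script. It relies on two behaviours. Inside a venv or virtualenv environment, sys.prefix differs from sys.base_prefix. Running conda activate sets the CONDA_DEFAULT_ENV environment variable. The function name active_environment is hypothetical.

```python
import os
import sys
from typing import Mapping


def active_environment(
    environ: Mapping[str, str] = os.environ,
    prefix: str = sys.prefix,
    base_prefix: str = sys.base_prefix,
) -> str:
    """Describe the Python environment the current process is running in."""
    # venv/virtualenv: the interpreter prefix points inside the environment.
    if prefix != base_prefix:
        return f"venv:{prefix}"
    # conda: an activated environment exports its name.
    conda_env = environ.get("CONDA_DEFAULT_ENV")
    if conda_env:
        return f"conda:{conda_env}"
    return "system"
```

For example, print(active_environment()) at the top of a training script should show conda:imr_env after conda activate imr_env. The venv check comes first because a virtual environment created from a conda Python can still carry the CONDA_DEFAULT_ENV variable.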

Reproducibility

Versioning code

Versioning your project helps ensure reproducible behavior. Version control is a fundamental tool in software engineering. It tracks every change made to the code, allowing you to review the entire history of changes and, if needed, revert to an older version.

IMR uses GitLab, a cloud-based Git hosting service. For a basic understanding of versioning commands and best practices, we recommend following this tutorial and this video talk.

Follow these installation instructions to install Git on your computer.

After installation, make sure you set up your user name and email:

    git config --global user.name "imr_user"

    git config --global user.email "<your_email>"
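Versioning supports reproducibility best when each result can be traced back to the exact code that produced it. The following is a minimal sketch, assuming Git is installed and on the PATH. It asks Git for the current commit hash (git rev-parse HEAD) so you can store it next to your results. The helper name current_commit is hypothetical.

```python
import subprocess
from typing import Optional


def current_commit(repo_dir: str = ".") -> Optional[str]:
    """Return the HEAD commit hash of the repository containing ``repo_dir``, or None."""
    try:
        result = subprocess.run(
            ["git", "rev-parse", "HEAD"],
            cwd=repo_dir,
            capture_output=True,
            text=True,
            check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        # Git is not installed, repo_dir does not exist, or it is not a
        # repository with at least one commit.
        return None
    return result.stdout.strip()
```

Store the returned hash with your metrics, for example in a run's metadata file. Uncommitted changes are not captured by the hash, so commit before starting a run you want to reproduce.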

Versioning artefacts

It is important to version the data and artefacts associated with your project. We use the Data Version Control (DVC) library for its simplicity, rich features and, most importantly, modularity. DVC works seamlessly alongside Git.

DVC has many other useful features (metrics, experiments, etc.), so be sure to explore those as well.

To use DVC

  1. Install DVC

    pip install dvc

    or download the DVC version for your OS by following this link

  2. Initialize DVC in an existing Git repository.

    dvc init

  3. Set up the remote storage. Here we use the artefacts directory, which won't be checked into our remote Git repository.

    dvc remote add -d storage artefacts

  4. Add files to be tracked by DVC. This creates a small text pointer file (with a .dvc extension) for each file. Commit these pointer files to Git.

    dvc add artefacts/data/projects.json

  5. Push to remote storage

    dvc push
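Each .dvc pointer file records a hash of the tracked file, which is how DVC detects whether the data has changed. The sketch below re-checks that hash locally with hashlib to show what the pointer stores. In day-to-day use, dvc status does this for you. The sketch assumes the pointer contains an md5: line holding a plain MD5 of the file bytes, as DVC 3.x writes for single files. Older DVC versions normalised line endings of text files before hashing, and directories are recorded differently. The helper names are hypothetical.

```python
import hashlib
import re
from pathlib import Path
from typing import Optional, Union

# Matches e.g. "- md5: a304afb96060aad90176268345e10355" in the "outs" section.
MD5_LINE = re.compile(r"^\s*-?\s*md5:\s*([0-9a-f]{32})\s*$", re.MULTILINE)

PathLike = Union[str, Path]


def file_md5(path: PathLike, chunk_size: int = 1 << 20) -> str:
    """MD5 hex digest of a file, read in chunks so large files need not fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def recorded_md5(pointer_path: PathLike) -> Optional[str]:
    """Hash stored in a .dvc pointer file, or None if no single-file md5 is found."""
    match = MD5_LINE.search(Path(pointer_path).read_text())
    return match.group(1) if match else None


def matches_pointer(data_path: PathLike, pointer_path: Optional[PathLike] = None) -> bool:
    """True if ``data_path`` still has the hash recorded in its pointer file."""
    # `dvc add file` writes the pointer next to the file as "file.dvc".
    pointer = Path(pointer_path) if pointer_path else Path(f"{data_path}.dvc")
    expected = recorded_md5(pointer)
    return expected is not None and file_md5(data_path) == expected
```

For example, matches_pointer("artefacts/data/projects.json") returns False once the file has been edited but not yet re-added with dvc add. Treat a mismatch as a hint to run dvc status, not as proof, given the hashing differences between DVC versions.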

For more details on how to use DVC, follow this tutorial and this DVC exercise.

Creating Demo

At this stage, create a data app or dashboard with an easy-to-use UI for your ML model. The data app should integrate easily with the developed codebase. It should also let you share your key findings, results and demonstrator, with a certain level of interactivity, with all key stakeholders, including those without a technical background.

Several off-the-shelf, open-source platforms exist for creating interactive, easy-to-use data apps, such as Dash, Gradio, Streamlit and Panel.
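As an illustration of how little code such an app can need, here is a minimal sketch with Gradio, one of the platforms listed above. The predict function is a hypothetical placeholder for your model's inference function. Gradio is imported only inside build_demo, so the model code can be tested without Gradio installed.

```python
def predict(text: str) -> str:
    """Hypothetical stand-in for your model's inference function."""
    # Replace this toy keyword rule with a call to your trained model.
    return "positive" if "good" in text.lower() else "negative"


def build_demo():
    """Wrap ``predict`` in a simple web UI with one text input and one text output."""
    import gradio as gr  # pip install gradio

    return gr.Interface(fn=predict, inputs="text", outputs="text", title="Model demo")
```

Calling build_demo().launch() starts a local web server (by default at http://127.0.0.1:7860) that stakeholders can open in a browser. Calling launch(share=True) creates a temporary public link instead. Keeping predict separate from the UI code means the same function can later back a Streamlit, Dash or Panel app.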

Creating a Docker container for your project

Contributors: sambaiga
