
datastreamingclient_model_training

How to improve your model training with DagsHub Direct Data Access on an EC2 instance.

DagsHub + Actions + EC2 + CML

Main Concepts Covered

After working through this repository, you will be able to:

  • Provision an AWS EC2 instance and train a BERT model on it with CML.
  • Apply DagsHub Direct Data Access to speed up your training process.
  • Implement a GitHub Actions pipeline that runs on that instance.
  • Automatically log your model's metrics with MLflow.
  • Compare model performance across MLflow experiments.
  • Automatically save your training metadata with DVC for easy tracking.

Workflow Comparison: Streaming Vs. Regular Approach

Regular Approach

This approach can take a significant amount of time, because the full dataset has to be downloaded with a pull command (such as dvc pull) before training can start.

Regular Workflow
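The following minimal, stdlib-only Python simulation makes the cost of the regular approach concrete. The file names and the `FETCH_SECONDS`/`TRAIN_SECONDS` delays are hypothetical stand-ins for real storage and training. They are not measurements from this project. Every file is pulled before the first training step runs, so the total time is roughly the sum of all fetch times plus all training times.

```python
import time

# Hypothetical per-file costs, in seconds, used only to simulate the workflow.
FETCH_SECONDS = 0.05
TRAIN_SECONDS = 0.05


def fetch(name):
    """Simulate downloading one file from remote storage."""
    time.sleep(FETCH_SECONDS)
    return f"contents of {name}"


def train_step(sample):
    """Simulate one training step on a single file."""
    time.sleep(TRAIN_SECONDS)


def regular_workflow(files):
    """Pull every file first, then train; returns elapsed wall-clock seconds."""
    start = time.perf_counter()
    dataset = [fetch(name) for name in files]  # training cannot start until this finishes
    for sample in dataset:
        train_step(sample)
    return time.perf_counter() - start


if __name__ == "__main__":
    files = [f"sample_{i}.txt" for i in range(10)]
    print(f"regular workflow: {regular_workflow(files):.2f}s")
```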

Streaming Approach

Streaming simplifies data collection by overlapping the two steps: model training starts while the data is still being fetched from storage.

Streaming Workflow
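The overlap can be sketched the same way, again with hypothetical file names and delays. In this minimal simulation, a background thread fetches files into a bounded `queue.Queue` while the main thread trains on whatever has already arrived. The total time therefore approaches one fetch plus all training steps, rather than the sum of both. The sketch only illustrates the producer/consumer idea. It is not how DagsHub Direct Data Access is implemented; with Direct Data Access, the DagsHub client takes care of fetching the data for you.

```python
import queue
import threading
import time

# Hypothetical per-file costs, in seconds, used only to simulate the workflow.
FETCH_SECONDS = 0.05
TRAIN_SECONDS = 0.05
_DONE = object()  # sentinel telling the trainer that the stream has ended


def fetch(name):
    """Simulate downloading one file from remote storage."""
    time.sleep(FETCH_SECONDS)
    return f"contents of {name}"


def train_step(sample):
    """Simulate one training step on a single file."""
    time.sleep(TRAIN_SECONDS)


def producer(files, buffer):
    """Fetch files in the background, handing each one over as soon as it arrives."""
    for name in files:
        buffer.put(fetch(name))  # blocks while the buffer is full
    buffer.put(_DONE)


def streaming_workflow(files, buffer_size=4):
    """Train while files are still being fetched; returns (steps, elapsed seconds)."""
    start = time.perf_counter()
    buffer = queue.Queue(maxsize=buffer_size)
    loader = threading.Thread(target=producer, args=(files, buffer), daemon=True)
    loader.start()
    steps = 0
    while True:
        sample = buffer.get()  # waits only if the next file has not arrived yet
        if sample is _DONE:
            break
        train_step(sample)
        steps += 1
    loader.join()
    return steps, time.perf_counter() - start


if __name__ == "__main__":
    files = [f"sample_{i}.txt" for i in range(10)]
    steps, elapsed = streaming_workflow(files)
    print(f"streaming workflow: {steps} steps in {elapsed:.2f}s")
```

The bounded queue is a deliberate choice. It caps memory use by pausing the fetcher whenever it gets `buffer_size` files ahead of training.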

Getting started

1. Prerequisites

Platforms

AWS EC2 instance properties

The free tier is enough for this use case. The EC2 instance used here has the following properties:

  • cloud-type: t2.micro
  • cloud-region: us-east-1a

Other platforms

Other resources

  • Python 3.9.1
  • DVC 2.11
  • CML
  • Additional dependencies are listed in the requirements.txt file
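A quick environment check before provisioning anything can save a failed run. The sketch below assumes that the versions listed above are minimums; the README only states the versions that were used. It reads the installed DVC version with `importlib.metadata`. `parse_version` and `check_environment` are hypothetical helpers and are not part of this repository.

```python
import sys
from importlib.metadata import PackageNotFoundError, version

# Versions from the prerequisites list, treated here as minimums (an assumption).
MIN_PYTHON = (3, 9)
MIN_DVC = (2, 11)


def parse_version(text):
    """Turn '2.11.0' into (2, 11, 0), keeping only the leading digits of each part."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:  # stop at the first part with no leading number
            break
        parts.append(int(digits))
    return tuple(parts)


def check_environment():
    """Return a list of problems; an empty list means both checks passed."""
    problems = []
    if sys.version_info[:2] < MIN_PYTHON:
        found = sys.version.split()[0]
        problems.append(f"Python {MIN_PYTHON[0]}.{MIN_PYTHON[1]}+ expected, found {found}")
    try:
        dvc_version = version("dvc")
        if parse_version(dvc_version) < MIN_DVC:
            problems.append(f"DVC {MIN_DVC[0]}.{MIN_DVC[1]}+ expected, found {dvc_version}")
    except PackageNotFoundError:
        problems.append("DVC is not installed")
    return problems


if __name__ == "__main__":
    for line in check_environment() or ["Environment looks OK"]:
        print(line)
```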

Results On DagsHub

DagsHub lets you use MLflow and DVC while keeping your code on GitHub. The following results come from experiments on DagsHub, where MLflow tracks the model's F1-score, precision, and recall.

MLflow metrics for each epoch.

The following graph shows the model's performance over 3 epochs.

Model Performance for 3 epochs
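For reference, the three tracked metrics can be computed as shown below. This sketch assumes a binary classification task with label `1` as the positive class, because the README does not state the label set. `precision_recall_f1` is a hypothetical helper, not code from this repository. Inside an active MLflow run, the returned dictionary could be logged once per epoch with `mlflow.log_metrics(metrics, step=epoch)`. That is the kind of call that produces per-epoch curves like the ones above.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Return precision, recall and F1 for one positive class (binary case assumed)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)  # correctly flagged
    fp = sum(1 for t, p in pairs if t != positive and p == positive)  # false alarms
    fn = sum(1 for t, p in pairs if t == positive and p != positive)  # missed positives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Toy labels for illustration only, not results from this project.
    metrics = precision_recall_f1([1, 0, 1, 1, 0], [1, 0, 0, 1, 1])
    print(metrics)
    # With MLflow installed and a run active, one could log per epoch:
    # mlflow.log_metrics(metrics, step=epoch)
```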

General workflow

Below is the general workflow of the pipeline.

General pipeline workflow

Full Article Coming Soon On Medium


Contributors

keitazoumana
