cloud-data-recap-dm's Introduction

โ“ TL;DR MLflow instructions

Train a model from scratch on the 500k dataset

🎬 Set up the parameters
cp .env.sample .env
direnv allow
direnv reload
๐Ÿ‹๏ธโ€โ™‚๏ธ Train the model
make run_preprocess
make run_train
make run_evaluate
๐Ÿ Put the model in production

In MLflow, set the model stage to Production

Handle the January dataset

🎬 Inject the dataset
python get_new_data.py jan
👀 Observe the evolution of the performance

The performance of the model in production on the new data seems to be stable.

👉 No need to train a new model

Handle other monthly datasets

🎬 Inject the monthly dataset
python get_new_data.py <month>  # jan, feb or mar
👀 Observe the evolution of the performance

👉 Define with the business a performance threshold on which to act, for example a variation of $0.30 in the performance

🤔 If the performance degrades significantly, train a new model

🤔 If the performance of the new model is good enough, put it in production
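The decision rule above can be sketched in plain Python. This is purely illustrative: the `should_retrain` helper and its $0.30 default are assumptions for the example, not part of the project code, and the threshold is whatever value you agree on with the business.

```python
# Hypothetical decision rule: retrain when the production model's error
# on new data degrades by more than a business-agreed threshold (in dollars).
def should_retrain(past_mae: float, new_mae: float, threshold: float = 0.30) -> bool:
    """Return True when the error degraded by more than `threshold` dollars."""
    return (new_mae - past_mae) > threshold

print(should_retrain(2.10, 2.15))  # False: small drift, keep the model
print(should_retrain(2.10, 2.55))  # True: significant degradation, retrain
```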

โ“ TL;DR Prefect instructions

Workflow setup and local visualization

🔑 Authenticate to Prefect
prefect auth login -k YOUR_KEY
🎬 Start a Prefect agent
prefect agent local start
👀 Visualize the workflow locally
make run_workflow

Workflow quick run

๐Ÿ“ Register the workflow in Prefect Cloud

Set PREFECT_BACKEND=production in the .env and direnv reload.

In the taxifare.flow.main module, comment out the LocalDaskExecutor line.

make run_workflow
🚕 Quick run the workflow

Run the workflow in the Prefect UI using Quick Run.

👀 Observe the performance in the notification app

Check the performance in the notification board: https://wagon-chat.herokuapp.com/

Run the automated workflow

📆 Schedule the workflow

Create a schedule in the Prefect UI.

♻️ For each month

💉 Inject new data

👀 Observe the performance in the notification app

🤔 Put the newly trained model in production if appropriate

Optimize the workflow

๐Ÿ“ Register a parallel version of the workflow

In the taxifare.flow.main module, uncomment the LocalDaskExecutor line.

make run_workflow
👀 Observe the workflow evolution

In the Prefect UI, the workflow tasks execute in parallel whenever possible.

MLflow

Let's track the evolution of the performance of our model across time using MLflow.

Setup

Let's create a .env file:

cp .env.sample .env

And set up the local data paths in the .env:

LOCAL_DATA_PATH=~/.lewagon/mlops/data
LOCAL_REGISTRY_PATH=~/.lewagon/mlops/training_outputs

Then direnv allow and direnv reload, and retrieve the latest version of the data using either:

  • make reset_sources_all in order to reset datasets of all sizes on the local disk + BigQuery
  • make reset_sources_env in order to reset datasets of size DATASET_SIZE / VALIDATION_DATASET_SIZE on the local disk + BigQuery

Create an account on Prefect Cloud if you do not have one.

Past data

Let's train the model with an initial 500k dataset with:

DATASET_SIZE=500k
VALIDATION_DATASET_SIZE=500k
CHUNK_SIZE=1000000

DATA_SOURCE=local
MODEL_TARGET=mlflow

👉 We boosted the chunk size and use local CSVs in order to speed up the trainings
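A CHUNK_SIZE-style parameter typically controls how many rows are read per iteration, so the full dataset never has to fit in memory. A sketch of that pattern with pandas (illustrative only; the project's preprocessing code is what actually consumes each chunk):

```python
import pandas as pd

def process_in_chunks(csv_path: str, chunk_size: int) -> int:
    """Iterate over a CSV in fixed-size row chunks; returns the total row count.

    Larger chunks mean fewer iterations (faster), at the cost of more memory.
    """
    total_rows = 0
    for chunk in pd.read_csv(csv_path, chunksize=chunk_size):
        # ... real code would preprocess each chunk here ...
        total_rows += len(chunk)
    return total_rows
```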

Make sure to update the MLflow and GCP (for BigQuery) parameters:

  • MLFLOW_EXPERIMENT
  • MLFLOW_MODEL_NAME
  • PROJECT
  • DATASET

Update and reload your .env, then train the model and observe the raw performance in MLflow:

  • make run_preprocess
  • make run_train
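The "performance" tracked throughout the exercise is a dollar-valued error on the fare. Assuming the metric is a mean absolute error (an assumption; check the project's evaluation code for the exact metric), it is computed as:

```python
def mean_absolute_error(y_true: list[float], y_pred: list[float]) -> float:
    """Average absolute fare error, in dollars."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

# Errors of $1.00, $0.50 and $2.00 average to about $1.17
print(mean_absolute_error([10.0, 8.0, 12.0], [9.0, 8.5, 14.0]))
```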

New data

Now that the initial model is trained and stored in MLflow, let's inject new data and see how the model behaves.

Update and reload your .env to play with the new data source:

DATASET_SIZE=new
VALIDATION_DATASET_SIZE=new

For January, February and March:

  • Inject new data with python get_new_data.py jan
  • Preprocess the data with make run_preprocess
  • Observe how the model in production performs on the new data with make run_evaluate
  • If the model performance degrades by more than $0.10, train a new model from the model in production with make run_train
  • Observe the performance of the new model
  • If the performance of the new model is good enough, mark the new model as being in production in MLflow

👉 Performance should be stable in January and start to evolve from February

Prefect

Now that we understand how to track the performance of our model, let's automate the model lifecycle.

Authenticate to Prefect Cloud with prefect auth login -k YOUR_KEY.

Start a local agent with prefect agent local start so that the outcomes of the workflow runs on your machine are persisted in Prefect.

Reset your data source

Let's go through our datasets all over again, but this time using Prefect in order to automate everything.

We want to start over from the latest model trained from scratch on the 500k dataset.

Go to MLflow and set the stage of that model to Production.

Sequential workflow

Update and reload your .env with parameters for Prefect Cloud:

PREFECT_BACKEND=production
PREFECT_FLOW_NAME=taxifare_lifecycle_<user.github_nickname>

👉 Make sure your project exists in Prefect Cloud (the default is taxifare_project)

Register your workflow:

make run_workflow

Connect to Prefect Cloud:

  • 👀 Navigation: verify that your workflow has been uploaded
  • 👀 Agents: verify that your local agent is running

Go through months

Let's inject the January dataset with python get_new_data.py jan.

Then run the workflow once through the Prefect Cloud UI:

  • 👀 Quick run

👉 Now that everything is connected, we can schedule the workflow to run automatically

Configure the channel in taxifare.flow.flow module.

Schedule the workflow in the Prefect Cloud UI:

  • 👀 Schedules

Have a look at the notification board: https://wagon-chat.herokuapp.com/
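The workflow's notifications could be posted with a small helper like the one below. Heads up: this is a guess based on the board's URL, not documented behavior; the /{channel}/messages path and the author/content payload keys are assumptions, and the function is not called here to avoid network access.

```python
import requests

BASE_URL = "https://wagon-chat.herokuapp.com"

def post_notification(channel: str, author: str, content: str) -> int:
    """POST a message to the notification board; returns the HTTP status code.

    The endpoint path and payload keys are assumptions about the board's API.
    """
    response = requests.post(
        f"{BASE_URL}/{channel}/messages",
        json={"author": author, "content": content},
        timeout=10,
    )
    return response.status_code
```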

Parallel workflow

The sequential workflow is pretty slow; let's see how we can run it faster.

In the taxifare.flow.flow module, uncomment the LocalDaskExecutor line.

Register the new workflow using make run_workflow.

In the Prefect Cloud UI:

  • 👀 Task run: have a look at the logs
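Prefect derives the parallelism from the task graph: tasks with no dependency on each other can overlap. The underlying idea can be sketched with the standard library (illustrative only, not Prefect code; the task names are made up):

```python
import time
from concurrent.futures import ThreadPoolExecutor

def task(name: str) -> str:
    time.sleep(0.1)  # stand-in for a preprocessing or evaluation task
    return name

# Three independent 0.1s tasks finish in roughly 0.1s when run
# concurrently, instead of roughly 0.3s when run one after the other.
start = time.perf_counter()
with ThreadPoolExecutor(max_workers=3) as pool:
    results = list(pool.map(task, ["task_a", "task_b", "task_c"]))
elapsed = time.perf_counter() - start
print(results, f"{elapsed:.2f}s")
```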
