🎬 Set up the parameters
cp .env.sample .env
direnv allow
direnv reload
🏋️‍♀️ Train the model
make run_preprocess
make run_train
make run_evaluate
🚀 Put the model in production
In MLflow, set the stage of the model to `Production`.
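The stage transition can also be scripted with the MLflow client; a minimal sketch, assuming a configured tracking server and the model name from your `.env` (the helper name and the version number are illustrative):

```python
# Sketch: promote a registered model version to the "Production" stage.
# Assumes MLFLOW_TRACKING_URI / MLFLOW_MODEL_NAME are set, as in the .env;
# the helper name and the hard-coded version are illustrative only.
def promote_to_production(client, model_name: str, version: str):
    """Move `version` of `model_name` to Production, archiving older versions."""
    return client.transition_model_version_stage(
        name=model_name,
        version=version,
        stage="Production",
        archive_existing_versions=True,  # demote any previous Production version
    )

# usage (requires a running MLflow tracking server):
#   import os
#   from mlflow.tracking import MlflowClient
#   promote_to_production(MlflowClient(), os.environ["MLFLOW_MODEL_NAME"], "1")
```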
🎬 Inject the dataset
python get_new_data.py jan
👀 Observe the evolution of the performance
The performance of the model in production on the new data seems to be stable.
👍 No need to train a new model
🎬 Inject the monthly dataset
python get_new_data.py jan
👀 Observe the evolution of the performance
📊 Define with the business a performance threshold at which to act, for example a variation of $0.3 in the performance
🤔 If the performance degrades significantly, train a new model
🤔 If the performance of the new model is good enough, put it in production
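The two decisions above can be written down as simple predicates; a sketch, assuming MAE in dollars as the metric (the function names and the $0.3 default mirror the example threshold, but use whatever value you agreed on with the business):

```python
# Sketch of the retraining / promotion decisions. MAE in dollars is an
# assumption about the metric; the threshold mirrors the $0.3 example above.
PERFORMANCE_THRESHOLD = 0.3

def should_retrain(production_mae: float, new_data_mae: float,
                   threshold: float = PERFORMANCE_THRESHOLD) -> bool:
    """Retrain only when performance on new data degrades beyond the threshold."""
    return (new_data_mae - production_mae) > threshold

def should_promote(production_mae: float, candidate_mae: float) -> bool:
    """Promote the freshly trained model only if it beats the production one."""
    return candidate_mae < production_mae
```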
🔑 Authenticate to Prefect
prefect auth login -k YOUR_KEY
🎬 Start a Prefect agent
prefect agent local start
👀 Visualize the workflow locally
make run_workflow
🚀 Register the workflow in Prefect Cloud
Set `PREFECT_BACKEND=production` in the `.env` and `direnv reload`.
In the `taxifare.flow.main` module, comment out the `LocalDaskExecutor` line.
make run_workflow
🚀 Quick run the workflow
Run the workflow in the Prefect UI using Quick Run.
👀 Observe the performance in the notification app
Check the performance in the notification board: https://wagon-chat.herokuapp.com/
📆 Schedule the workflow
Create a schedule in the Prefect UI.
♻️ For each month
💉 Inject new data
👀 Observe the performance in the notification app
🤔 Put the newly trained model in production if appropriate
🚀 Register a parallel version of the workflow
In the `taxifare.flow.main` module, uncomment the `LocalDaskExecutor` line.
make run_workflow
👀 Observe the workflow evolution
In the Prefect UI, the workflow tasks execute in parallel whenever possible.
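What the Dask executor buys you can be pictured with plain Python: tasks without mutual dependencies run concurrently, while downstream tasks wait for their inputs. A conceptual sketch only — not Prefect's actual scheduler, and the task names are made up:

```python
from concurrent.futures import ThreadPoolExecutor

def preprocess(month: str) -> str:
    # stand-in for a real preprocessing task
    return f"{month}:preprocessed"

def evaluate(chunks: list) -> int:
    # stand-in for a task that depends on every preprocessing task
    return len(chunks)

def run_parallel(months: list) -> int:
    # the preprocess tasks are independent -> they can run at the same time
    with ThreadPoolExecutor(max_workers=4) as pool:
        preprocessed = list(pool.map(preprocess, months))
    # evaluate needs all upstream results -> it only runs afterwards
    return evaluate(preprocessed)
```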
Let's track the evolution of the performance of our model across time using MLflow.
Let's create a `.env` file:
cp .env.sample .env
And set up the local data paths in the `.env`:
LOCAL_DATA_PATH=~/.lewagon/mlops/data
LOCAL_REGISTRY_PATH=~/.lewagon/mlops/training_outputs
Then `direnv allow .` and `direnv reload`, and retrieve the latest version of the data using either:
- `make reset_sources_all` in order to reset datasets of all sizes in local disk + BigQuery
- `make reset_sources_env` in order to reset datasets of size `DATASET_SIZE` / `VALIDATION_DATASET_SIZE` in local disk + BigQuery
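One gotcha with the `LOCAL_*` paths above: a leading `~` is not expanded automatically when read from the environment, so any script consuming them should expand it explicitly. A minimal sketch (the helper is hypothetical, not part of the package):

```python
import os

def resolve_local_path(var_name: str, default: str) -> str:
    """Read a path from the environment and expand the leading ~."""
    return os.path.expanduser(os.environ.get(var_name, default))

# e.g. resolve_local_path("LOCAL_DATA_PATH", "~/.lewagon/mlops/data")
```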
Create an account on Prefect Cloud if you do not have one.
Let's train the model with an initial `500k` dataset with:
DATASET_SIZE=500k
VALIDATION_DATASET_SIZE=500k
CHUNK_SIZE=1000000
DATA_SOURCE=local
MODEL_TARGET=mlflow
💡 We boosted the chunk size and switched to local CSVs in order to speed up training
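`CHUNK_SIZE` matters because the trainer streams the data in chunks rather than loading everything at once; larger chunks mean fewer, bigger reads. The idea in isolation (a hypothetical helper, not the project's loader):

```python
from itertools import islice

def iter_chunks(rows, chunk_size: int):
    """Yield successive lists of at most chunk_size rows from any iterable."""
    iterator = iter(rows)
    while chunk := list(islice(iterator, chunk_size)):
        yield chunk
```

With `CHUNK_SIZE=1000000`, a `500k` dataset fits in a single chunk, which is part of why training gets faster.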
Make sure to update the MLflow and GCP (for BigQuery) parameters:
MLFLOW_EXPERIMENT
MLFLOW_MODEL_NAME
PROJECT
DATASET
Update and reload your `.env`, then train the model and observe the raw performance in MLflow:
make run_preprocess
make run_train
Now that the initial model is trained and stored in MLflow, let's inject new data and see how the model behaves.
Update and reload your `.env` to play with the new data source:
DATASET_SIZE=new
VALIDATION_DATASET_SIZE=new
For January, February and March:
- Inject the month's data with `python get_new_data.py jan` (and similarly for the following months)
- Preprocess the data with `make run_preprocess`
- Observe how the model in production performs on the new data with `make run_evaluate`
- If the model performance degrades by more than $0.1, train a new model from the model in production with `make run_train`
- Observe the performance of the new model
- If the performance of the new model is good enough, mark the new model as `Production` in MLflow
💡 Performance should be stable in January and start to drift from February
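The monthly routine above can be sketched as a loop. The `evaluate` and `train` callables stand in for shelling out to the make targets (e.g. via `subprocess.run(["make", "run_evaluate"])`); the $0.1 threshold comes from the step above:

```python
THRESHOLD = 0.1  # dollars of degradation tolerated, as agreed above

def monthly_cycle(months, evaluate, train, threshold=THRESHOLD):
    """Return the months that triggered a retraining."""
    retrained = []
    baseline = None
    for month in months:
        mae = evaluate(month)  # production model's performance on the new data
        if baseline is None:
            baseline = mae  # the first month sets the reference
        elif mae - baseline > threshold:
            train(month)  # train a new model from the production model
            retrained.append(month)
            baseline = evaluate(month)  # the new model sets the new reference
    return retrained
```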
Now that we understand how to track the performance of our model, let's automate the model lifecycle.
Authenticate to Prefect Cloud with `prefect auth login -k YOUR_KEY`.
Start a local agent so that the outcomes of the workflow runs executed on your machine are persisted in Prefect Cloud.
Let's go through our datasets all over again, but this time using Prefect in order to automate everything.
We want to start over from the latest model trained from scratch on the `500k` dataset.
Go to MLflow and mark the latest model trained from scratch on the `500k` dataset as being in `Production`.
Update and reload your `.env` with parameters for Prefect Cloud:
PREFECT_BACKEND=production
PREFECT_FLOW_NAME=taxifare_lifecycle_<user.github_nickname>
💡 Make sure your project exists in Prefect Cloud (the default is `taxifare_project`)
Register your workflow:
make run_workflow
Connect to Prefect Cloud:
- 👀 Navigation: verify that your workflow has been uploaded
- 👀 Agents: verify that your local agent is running
Let's inject the January dataset with `python get_new_data.py jan`.
Then run the workflow once through the Prefect Cloud UI:
- 🚀 Quick run
💡 Now that everything is connected, we can schedule the workflow to run automatically
Configure the `channel` in the `taxifare.flow.flow` module.
Schedule the workflow in the Prefect Cloud UI:
- 📆 Schedules
Have a look at the notification board: https://wagon-chat.herokuapp.com/
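Posting the metric to the board can be scripted too. The endpoint shape below (`/<channel>/messages` with `author`/`content` fields) is an assumption about the wagon-chat API — verify it against the board before relying on it; the `post` callable is meant to be `requests.post`:

```python
# Hypothetical notifier for the wagon-chat board; the URL scheme and payload
# fields are assumptions, not a documented contract.
def notify_board(post, channel: str, author: str, mae: float):
    url = f"https://wagon-chat.herokuapp.com/{channel}/messages"
    payload = {"author": author, "content": f"taxifare MAE on new data: {mae:.2f}"}
    return post(url, data=payload)

# usage: import requests; notify_board(requests.post, "<your_channel>", "<you>", 3.21)
```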
The sequential workflow is pretty slow, let's see how we can run it faster.
In the `taxifare.flow.flow` module, uncomment the `LocalDaskExecutor` line.
Register the new workflow using `make run_workflow`.
In the Prefect Cloud UI:
- 👀 Task run: have a look at the logs