🎬 Set up the parameters
cp .env.sample .env
direnv allow
direnv reload
🏋️‍♀️ Train the model
make run_preprocess
make run_train
make run_evaluate
🚀 Put the model in production
In MLflow, set the stage of the model to `Production`.
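The stage transition can also be scripted with the MLflow client; a minimal sketch, assuming a configured tracking server and the model name from your `.env` (the helper name and the version number are illustrative):

```python
# Sketch: promote a registered model version to the "Production" stage.
# Assumes MLFLOW_TRACKING_URI / MLFLOW_MODEL_NAME are set, as in the .env;
# the helper name and the hard-coded version are illustrative only.
def promote_to_production(client, model_name: str, version: str):
    """Move `version` of `model_name` to Production, archiving older versions."""
    return client.transition_model_version_stage(
        name=model_name,
        version=version,
        stage="Production",
        archive_existing_versions=True,  # demote any previous Production version
    )

# usage (requires a running MLflow tracking server):
#   import os
#   from mlflow.tracking import MlflowClient
#   promote_to_production(MlflowClient(), os.environ["MLFLOW_MODEL_NAME"], "1")
```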
🎬 Inject the dataset
python get_new_data.py jan
👀 Observe the evolution of the performance
The performance of the model in production on the new data seems to be stable.
👍 No need to train a new model
🎬 Inject the monthly dataset
python get_new_data.py jan
👀 Observe the evolution of the performance
📊 Define with the business a performance threshold at which to act, for example a variation of $0.3 in the performance
🤔 If the performance degrades significantly, train a new model
🤔 If the performance of the new model is good enough, put it in production
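The two decisions above can be written down as simple predicates; a sketch, assuming MAE in dollars as the metric (the function names and the $0.3 default mirror the example threshold, but use whatever value you agreed on with the business):

```python
# Sketch of the retraining / promotion decisions. MAE in dollars is an
# assumption about the metric; the threshold mirrors the $0.3 example above.
PERFORMANCE_THRESHOLD = 0.3

def should_retrain(production_mae: float, new_data_mae: float,
                   threshold: float = PERFORMANCE_THRESHOLD) -> bool:
    """Retrain only when performance on new data degrades beyond the threshold."""
    return (new_data_mae - production_mae) > threshold

def should_promote(production_mae: float, candidate_mae: float) -> bool:
    """Promote the freshly trained model only if it beats the production one."""
    return candidate_mae < production_mae
```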
🔑 Authenticate to Prefect
prefect auth login -k YOUR_KEY
🎬 Start a Prefect agent
prefect agent local start
👀 Visualize the workflow locally
make run_workflow
🚀 Register the workflow in Prefect Cloud
Set `PREFECT_BACKEND=production` in the `.env` and `direnv reload`.
In the `taxifare.flow.main` module, comment out the `LocalDaskExecutor` line.
make run_workflow
🚀 Quick run the workflow
Run the workflow in the Prefect UI using Quick Run.
👀 Observe the performance in the notification app
Check the performance in the notification board: https://wagon-chat.herokuapp.com/
📆 Schedule the workflow
Create a schedule in the Prefect UI.
♻️ For each month
💉 Inject new data
👀 Observe the performance in the notification app
🤔 Put the newly trained model in production if appropriate
🚀 Register a parallel version of the workflow
In the `taxifare.flow.main` module, uncomment the `LocalDaskExecutor` line.
make run_workflow
👀 Observe the workflow evolution
In the Prefect UI, the workflow tasks execute in parallel whenever possible.
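What the Dask executor buys you can be pictured with plain Python: tasks without mutual dependencies run concurrently, while downstream tasks wait for their inputs. A conceptual sketch only — not Prefect's actual scheduler, and the task names are made up:

```python
from concurrent.futures import ThreadPoolExecutor

def preprocess(month: str) -> str:
    # stand-in for a real preprocessing task
    return f"{month}:preprocessed"

def evaluate(chunks: list) -> int:
    # stand-in for a task that depends on every preprocessing task
    return len(chunks)

def run_parallel(months: list) -> int:
    # the preprocess tasks are independent -> they can run at the same time
    with ThreadPoolExecutor(max_workers=4) as pool:
        preprocessed = list(pool.map(preprocess, months))
    # evaluate needs all upstream results -> it only runs afterwards
    return evaluate(preprocessed)
```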
Let's track the evolution of the performance of our model across time using MLflow.
Let's create a `.env` file:
cp .env.sample .env
And set up the local data paths in the `.env`:
LOCAL_DATA_PATH=~/.lewagon/mlops/data
LOCAL_REGISTRY_PATH=~/.lewagon/mlops/training_outputs
Then `direnv allow .` and `direnv reload`, and retrieve the latest version of the data using either:
- `make reset_sources_all` in order to reset datasets of all sizes in local disk + BigQuery
- `make reset_sources_env` in order to reset datasets of size `DATASET_SIZE` / `VALIDATION_DATASET_SIZE` in local disk + BigQuery
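One gotcha with the `LOCAL_*` paths above: a leading `~` is not expanded automatically when read from the environment, so any script consuming them should expand it explicitly. A minimal sketch (the helper is hypothetical, not part of the package):

```python
import os

def resolve_local_path(var_name: str, default: str) -> str:
    """Read a path from the environment and expand the leading ~."""
    return os.path.expanduser(os.environ.get(var_name, default))

# e.g. resolve_local_path("LOCAL_DATA_PATH", "~/.lewagon/mlops/data")
```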
Create an account on Prefect Cloud if you do not have one.
Let's train the model with an initial `500k` dataset with:
DATASET_SIZE=500k
VALIDATION_DATASET_SIZE=500k
CHUNK_SIZE=1000000
DATA_SOURCE=local
MODEL_TARGET=mlflow
💡 We boosted the chunk size and switched to local CSVs in order to speed up training
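`CHUNK_SIZE` matters because the trainer streams the data in chunks rather than loading everything at once; larger chunks mean fewer, bigger reads. The idea in isolation (a hypothetical helper, not the project's loader):

```python
from itertools import islice

def iter_chunks(rows, chunk_size: int):
    """Yield successive lists of at most chunk_size rows from any iterable."""
    iterator = iter(rows)
    while chunk := list(islice(iterator, chunk_size)):
        yield chunk
```

With `CHUNK_SIZE=1000000`, a `500k` dataset fits in a single chunk, which is part of why training gets faster.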
Make sure to update the MLflow and GCP (for BigQuery) parameters:
MLFLOW_EXPERIMENT
MLFLOW_MODEL_NAME
PROJECT
DATASET
Update and reload your `.env`, then train the model and observe the raw performance in MLflow:
make run_preprocess
make run_train
Now that the initial model is trained and stored in MLflow, let's inject new data and see how the model behaves.
Update and reload your `.env` to play with the new data source:
DATASET_SIZE=new
VALIDATION_DATASET_SIZE=new
For January, February and March:
- Inject the month's data with `python get_new_data.py jan` (and similarly for the following months)
- Preprocess the data with `make run_preprocess`
- Observe how the model in production performs on the new data with `make run_evaluate`
- If the model performance degrades by more than $0.1, train a new model from the model in production with `make run_train`
- Observe the performance of the new model
- If the performance of the new model is good enough, mark the new model as `Production` in MLflow
💡 Performance should be stable in January and start to drift from February
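The monthly routine above can be sketched as a loop. The `evaluate` and `train` callables stand in for shelling out to the make targets (e.g. via `subprocess.run(["make", "run_evaluate"])`); the $0.1 threshold comes from the step above:

```python
THRESHOLD = 0.1  # dollars of degradation tolerated, as agreed above

def monthly_cycle(months, evaluate, train, threshold=THRESHOLD):
    """Return the months that triggered a retraining."""
    retrained = []
    baseline = None
    for month in months:
        mae = evaluate(month)  # production model's performance on the new data
        if baseline is None:
            baseline = mae  # the first month sets the reference
        elif mae - baseline > threshold:
            train(month)  # train a new model from the production model
            retrained.append(month)
            baseline = evaluate(month)  # the new model sets the new reference
    return retrained
```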
Now that we understand how to track the performance of our model, let's automate the model lifecycle.
Authenticate to Prefect Cloud with `prefect auth login -k YOUR_KEY`.
Start a local agent so that the outcomes of the workflow runs executed on your machine are persisted in Prefect Cloud.
Let's go through our datasets all over again, but this time using Prefect in order to automate everything.
We want to start over from the latest model trained from scratch on the `500k` dataset.
Go to MLflow and mark the latest model trained from scratch on the `500k` dataset as being in `Production`.
Update and reload your `.env` with parameters for Prefect Cloud:
PREFECT_BACKEND=production
PREFECT_FLOW_NAME=taxifare_lifecycle_<user.github_nickname>
💡 Make sure your project exists in Prefect Cloud (the default is `taxifare_project`)
Register your workflow:
make run_workflow
Connect to Prefect Cloud:
- 👀 Navigation: verify that your workflow has been uploaded
- 👀 Agents: verify that your local agent is running
Let's inject the January dataset with `python get_new_data.py jan`.
Then run the workflow once through the Prefect Cloud UI:
- 🚀 Quick run
💡 Now that everything is connected, we can schedule the workflow to run automatically
Configure the `channel` in the `taxifare.flow.flow` module.
Schedule the workflow in the Prefect Cloud UI:
- 📆 Schedules
Have a look at the notification board: https://wagon-chat.herokuapp.com/
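Posting the metric to the board can be scripted too. The endpoint shape below (`/<channel>/messages` with `author`/`content` fields) is an assumption about the wagon-chat API — verify it against the board before relying on it; the `post` callable is meant to be `requests.post`:

```python
# Hypothetical notifier for the wagon-chat board; the URL scheme and payload
# fields are assumptions, not a documented contract.
def notify_board(post, channel: str, author: str, mae: float):
    url = f"https://wagon-chat.herokuapp.com/{channel}/messages"
    payload = {"author": author, "content": f"taxifare MAE on new data: {mae:.2f}"}
    return post(url, data=payload)

# usage: import requests; notify_board(requests.post, "<your_channel>", "<you>", 3.21)
```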
The sequential workflow is pretty slow, let's see how we can run it faster.
In the `taxifare.flow.flow` module, uncomment the `LocalDaskExecutor` line.
Register the new workflow using `make run_workflow`.
In the Prefect Cloud UI:
- 👀 Task run: have a look at the logs