
cd4ml-scenarios's Introduction

Continuous Intelligence and CD4ML Workshop

This workshop contains the sample application and machine learning code used for the Continuous Delivery for Machine Learning (CD4ML) and Continuous Intelligence workshop.

This workshop is based on an existing CD4ML Workshop.

This material was developed, and is continuously evolved, by ThoughtWorks, and has been presented at conferences such as ODSC Boston 2020 and ODSC Europe 2020.

You can also watch a recording of this material presented at a Global Webinar.

Prerequisites

In order to run this workshop, you will need:

  • A valid GitHub account
  • A working Docker setup with at least 20 GB of space free (if running on Windows, make sure to use Linux containers)

Tools used in this workshop

As part of this workshop, all of these services will be set up automatically for you as Docker containers. You do not need to download and install them ahead of time.

Workshop Instructions

The workshop is divided into several steps, which build on top of each other. Instructions for each exercise and scenario can be found under the instructions folder. To start from the beginning, click here.

Because each exercise depends on the previous ones, you cannot skip ahead without completing the earlier steps.

The Machine Learning Problems

This workshop offers two different scenarios that you can work through.

The first is a simplified solution to a Kaggle problem posted by Corporación Favorita, a large Ecuador-based grocery retailer interested in improving its sales forecasting using data. For the purposes of this workshop, we have combined and simplified their data sets, as our goal is not to find the best predictions, but to demonstrate how to implement CD4ML.

The second is based on a problem from Zillow Group, an American online real estate company interested in improving its predictions of real estate prices.

Links to the different components of this scenario

After a successful setup of the environment, the following components are running on your machine. You can find a homepage for navigating to any of these services here.

Collaborators

The material, ideas, and content developed for this workshop were contributions from (in alphabetical order):

Contributors

andy-symonds, ciwin, dependabot[bot], ericnagler, gmartinezramirez, ryandawsonuk

cd4ml-scenarios's Issues

show models in minio?

It seems to me there could be value in showing the serialized models in MinIO (http://localhost:9000/), which can be accessed using the credentials in the .env file. This would show the serialized model and its metadata.

You can get the same information from the artifacts view of the MLflow experiments, but seeing it in MinIO too might help show how the architecture fits together.
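As a rough illustration of what browsing MinIO directly could look like, here is a minimal sketch that lists the objects in a bucket through MinIO's S3-compatible API and groups them by MLflow run. It assumes several things not stated in the repo: that MinIO listens on http://localhost:9000, that the credentials from the .env file are exported as MINIO_ACCESS_KEY and MINIO_SECRET_KEY (hypothetical variable names), that the bucket is called "mlflow" (hypothetical), that boto3 is installed, and that MLflow stores artifacts at the bucket root in its default <experiment_id>/<run_id>/artifacts/<path> layout.

```python
"""List MLflow artifacts stored in MinIO, grouped by run (sketch).

Assumptions: endpoint http://localhost:9000, hypothetical env var names
MINIO_ACCESS_KEY / MINIO_SECRET_KEY, hypothetical bucket name, and MLflow's
default <experiment_id>/<run_id>/artifacts/<path> layout at the bucket root.
"""
import os
from collections import defaultdict


def group_keys_by_run(keys):
    """Map run_id -> artifact paths for keys in MLflow's default layout."""
    runs = defaultdict(list)
    for key in keys:
        parts = key.split("/", 3)
        # Expect: experiment_id / run_id / "artifacts" / path-inside-run
        if len(parts) == 4 and parts[2] == "artifacts":
            _experiment_id, run_id, _, artifact_path = parts
            runs[run_id].append(artifact_path)
    return dict(runs)


def list_bucket_keys(bucket):
    """Return every object key in the bucket (requires boto3)."""
    import boto3  # imported lazily so group_keys_by_run works without boto3

    s3 = boto3.client(
        "s3",
        endpoint_url="http://localhost:9000",
        aws_access_key_id=os.environ["MINIO_ACCESS_KEY"],
        aws_secret_access_key=os.environ["MINIO_SECRET_KEY"],
    )
    keys = []
    # list_objects_v2 returns at most 1000 keys per call, so paginate.
    for page in s3.get_paginator("list_objects_v2").paginate(Bucket=bucket):
        keys.extend(obj["Key"] for obj in page.get("Contents", []))
    return keys


if __name__ == "__main__":
    for run_id, paths in group_keys_by_run(list_bucket_keys("mlflow")).items():
        print(run_id, paths)
```

Grouping by run ID mirrors how the MLflow UI presents the same files, which could make it easier to relate the two views during the workshop.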

protobuf error in running pipeline

Today, when I run the pipeline in Jenkins, I get this error:

[2022-05-26T11:09:07.506Z] Traceback (most recent call last):
[2022-05-26T11:09:07.506Z]   File "/var/jenkins_home/workspace/CD4ML-Scenarios_master/run_python_script.py", line 50, in <module>
[2022-05-26T11:09:07.506Z]     run_python_script(script, arguments, profiler=profiler)
[2022-05-26T11:09:07.506Z]   File "/var/jenkins_home/workspace/CD4ML-Scenarios_master/run_python_script.py", line 26, in run_python_script
[2022-05-26T11:09:07.506Z]     from scripts import register_model as executable_script
[2022-05-26T11:09:07.506Z]   File "/var/jenkins_home/workspace/CD4ML-Scenarios_master/scripts/register_model.py", line 3, in <module>
[2022-05-26T11:09:07.506Z]     from cd4ml.register_model import register_model
[2022-05-26T11:09:07.506Z]   File "/var/jenkins_home/workspace/CD4ML-Scenarios_master/cd4ml/register_model.py", line 4, in <module>
[2022-05-26T11:09:07.506Z]     import mlflow
[2022-05-26T11:09:07.506Z]   File "/usr/local/lib/python3.9/dist-packages/mlflow/__init__.py", line 34, in <module>
[2022-05-26T11:09:07.506Z]     import mlflow.tracking._model_registry.fluent
[2022-05-26T11:09:07.506Z]   File "/usr/local/lib/python3.9/dist-packages/mlflow/tracking/__init__.py", line 8, in <module>
[2022-05-26T11:09:07.506Z]     from mlflow.tracking.client import MlflowClient
[2022-05-26T11:09:07.506Z]   File "/usr/local/lib/python3.9/dist-packages/mlflow/tracking/client.py", line 16, in <module>
[2022-05-26T11:09:07.506Z]     from mlflow.entities import Experiment, Run, RunInfo, Param, Metric, RunTag, FileInfo, ViewType
[2022-05-26T11:09:07.506Z]   File "/usr/local/lib/python3.9/dist-packages/mlflow/entities/__init__.py", line 6, in <module>
[2022-05-26T11:09:07.506Z]     from mlflow.entities.experiment import Experiment
[2022-05-26T11:09:07.506Z]   File "/usr/local/lib/python3.9/dist-packages/mlflow/entities/experiment.py", line 2, in <module>
[2022-05-26T11:09:07.506Z]     from mlflow.entities.experiment_tag import ExperimentTag
[2022-05-26T11:09:07.506Z]   File "/usr/local/lib/python3.9/dist-packages/mlflow/entities/experiment_tag.py", line 2, in <module>
[2022-05-26T11:09:07.506Z]     from mlflow.protos.service_pb2 import ExperimentTag as ProtoExperimentTag
[2022-05-26T11:09:07.506Z]   File "/usr/local/lib/python3.9/dist-packages/mlflow/protos/service_pb2.py", line 18, in <module>
[2022-05-26T11:09:07.506Z]     from .scalapb import scalapb_pb2 as scalapb_dot_scalapb__pb2
[2022-05-26T11:09:07.506Z]   File "/usr/local/lib/python3.9/dist-packages/mlflow/protos/scalapb/scalapb_pb2.py", line 29, in <module>
[2022-05-26T11:09:07.506Z]     options = _descriptor.FieldDescriptor(
[2022-05-26T11:09:07.506Z]   File "/usr/local/lib/python3.9/dist-packages/google/protobuf/descriptor.py", line 560, in __new__
[2022-05-26T11:09:07.506Z]     _message.Message._CheckCalledFromGeneratedFile()
[2022-05-26T11:09:07.506Z] TypeError: Descriptors cannot not be created directly.

But it was working for me yesterday.

There were commits yesterday, and I am now also using Rancher Desktop, but I suspect this is about pinning versions, just like databrickslabs/dbx#257, for which the solution is databrickslabs/dbx@bf56196.
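protobuf's full error text for this case (cut off in the log above) suggests two workarounds: downgrading the protobuf package to 3.20.x or lower, or setting PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=python at the cost of slower pure-Python parsing. The sketch below only diagnoses the first situation. It assumes the break comes from protobuf 4.x being installed unpinned next to an mlflow release whose generated _pb2 modules predate it; the suggested pin (protobuf<4) is an assumption, not the fix the maintainers adopted.

```python
"""Check whether the installed protobuf is likely to trigger
'Descriptors cannot not be created directly' with older generated code.

Assumption: protobuf 4.x is the culprit, and pinning protobuf<4 in the
requirements file is an acceptable workaround for this workshop.
"""
from importlib.metadata import PackageNotFoundError, version


def is_protobuf_4_or_later(protobuf_version: str) -> bool:
    """Return True if the version string has a major version >= 4."""
    major = protobuf_version.split(".", 1)[0]
    return int(major) >= 4


def check_installed_protobuf() -> str:
    """Describe whether the installed protobuf may need pinning."""
    try:
        installed = version("protobuf")
    except PackageNotFoundError:
        return "protobuf is not installed"
    if is_protobuf_4_or_later(installed):
        return (f"protobuf {installed} may break older generated _pb2 modules; "
                "consider pinning protobuf<4 in the requirements file")
    return f"protobuf {installed} should work with older generated code"


if __name__ == "__main__":
    print(check_installed_protobuf())
```

Running a check like this as an early pipeline step would surface the version problem before the mlflow import fails deep inside register_model.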

Build fails with an error on timestamps

Hi there, I am preparing for the CD4ML workshop taking place today, and I have an issue building the project in my Jenkins environment. I get the following error during the build:

org.codehaus.groovy.control.MultipleCompilationErrorsException: startup failed:

WorkflowScript: 9: Invalid option type "timestamps". Valid option types: [authorizationMatrix, buildDiscarder, catchError, checkoutToSubdirectory, disableConcurrentBuilds, disableResume, durabilityHint, newContainerPerStage, overrideIndexTriggers, parallelsAlwaysFailFast, preserveStashes, quietPeriod, rateLimitBuilds, retry, script, skipDefaultCheckout, skipStagesAfterUnstable, timeout, waitUntil, warnError, withContext, withCredentials, withEnv, ws] @ line 9, column 8.

timestamps()
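The error lists the declarative options this Jenkins instance knows about, and timestamps is not among them. A likely cause, which is an assumption rather than something confirmed in this issue, is that the Timestamper plugin (short name "timestamper"), which contributes the timestamps() option, is not installed. The sketch below checks for it through Jenkins' plugin manager JSON API, assuming that API is readable at /pluginManager/api/json?depth=1 without authentication; the base URL is a hypothetical placeholder.

```python
"""Check whether the Jenkins Timestamper plugin is installed (sketch).

Assumptions: timestamps() in the Jenkinsfile comes from the Timestamper
plugin, and the plugin manager JSON API is readable anonymously.
"""
import json
import urllib.request


def has_plugin(plugin_manager_json: dict, short_name: str) -> bool:
    """Return True if a plugin with this short name appears in the list."""
    return any(
        plugin.get("shortName") == short_name
        for plugin in plugin_manager_json.get("plugins", [])
    )


def fetch_plugins(base_url: str) -> dict:
    """Fetch the installed plugin list from a Jenkins instance."""
    url = base_url.rstrip("/") + "/pluginManager/api/json?depth=1"
    with urllib.request.urlopen(url, timeout=10) as response:
        return json.load(response)


if __name__ == "__main__":
    # Hypothetical URL: replace with the address of your workshop Jenkins.
    plugins = fetch_plugins("http://localhost:8080")
    if has_plugin(plugins, "timestamper"):
        print("Timestamper is installed")
    else:
        print("Timestamper not found: install it or remove timestamps() from options")
```

If the plugin is missing, installing it through the plugin manager, or deleting the timestamps() line from the Jenkinsfile's options block, would be the two obvious ways forward.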

running with rancher desktop - error getting credentials docker-credential-desktop

I tried running with Rancher Desktop. Running docker-compose up gave me:

error getting credentials - err: exec: "docker-credential-desktop": executable file not found in $PATH, out

So then I installed docker-credential-helper with:

brew install docker-credential-helper

Actually, to get the installation to work I had to create a docker directory, as I had uninstalled Docker Desktop.

That gave me docker-credential-helper, but that alone did not fix my issue, perhaps because I hadn't changed credsStore in the Docker config.json at that point.

I then changed credsStore to credStore in the Docker config.json, and docker-compose up worked. Possibly that change alone would have been enough; I'm not sure.
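The Docker CLI reads the credsStore key in config.json and runs docker-credential-<value>, so the error implies the file still contained "credsStore": "desktop" after Docker Desktop was removed. Renaming the key to credStore most likely worked because the CLI does not recognise that spelling, which amounts to removing the setting. The sketch below removes the stale entry explicitly instead; it assumes the config lives at ~/.docker/config.json and that dropping the credential store (so credentials are stored in the file itself) is acceptable on your machine.

```python
"""Remove a stale Docker Desktop credential helper from config.json (sketch).

Assumption: config.json still has "credsStore": "desktop" left over from
Docker Desktop, so docker-credential-desktop is looked up but not found.
"""
import json
import shutil
from pathlib import Path


def drop_creds_store(config: dict, stale_helper: str = "desktop") -> dict:
    """Return a copy of config without credsStore if it names stale_helper."""
    cleaned = dict(config)
    if cleaned.get("credsStore") == stale_helper:
        del cleaned["credsStore"]
    return cleaned


def fix_docker_config(path=None) -> bool:
    """Rewrite config.json in place, keeping a .bak copy; True if changed."""
    path = Path(path) if path else Path.home() / ".docker" / "config.json"
    config = json.loads(path.read_text())
    cleaned = drop_creds_store(config)
    if cleaned == config:
        return False
    shutil.copy(path, path.with_name(path.name + ".bak"))
    path.write_text(json.dumps(cleaned, indent=2) + "\n")
    return True


if __name__ == "__main__":
    print("updated" if fix_docker_config() else "no change needed")
```

Keeping a backup means the original setting can be restored if Docker Desktop is reinstalled later.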

clarify note about ssl error

In 1-SystemSetup.md it says “you could run into an SSL error when attempting to run the download data scripts later”. Which download data scripts? Is that for a different problem and not housing? I ignored this section for housing and was fine. I've created a PR for other clarifications but was not sure how to clarify this one.

pip3 not found

I'm working through your instructions and I get the following error in my Jenkins pipeline build:

/var/jenkins_home/workspace/CD4ML-Scenarios_master@tmp/durable-01e5a6c2/script.sh: line 1: pip3: not found script returned exit code 127

I've tried changing pip3 to pip, but I get the same error with pip: not found.

mention and explain validation plots?

There are validation plots recorded for each run. They're uploaded to MinIO and recorded in MLflow as artifacts.

Maybe the workshop should include something about this? It isn't mentioned in the instructions at present (at least not for the housing scenario).
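If the instructions did cover the plots, a short script could show participants where to find them. The sketch below walks a run's artifacts with MLflow's MlflowClient.list_artifacts and keeps files with image extensions. The tracking server URL and run ID are hypothetical placeholders, and the assumption that the plots are saved as .png (or similar) files is not confirmed by the repo.

```python
"""Find image artifacts (e.g. validation plots) recorded for an MLflow run.

Assumptions: hypothetical tracking URL and run ID; plots are stored as
image files with one of the extensions in IMAGE_SUFFIXES.
"""
from pathlib import PurePosixPath

IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".svg"}


def image_paths(artifact_paths):
    """Keep only paths whose extension looks like an image."""
    return [p for p in artifact_paths
            if PurePosixPath(p).suffix.lower() in IMAGE_SUFFIXES]


def list_run_artifacts(client, run_id, path=None):
    """Recursively collect artifact file paths for a run."""
    files = []
    for info in client.list_artifacts(run_id, path):
        if info.is_dir:
            files.extend(list_run_artifacts(client, run_id, info.path))
        else:
            files.append(info.path)
    return files


if __name__ == "__main__":
    from mlflow.tracking import MlflowClient

    client = MlflowClient(tracking_uri="http://localhost:5000")  # hypothetical URL
    run_id = "replace-with-a-run-id"  # hypothetical placeholder
    print(image_paths(list_run_artifacts(client, run_id)))
```

Going through the tracking server rather than MinIO keeps the example independent of where the artifact store actually lives.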
