ML Ops with GitHub Actions and Azure Machine Learning

Use this template to set up a data science or machine learning project with automated training and deployment using GitHub Actions and Azure Machine Learning. For a more comprehensive version of this automated pipeline, see the aml-template repository.

Getting started

YouTube Video

Click on the image to view the video on YouTube. The video shows you the setup process, which is also described below:

1. Prerequisites

The following prerequisites are required to make this repository work:

Azure subscription
Contributor access to the Azure subscription
Access to GitHub Actions

If you don’t have an Azure subscription, create a free account before you begin. Try the free or paid version of Azure Machine Learning today.

2. Create repository

To get started with ML Ops, create a new repository based on this template by clicking the green "Use this template" button:

3. Setting up the required secrets

You need to generate a service principal to authenticate against your Azure subscription and get access to it. We suggest adding a service principal with contributor rights to a new resource group or to the one where you have deployed your existing Azure Machine Learning workspace. Go to the Azure Portal to find the details of your resource group or workspace. Then start the Azure Cloud Shell or install the Azure CLI on your computer and execute the following command to generate the required credentials:

# Replace {service-principal-name}, {subscription-id} and {resource-group} with any name for
# your service principal, your Azure subscription id and your resource group name
az ad sp create-for-rbac --name {service-principal-name} \
                         --role contributor \
                         --scopes /subscriptions/{subscription-id}/resourceGroups/{resource-group} \
                         --sdk-auth

This will generate the following JSON output:

{
  "clientId": "<GUID>",
  "clientSecret": "<GUID>",
  "subscriptionId": "<GUID>",
  "tenantId": "<GUID>",
  (...)
}

Add this JSON output as a secret with the name AZURE_CREDENTIALS in your GitHub repository:

To do so, click on the Settings tab in your repository, then click on Secrets and finally add the new secret with the name AZURE_CREDENTIALS to your repository.

Please follow this link for more details.
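
Before pasting the output into the secret, it can help to check that the JSON is complete. The following is a minimal sketch and not part of the template: the helper name `missing_credential_keys` is hypothetical, and the key list covers only the four fields shown in the sample output above.

```python
import json

# The four fields shown in the sample output above. The real output
# contains further fields that this sketch does not check.
REQUIRED_KEYS = ("clientId", "clientSecret", "subscriptionId", "tenantId")


def missing_credential_keys(raw_json: str) -> list:
    """Return the required keys that are absent or empty in the JSON text."""
    credentials = json.loads(raw_json)
    return [key for key in REQUIRED_KEYS if not credentials.get(key)]


if __name__ == "__main__":
    # Hypothetical, incomplete sample: subscriptionId is left out on purpose.
    sample = '{"clientId": "<GUID>", "clientSecret": "<GUID>", "tenantId": "<GUID>"}'
    print(missing_credential_keys(sample))
```

An empty list means all four fields are present. Run such a check locally only, so that the client secret never ends up in logs.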

4. Define your workspace parameters

You have to modify the parameters in the "/.cloud/.azure/workspace.json" file in your repository, so that the GitHub Actions create or connect to the desired Azure Machine Learning workspace. Just click on the link and edit the file.

Use the same value for the resource_group parameter that you used when generating the Azure credentials. If you already have an Azure Machine Learning workspace in that resource group, set the name parameter in the JSON file to the name of your workspace. If you want the action to create a new workspace in that resource group, pick a name for the new workspace and assign it to the name parameter. You can also delete the name parameter if you want the action to use the default value, which is the repository name.
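
The fallback for the name parameter described above can be sketched as follows. This is an illustration only, not the code of the aml-workspace action: the function name `resolve_workspace` is hypothetical, and treating resource_group as required is an assumption of this sketch.

```python
import json


def resolve_workspace(config_text: str, repository_name: str) -> dict:
    """Apply the described fallback: a missing name defaults to the repository name."""
    config = json.loads(config_text)
    return {
        "name": config.get("name", repository_name),
        # Assumption of this sketch: resource_group must always be given.
        "resource_group": config["resource_group"],
    }


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    print(resolve_workspace('{"resource_group": "my-rg"}', "my-repo"))
    print(resolve_workspace('{"name": "my-ws", "resource_group": "my-rg"}', "my-repo"))
```

With the first input the workspace name falls back to the repository name; with the second, the explicit name wins.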

Once you save your changes to the file, the predefined GitHub workflow that trains and deploys a model on Azure Machine Learning gets triggered. Check the Actions tab to see whether your workflow ran successfully.

5. Modify the code

Now you can start modifying the code in the code folder so that your model, and not the provided sample model, gets trained on Azure. Where required, modify the environment YAML files so that the conda environments for training and deployment have the correct packages installed. When you push the changes, the workflow kicks off your training and deployment run. Check the Actions tab to see whether it ran successfully.

Comment out lines 39 to 55 in your "/.github/workflows/train_deploy.yml" file if you only want to train the model. Uncomment lines 7 and 8 if you only want to kick off the workflow when pushing changes to the "/code/" folder.

6. Viewing your AML resources and runs

The log outputs of your action will provide URLs for you to view the resources that have been created in AML. Alternatively, you can visit the Machine Learning Studio to view the progress of your runs, etc. For more details, read the documentation below.

Documentation

Code structure

File/folder	Description
`code`	Sample data science source code that will be submitted to Azure Machine Learning to train and deploy machine learning models.
`code/train`	Sample code that is required for training a model on Azure Machine Learning.
`code/train/train.py`	Training script that gets executed on a cluster on Azure Machine Learning.
`code/train/environment.yml`	Conda environment specification, which describes the dependencies of `train.py`. These packages will be installed inside a Docker image on the Azure Machine Learning compute cluster, when executing your `train.py`.
`code/train/run_config.yml`	YAML file that describes the execution of your training run on Azure Machine Learning. This file also references your `environment.yml`. Please look at the comments in the file for more details.
`code/deploy`	Sample code that is required for deploying a model on Azure Machine Learning.
`code/deploy/score.py`	Inference script that is used to build a Docker image and that gets executed within the container when you send data to the deployed model on Azure Machine Learning.
`code/deploy/environment.yml`	Conda environment specification, which describes the dependencies of `score.py`. These packages will be installed inside the Docker image that will be used for deploying your model.
`code/test/test.py`	Test script that can be used for testing your deployed webservice. Add a `deploy.json` to the `.cloud/.azure` folder and add the following code `{ "test_enabled": true }` to enable tests of your webservice. Change the code according to the tests that you would like to execute.
`.cloud/.azure`	Configuration files for the Azure Machine Learning GitHub Actions. Please visit the repositories of the respective actions and read the documentation for more details.
`.github/workflows`	Folder for GitHub workflows. The `train_deploy.yml` sample workflow shows you how you can use the Azure Machine Learning GitHub Actions to automate the machine learning process.
`docs`	Resources for this README.
`CODE_OF_CONDUCT.md`	Microsoft Open Source Code of Conduct.
`LICENSE`	The license for the sample.
`README.md`	This README file.
`SECURITY.md`	Microsoft Security README.
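
As noted for `code/test/test.py`, tests of the webservice are enabled through a `deploy.json` file in the `.cloud/.azure` folder. A minimal sketch of creating that file from Python is shown below. The helper name `enable_webservice_tests` is hypothetical, and keeping existing settings in the file is a choice of this sketch; you can just as well create the file by hand.

```python
import json
import tempfile
from pathlib import Path


def enable_webservice_tests(repo_root: Path) -> Path:
    """Create or update .cloud/.azure/deploy.json so that test_enabled is true."""
    deploy_file = repo_root / ".cloud" / ".azure" / "deploy.json"
    deploy_file.parent.mkdir(parents=True, exist_ok=True)
    # Preserve any settings that are already in the file.
    settings = json.loads(deploy_file.read_text()) if deploy_file.exists() else {}
    settings["test_enabled"] = True
    deploy_file.write_text(json.dumps(settings, indent=2) + "\n")
    return deploy_file


if __name__ == "__main__":
    # Demonstrated in a temporary directory; pass the path of your checkout instead.
    with tempfile.TemporaryDirectory() as tmp:
        path = enable_webservice_tests(Path(tmp))
        print(path.read_text())
```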

Documentation of Azure Machine Learning GitHub Actions

The template uses the open source Azure certified Actions listed below. Click on the links and read the README files for more details.

aml-workspace - Connects to or creates a new workspace
aml-compute - Connects to or creates a new compute target in Azure Machine Learning
aml-run - Submits a ScriptRun, an Estimator or a Pipeline to Azure Machine Learning
aml-registermodel - Registers a model to Azure Machine Learning
aml-deploy - Deploys a model and creates an endpoint for the model

Known issues

Error: MissingSubscriptionRegistration

Error message:

Message: ***'error': ***'code': 'MissingSubscriptionRegistration', 'message': "The subscription is not registered to use namespace 'Microsoft.KeyVault'. See https://aka.ms/rps-not-found for how to register subscriptions.", 'details': [***'code': 'MissingSubscriptionRegistration', 'target': 'Microsoft.KeyVault', 'message': "The subscription is not registered to use namespace 'Microsoft.KeyVault'. See https://aka.ms/rps-not-found for how to register subscriptions

Solution:

This error message appears when the Azure/aml-workspace action tries to create a new Azure Machine Learning workspace in your resource group and you have never deployed a Key Vault in the subscription before. We recommend creating an Azure Machine Learning workspace manually in the Azure Portal. Follow the steps on this website to create a new workspace with the desired name. After you have successfully completed the steps, make sure that your service principal has access to the resource group and that the details in your "/.cloud/.azure/workspace.json" file are correct and point to the right workspace and resource group.
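
The error message itself points to registering the resource provider as another possible way out. As a hedged sketch, and not something this template does for you, the snippet below extracts the namespace from such a message and builds the matching Azure CLI call. `az provider register --namespace` is an existing Azure CLI command; the helper name `registration_command` is hypothetical, and whether your account is allowed to register providers on the subscription is an assumption.

```python
import re
from typing import Optional


def registration_command(error_message: str) -> Optional[str]:
    """Build an `az provider register` command for the namespace named in the error."""
    match = re.search(r"not registered to use namespace '([^']+)'", error_message)
    if match is None:
        return None  # the message is not a MissingSubscriptionRegistration error
    return f"az provider register --namespace {match.group(1)}"


if __name__ == "__main__":
    message = (
        "The subscription is not registered to use namespace "
        "'Microsoft.KeyVault'. See https://aka.ms/rps-not-found for how "
        "to register subscriptions."
    )
    print(registration_command(message))
```

The printed command can then be run in the Azure Cloud Shell or a local Azure CLI session before re-running the workflow.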

What is MLOps?

MLOps empowers data scientists and machine learning engineers to bring together their knowledge and skills to simplify the process of going from model development to release and deployment. MLOps enables you to track, version, test, certify and reuse assets in every part of the machine learning lifecycle and provides orchestration services to streamline managing this lifecycle. This allows you to automate the end-to-end machine learning lifecycle to frequently update models, test new models, and continuously roll out new ML models alongside your other applications and services.

This repository enables data scientists to focus on the training and deployment code of their machine learning project (code folder of this repository). Once new code is checked into the code folder of the master branch of this repository, the GitHub workflow is triggered and open source Azure Machine Learning actions are used to automatically manage the phases from training through to deployment.

Contributing

This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.opensource.microsoft.com.

When you submit a pull request, a CLA bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.

This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.

HTTP404 after the PythonScriptStep is finished

Hi,

We have a situation where, when we submit a Python pipeline run through GitHub Actions, the action detects a non-zero exit code even though the underlying Python script finished with exit code zero. Even a step with a single print statement finishes, but the action reports it as failed.

For example - python step:

def main():
    print('AMAR Inside STEP 1 Choose Data Test')

Abbreviated Python code that builds the pipeline for submission. This is the script that the aml-run action calls to build the pipeline.

pipeline_steps = StepSequence(steps=[step_1_choose_data])
pipeline = Pipeline(workspace=workspace, steps=pipeline_steps)
pipeline.validate()
return pipeline

===

Individual step output

[2021-11-02T11:10:11.272611] The experiment completed successfully. Finalizing run...
Cleaning up all outstanding Run operations, waiting 900.0 seconds
3 items cleaning up...
Cleanup took 0.1468954086303711 seconds
[2021-11-02T11:10:11.547600] Finished context manager injector.
2021/11/02 11:10:13 Attempt 1 of http call to http://[REDACTED]/sendlogstoartifacts/status
2021/11/02 11:10:13 Send process info logs to master server succeeded
2021/11/02 11:10:13 Not exporting to RunHistory as the exporter is either stopped or there is no data.
Stopped: false
OriginalData: 3
FilteredData: 0.
2021/11/02 11:10:13 Process Exiting with Code: 0
2021/11/02 11:10:14 All App Insights Logs was sent successfully or the close timeout of 10 was reached

BUT, action output/report

Action output:
StepRun(STEP_1_Choose_Data) Execution Summary
==============================================
StepRun( STEP_1_Choose_Data ) Status: Finished
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/site-packages/dotnetcore2/runtime.py", line 271, in attempt_get_deps
    blob_deps_to_file()
  File "/usr/local/lib/python3.8/site-packages/dotnetcore2/runtime.py", line 263, in blob_deps_to_file
    blob = request.urlopen(deps_url, context=ssl_context)
  File "/usr/local/lib/python3.8/urllib/request.py", line 222, in urlopen
    return opener.open(url, data, timeout)
  File "/usr/local/lib/python3.8/urllib/request.py", line 531, in open
    response = meth(req, response)
  File "/usr/local/lib/python3.8/urllib/request.py", line 640, in http_response
    response = self.parent.error(
  File "/usr/local/lib/python3.8/urllib/request.py", line 569, in error
    return self._call_chain(*args)
  File "/usr/local/lib/python3.8/urllib/request.py", line 502, in _call_chain
    result = func(*args)
  File "/usr/local/lib/python3.8/urllib/request.py", line 649, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 404: Not Found

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/code/main.py", line 240, in <module>
    main()
  File "/code/main.py", line 187, in main
    run.wait_for_completion(show_output=True)
  File "/usr/local/lib/python3.8/site-packages/azureml/pipeline/core/run.py", line 294, in wait_for_completion
    step_run.wait_for_completion(timeout_seconds=timeout_seconds - time_elapsed,
  File "/usr/local/lib/python3.8/site-packages/azureml/pipeline/core/run.py", line 736, in wait_for_completion
    return self._stream_run_output(timeout_seconds=timeout_seconds,
  File "/usr/local/lib/python3.8/site-packages/azureml/pipeline/core/run.py", line 827, in _stream_run_output
    print(final_details)
  File "/usr/local/lib/python3.8/site-packages/azureml/data/_loggerfactory.py", line 129, in wrapper
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.8/site-packages/azureml/data/abstract_dataset.py", line 766, in __repr__
    steps = self._dataflow._get_steps()
  File "/usr/local/lib/python3.8/site-packages/azureml/data/_loggerfactory.py", line 129, in wrapper
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.8/site-packages/azureml/data/abstract_dataset.py", line 218, in _dataflow
    dataprep().api._datastore_helper._set_auth_type(self._registration.workspace)
  File "/usr/local/lib/python3.8/site-packages/azureml/dataprep/api/_datastore_helper.py", line 185, in _set_auth_type
    get_engine_api().set_aml_auth(SetAmlAuthMessageArgument(auth_type, json.dumps(auth_value)))
  File "/usr/local/lib/python3.8/site-packages/azureml/dataprep/api/engineapi/api.py", line 19, in get_engine_api
    _engine_api = EngineAPI()
  File "/usr/local/lib/python3.8/site-packages/azureml/dataprep/api/engineapi/api.py", line 110, in __init__
    self._message_channel = launch_engine()
  File "/usr/local/lib/python3.8/site-packages/azureml/dataprep/api/engineapi/engine.py", line 333, in launch_engine
    dependencies_path = runtime.ensure_dependencies()
  File "/usr/local/lib/python3.8/site-packages/dotnetcore2/runtime.py", line 285, in ensure_dependencies
    if not attempt_get_deps():
  File "/usr/local/lib/python3.8/site-packages/dotnetcore2/runtime.py", line 279, in attempt_get_deps
    raise NotImplementedError(err_msg + '\n' + _unsupported_help_msg)
NotImplementedError: Linux distribution debian 11. does not have automatic support.
.NET Core 2.1 can still be used via dotnetcore2 if the required dependencies are installed.
Visit https://aka.ms/dotnet-install-linux for Linux distro specific .NET Core install instructions.
Follow your distro specific instructions to install `dotnet-runtime-` and replace `` with `2.1`.

Happy to discuss details.
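
Reading the second traceback above, the exception seems to be raised in the action's own script (/code/main.py) while it prints the final run details, not in the step itself: print(final_details) ends up in a dataset __repr__, which tries to launch the dataprep engine and fails on the missing .NET Core dependencies. The sketch below only imitates that failure mode with stand-in names (`LazyDataset`, `report_final_details`); it is not the azureml code.

```python
class LazyDataset:
    """Stand-in for an object whose repr needs an unavailable dependency."""

    def __repr__(self) -> str:
        raise NotImplementedError("engine dependencies are not installed")


def report_final_details(details: dict) -> int:
    """Print run details and return an exit code like a wrapper script might."""
    try:
        print(details)  # formatting the dict calls repr() on every value
    except NotImplementedError:
        return 1  # the run itself finished; only the reporting failed
    return 0


if __name__ == "__main__":
    final_details = {"status": "Finished", "inputs": LazyDataset()}
    print("exit code:", report_final_details(final_details))
```

This would explain why the step shows "Status: Finished" while the action still reports a failure.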
