azure / mlops-project-template

Azure MLOps (v2) solution accelerators. Enterprise ready templates to deploy your machine learning models on the Azure Platform.

Home Page: https://learn.microsoft.com/en-us/azure/machine-learning/concept-model-management-and-deployment

License: MIT License

Python 71.43% Bicep 1.39% HCL 5.35% Dockerfile 0.87% Jupyter Notebook 20.96%
azure azuremachinelearning azureml deep-learning devops machine-learning microsoft mlops mlops-environment mlops-template

mlops-project-template's Issues

The suffix tfstate/tf-state is hard-coded instead of being a variable.

The names of the storage account and the resource group end in prodtfstate, which causes an error because the storage account name is then longer than 24 characters. See the steps below to reproduce. The required lines in the config yaml are:

  namespace: mlopsv2hez
  postfix: 0918
  location: westus
  environment: prod
  enable_aml_computecluster: true

Running the pipeline with the above configuration results in the following error, most likely because the generated name is 27 characters long and the limit is 24.

ERROR: (AccountNameInvalid) stmlopsv2hez0918prodtfstate is not a valid storage account name. Storage account name must be between 3 and 24 characters in length and use numbers and lower-case letters only.

Code: AccountNameInvalid

Message: stmlopsv2hez0918prodtfstate is not a valid storage account name. Storage account name must be between 3 and 24 characters in length and use numbers and lower-case letters only.

##[error]Script failed with exit code: 1

Potential fixes:

  1. Use only the first 24 characters
  2. Warn the user that the storage length is greater than 24 characters.
  3. Remove the tf-state suffix from the Terraform code. Check the lines below:
    terraform_st_resource_group: rg-$(namespace)-$(postfix)$(environment)-tf-state
    terraform_st_storage_account: st$(namespace)$(postfix)$(environment)tfstate
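
Fixes 1 and 2 could be combined into a pre-flight check that runs before the pipeline reaches Azure. The sketch below is only an illustration and is not code from the repository: the function names build_storage_account_name and is_valid_storage_account_name are hypothetical, and the rule it checks is the one quoted in the error message above (3 to 24 characters, numbers and lower-case letters only).

```python
import re

# Rule quoted in the AccountNameInvalid error: 3-24 characters,
# numbers and lower-case letters only.
STORAGE_NAME_RULE = re.compile(r"[a-z0-9]{3,24}")


def build_storage_account_name(namespace: str, postfix: str, environment: str) -> str:
    """Mirror the template st$(namespace)$(postfix)$(environment)tfstate."""
    return f"st{namespace}{postfix}{environment}tfstate"


def is_valid_storage_account_name(name: str) -> bool:
    """Return True if the name satisfies the rule in the error message."""
    return STORAGE_NAME_RULE.fullmatch(name) is not None


# Values taken from the config yaml in this issue.
name = build_storage_account_name("mlopsv2hez", "0918", "prod")
print(name, len(name), is_valid_storage_account_name(name))
# prints: stmlopsv2hez0918prodtfstate 27 False
```

A check like this would let the pipeline fail early with a clear message instead of surfacing the Azure error at deployment time.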

(classic) online managed endpoint deployment fails

(ado) The pipeline /mlops/devops-pipelines/deploy-online-endpoint-pipeline.yml fails with:

There is already an endpoint with this name, Endpoint name needs to be unique within a region

The endpoint name should probably be set explicitly and made unique (following the naming in the main yaml files).

proposal:

AML

online_endpoint: oe-$(namespace)-$(postfix)$(environment)
batch_endpoint: be-$(namespace)-$(postfix)$(environment)
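
To see what names the proposal would produce, the macro expansion can be imitated in a few lines. This is a rough sketch under stated assumptions: the expand helper is hypothetical and only mimics the $(name) substitution for illustration, and the sample values are borrowed from the config yaml in the tfstate issue above.

```python
import re

# Matches macro placeholders such as $(namespace).
MACRO = re.compile(r"\$\((\w+)\)")


def expand(template: str, variables: dict[str, str]) -> str:
    """Replace each $(name) placeholder with its value from variables."""
    return MACRO.sub(lambda m: variables[m.group(1)], template)


# Sample values, used for illustration only.
variables = {"namespace": "mlopsv2hez", "postfix": "0918", "environment": "prod"}

online_endpoint = expand("oe-$(namespace)-$(postfix)$(environment)", variables)
batch_endpoint = expand("be-$(namespace)-$(postfix)$(environment)", variables)
print(online_endpoint)  # oe-mlopsv2hez-0918prod
print(batch_endpoint)   # be-mlopsv2hez-0918prod
```

Names built this way are only as unique as the namespace and postfix chosen, so two projects in the same region that share both values would still collide.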

(classic) Batch online training pipeline

The batch online training pipeline fails in the Test-Deployment step with the following error:
/home/vsts/.local/bin/az account set --subscription 96e7c4cd-67ef-4e02-b10b-4fd99b1a6b34
/usr/bin/bash /home/vsts/work/_temp/azureclitaskscript1654821379124.sh

Uploading taxi-batch.csv (< 1 MB): 0.00B [00:00, ?B/s]
Uploading taxi-batch.csv (< 1 MB): 100%|██████████| 134k/134k [00:00<00:00, 816kB/s]
Uploading taxi-batch.csv (< 1 MB): 100%|██████████| 134k/134k [00:00<00:00, 813kB/s]

ERROR: Met error <class 'Exception'>:upstream request timeout
Please check log in debug mode for more details.

Fix error: storage account name already exists (Bicep)

When running the Bicep CI/CD pipeline, we often encounter an error message saying that the storage account name is already in use. This is because storage account names must be unique across Azure. We therefore create a function in the Bicep script that makes the storage account name unique.
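
One common way to make such a name unique is to append a short, deterministic hash of some stable seed, which is what Bicep's built-in uniqueString function is meant for. The Python sketch below only illustrates the idea and is not the function added to the repository: unique_storage_name, the seed value, and the suffix length of 8 are all assumptions, and the 24-character cap is the storage account limit quoted in the error message of the tfstate issue above.

```python
import hashlib

MAX_STORAGE_NAME_LENGTH = 24  # storage account name limit


def unique_storage_name(prefix: str, seed: str, suffix_length: int = 8) -> str:
    """Append a deterministic hash of seed to prefix, capped at 24 characters."""
    digest = hashlib.sha256(seed.encode("utf-8")).hexdigest()  # lower-case hex
    suffix = digest[:suffix_length]
    # Trim the prefix, not the suffix, so the unique part always survives.
    return prefix[: MAX_STORAGE_NAME_LENGTH - suffix_length] + suffix


# The seed is a made-up resource group identifier, for illustration only.
seed = "/subscriptions/<subscription-id>/resourceGroups/rg-example"
name = unique_storage_name("stmlopsv2hez0918prod", seed)
print(name, len(name))
```

Because the suffix depends only on the seed, re-running the pipeline yields the same name each time, which keeps deployments idempotent while making collisions with other subscriptions unlikely.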

[mlops-project-template] TF: unnecessary forced replacement on re-run

Why?

Some settings in the AML Terraform module force unnecessary replacement of resources when the infra pipeline is re-run.

How?

I'll create a draft pull request that speeds up re-running of the infra pipeline

Anything else?

I have only tested this with the basic / non-secure quickstart so far, so I'll leave it as a draft for your review.

CV Project Template is missing batch endpoint deployment

If you follow the quickstart with the CV project template and run the batch endpoint deployment, the yml file references:

      - template: templates/${{ variables.version }}/create-deployment.yml@mlops-templates
        parameters:
          deployment_name: taxi-batch-dp
          deployment_file: mlops/azureml/deploy/batch/batch-deployment.yml  

However, the deploy directory in the templates for the CV flavor only contains a real-time deployment. Either the quickstart manual needs to be corrected to use the real-time deployment, or batch endpoint support needs to be added, or both.

[mlops-project-template] Filepaths in files need to coordinate with filepath changes made by sparse checkout in Azure/mlops-v2

Why?

Since this repo is designed to be used together with Azure/mlops-v2, files in this template that make use of relative paths need to coordinate with the way this repo is changed after sparse_checkout.sh is run.

How?

Relative paths in this repo's files should be written against the folder structure of this repo as it is, not as it will be after a script from another repo, such as sparse_checkout.sh, has run. That way developers on this repo don't need to track the impact of changes made in a separate repo (Azure/mlops-v2), and the paths are easier to develop and maintain.

Anything else?


Terraform IaC pipeline not working

After running mlops-v2/sparse_checkout.sh, the Terraform pipeline infrastructure/pipelines/tf-ado-deploy-infra.yml fails with the error:
/infrastructure/pipelines/tf-ado-deploy-infra.yml: Could not find /config-aml.yml in repository self hosted on https://github.com/ using commit 9b3932277362fc8df9597e6ffa966d821d05cae1. GitHub reported the error, "Not Found"
