Batch Scoring Deep Learning Models With Kubernetes

Overview

This repository demonstrates batch scoring with a deep learning model by applying style transfer to a video, which is treated as a collection of images. The architecture can be generalized to any batch scoring scenario that uses deep learning. An alternative solution built on the Azure Machine Learning service is also available and is described in the Azure Reference Architecture Center.

Design

Reference Architecture Diagram

The above architecture works as follows:

  1. Upload a video file to storage.
  2. The uploaded video file triggers a Logic App, which sends a request to the Flask endpoint hosted on one of the nodes of the AKS cluster.
  3. That node first preprocesses the video file by splitting it into individual images and extracting the audio file.
  4. That node then adds all the images to the Service Bus queue.
  5. The other nodes in the AKS cluster continuously poll the Service Bus queue. As soon as images are in the queue, a node pulls an image off the queue and applies style transfer to it.
  6. When all frames have been processed, the images are stitched back together into a video with the audio file.
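
The queue-based flow in the steps above can be sketched locally. This is a minimal illustration only: it uses Python's standard-library queue and threads in place of Service Bus and AKS nodes, and the names preprocess, apply_style, worker, and run_pipeline are hypothetical and do not come from the repository.

```python
import queue
import threading


def preprocess(video_name, num_frames):
    """Stand-in for splitting a video into frames (hypothetical naming scheme)."""
    return [f"{video_name}_frame_{i:04d}.jpg" for i in range(num_frames)]


def apply_style(frame_name):
    """Placeholder for the style transfer model; it only renames the frame."""
    return f"styled_{frame_name}"


def worker(frame_queue, results, lock):
    """Pull frames until the queue is drained, mimicking a polling scoring node."""
    while True:
        try:
            frame = frame_queue.get(timeout=0.1)
        except queue.Empty:
            return  # nothing left to process
        styled = apply_style(frame)
        with lock:
            results.append(styled)
        frame_queue.task_done()


def run_pipeline(video_name, num_frames, num_workers=3):
    """Enqueue every frame, let the workers drain the queue, then order the output."""
    frame_queue = queue.Queue()
    for frame in preprocess(video_name, num_frames):
        frame_queue.put(frame)

    results = []
    lock = threading.Lock()
    threads = [
        threading.Thread(target=worker, args=(frame_queue, results, lock))
        for _ in range(num_workers)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()

    # Frames can finish out of order, so sort them before "stitching".
    return sorted(results)


if __name__ == "__main__":
    print(run_pipeline("demo", 5))
```

The sort at the end stands in for the stitching step: because several workers process frames concurrently, completion order is not guaranteed, so the frame index in the file name is what restores the original sequence.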

What is Neural Style Transfer

[Figure: style image, input/content video, and output video]

Prerequisites

Local/Working Machine:

Accounts:

While it is not required, it is also useful to have the Azure Storage Explorer available to inspect your storage account.

Setup

  1. Clone the repo: git clone https://github.com/Azure/Batch-Scoring-Deep-Learning-Models-With-AKS
  2. cd into the repo.
  3. Set up your conda environment using the environment.yml file: conda env create -f environment.yml (this creates a conda environment called batchscoringdl)
  4. Activate your environment: source activate batchscoringdl
  5. Log in to Azure using the az cli: az login
  6. Log in to Docker using the docker cli: docker login

Steps

Run through the following notebooks:

  1. Test the Style Transfer Script
  2. Setup Azure - Resource group, Storage, Service Bus.
  3. Test the model locally
  4. Create the AKS cluster
  5. Run style transfer on the cluster
  6. Deploy Logic Apps
  7. Clean up

Clean up

To clean up your working directory, you can run the clean_up.sh script that comes with this repo. It removes all temporary directories that were generated, as well as any configuration files (such as Dockerfiles) that were created during the tutorials. The script does not remove the .env file.

To clean up your Azure resources, you can simply delete the resource group that all your resources were deployed into. This can be done in the az cli using the command az group delete --name <name-of-your-resource-group>, or in the portal. If you want to keep certain resources, you can also use the az cli or the Azure portal to cherry-pick the ones you want to deprovision. Finally, you should also delete the service principal using the az ad sp delete command.

All the steps above are covered in the final notebook.
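
As a rough sketch of the Azure cleanup described above, the two az commands can be assembled in Python and reviewed before anything is deleted. The function name build_cleanup_commands and the example values are hypothetical. The --yes and --no-wait flags are not mentioned in this README; they are added here on the assumption that you want to skip the confirmation prompt and return without waiting for the deletion to finish.

```python
import shlex


def build_cleanup_commands(resource_group, service_principal_id):
    """Return the az cli invocations as argument lists (nothing is executed)."""
    return [
        # Deletes the resource group and everything deployed into it.
        ["az", "group", "delete", "--name", resource_group, "--yes", "--no-wait"],
        # Deletes the service principal created during setup.
        ["az", "ad", "sp", "delete", "--id", service_principal_id],
    ]


if __name__ == "__main__":
    # Placeholder values; substitute your own resource group and principal ID.
    for command in build_cleanup_commands("my-resource-group", "my-sp-app-id"):
        print(shlex.join(command))
```

Printing the commands first gives you a dry run. To execute them, each list could be passed to subprocess.run(command, check=True) once you have confirmed the names are correct.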

Contributing

This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.microsoft.com.

When you submit a pull request, a CLA-bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., label, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.

This project has adopted the Microsoft Open Source Code of Conduct. For more information, see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.

batch-scoring-deep-learning-models-with-aks's Issues

Blobfuse section in 01_setup notebook failed/malformed .env file

The failure occurred when trying to mount the blob container to the mount directory: Failed to connect to the storage container. There might be something wrong about the storage config, please double check the storage account name, account key and container name. errno = 400. After working with JS Tan, the root cause seems to be a malformed .env file. Once the format of the .env file was corrected, the code worked.

Not sure how the .env file became malformed.
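
A quick way to spot this kind of problem is to scan the file for lines that do not look like KEY=VALUE pairs. The sketch below is an assumption-laden illustration: it assumes the common dotenv convention (optional export prefix, comments starting with #), the helper name find_malformed_lines is hypothetical, and the variable names in the sample are invented rather than taken from the repository's .env file.

```python
import re

# A line is accepted if it is "KEY=VALUE", optionally prefixed with "export".
_ASSIGNMENT = re.compile(r"^(?:export\s+)?[A-Za-z_][A-Za-z0-9_]*\s*=.*$")


def find_malformed_lines(text):
    """Return (line_number, line) pairs that are not blank, comments, or assignments."""
    problems = []
    for number, line in enumerate(text.splitlines(), start=1):
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # blank lines and comments are fine
        if not _ASSIGNMENT.match(stripped):
            problems.append((number, line))
    return problems


if __name__ == "__main__":
    # Hypothetical sample contents; the last line lacks an "=" sign.
    sample = "EXAMPLE_ACCOUNT_NAME=mystorage\n# a comment\nEXAMPLE_CONTAINER mycontainer\n"
    for number, line in find_malformed_lines(sample):
        print(f"line {number}: {line!r}")
```

To check a real file, read it first, for example find_malformed_lines(open(".env").read()). A clean result does not prove the values are correct, only that each line has the expected shape.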

Service Principal permissions concern with 01_Setup notebook

Ran into an issue in the 01_setup notebook in the section for creating the Service Principal. Added display(credentials) to the credentials cell to find out why the subsequent cells were failing. The error indicates I don't have permission to execute the RBAC operation. The code may be assuming that users who run the notebook have administrator-level rights to the tenant or the subscription, which I do not have, and it's likely that anyone running this with an enterprise subscription would not have them either.

Can this functionality be enabled without requiring elevated permissions for the person running the notebook?

Jupyter kernel fails to start on Ubuntu DLVM

Provisioned a new Ubuntu DLVM because it has GPU support, followed the setup instructions in the README, and verified the required versions. The batchscoringdl kernel fails to start (restarts don't help). Changing to a different kernel avoids the dead kernel, but it won't execute the dotenv cells.

The problem seems to be a versioning issue with prompt_toolkit. Snippet from the Jupyter execution window:

…from prompt_toolkit.formatted_text import PygmentsTokens
ModuleNotFoundError: No module named 'prompt_toolkit.formatted_text'

After down-versioning ipython to 6.5, the notebook executed. My concern is that a number of azure cli compatibility issues were displayed during the down-versioning, so I'm not confident that downgrading ipython is the right resolution. Maybe downgrading only prompt_toolkit is the right solution?
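
Before changing package versions, it can help to confirm from inside the affected environment whether the submodule named in the traceback is present. The sketch below uses only the standard library; the helper name module_available is hypothetical, and the check says nothing about which version combination is the right fix.

```python
import importlib.util


def module_available(dotted_name):
    """Return True if the module can be located, False otherwise."""
    try:
        return importlib.util.find_spec(dotted_name) is not None
    except (ImportError, ValueError):
        # Raised when a parent package is missing or the name is invalid.
        return False


if __name__ == "__main__":
    for name in ("prompt_toolkit", "prompt_toolkit.formatted_text"):
        status = "found" if module_available(name) else "missing"
        print(f"{name}: {status}")
```

If prompt_toolkit is found but prompt_toolkit.formatted_text is missing, the installed prompt_toolkit predates that submodule, which matches the error shown above.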
