Author: Pedro Paez (GitHub: https://github.com/pedrojpaez/dataweek.git)
In this lab we will walk through the entire data science workflow using Amazon SageMaker. The objective of this exercise is to build a data science project from scratch and to learn how SageMaker accelerates the process of building and deploying custom machine learning models in production. We will see how to leverage SageMaker's first-party algorithms as well as the high-level SDK for deep learning frameworks.
We will build an end-to-end Natural Language Processing pipeline to classify newspaper headlines into general categories. We will first build word embeddings (vector representations of the English vocabulary) to enrich our model.
For this lab you will need to have:
- A laptop
- Network connectivity
- An AWS account
- Basic Python scripting experience
- Basic knowledge of the data science workflow
Preferred knowledge:
- Basic knowledge of containers
- Basic knowledge of deep learning
- Go to the AWS Console in your account
- In the top right corner, select the N. Virginia (us-east-1) region
- Search for and click on Amazon SageMaker
- Under Notebook > select Notebook instances > click the "Create notebook instance" button (orange button)
- Give your project a name under "Notebook instance name"
- Select the ml.t2.medium notebook instance type
- Under "Permissions and encryption" > under "IAM role" > select "Create a new role" in the drop-down menu
- Select "Any S3 bucket" > click the "Create new role" button
- Finally, click "Create notebook instance" and wait until the status is "InService"
- Select "Open Jupyter". You should see a Jupyter notebook web interface.
- Select "New" in the top right corner > Click on "Terminal". A new tab will open with access to the Shell.
- You now have shell access to the notebook instance and full control and flexibility over your environment. We will cd (change directory) to the SageMaker home directory. From the root directory, type:
cd SageMaker
- We will clone the material for this lab from the git repo https://github.com/pedrojpaez/dataweek.git. In the terminal, type:
git clone https://github.com/pedrojpaez/dataweek.git
- Return to the previous tab (the Jupyter notebook web interface). The dataweek directory should now be available.
There are 4 elements in the dataweek directory:
- tf-src: This directory contains the MXNet training script for our document classifier.
- blazingtext_word2vec_text8.ipynb: Notebook to create word embeddings using BlazingText, a SageMaker first-party algorithm. We will use these embeddings as input for our headline classifier to enrich the model.
- headline-classifier-local.ipynb: Notebook to create the headline classifier using Keras (with the MXNet backend) on the local instance.
- headline-classifier-mxnet.ipynb: Notebook to create the headline classifier leveraging SageMaker training and deployment features. We will use the high-level MXNet SDK to bring our MXNet code and to run and deploy our model.
In this notebook we will run through the snippets of code. We will build a word embedding model (vector representations of the English vocabulary) to use as input for our document classification model.
For this notebook we will use the first-party BlazingText algorithm to build our word embeddings, and we will leverage SageMaker's one-click training and one-click deployment capabilities.
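At its core, the training job comes down to pointing a SageMaker estimator at the BlazingText container with word2vec hyperparameters. A hedged sketch of assembling that configuration (hyperparameter names follow the BlazingText documentation as we recall it; the values and S3 paths are illustrative, not the notebook's exact code):

```python
def blazingtext_word2vec_config(s3_train, s3_output):
    """Assemble an illustrative configuration for a BlazingText
    word2vec training job; in the notebook, hyperparameters like
    these are passed to the estimator via set_hyperparameters()."""
    return {
        "train_data": s3_train,
        "output_path": s3_output,
        "hyperparameters": {
            "mode": "batch_skipgram",  # distributed-friendly skip-gram
            "vector_dim": 100,         # embedding dimensionality
            "epochs": 5,
            "min_count": 5,            # drop rare words
            "window_size": 5,
        },
    }

cfg = blazingtext_word2vec_config("s3://my-bucket/text8", "s3://my-bucket/output")
```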
The general steps we will run through are:
- Configure the notebook
- Download the text8 corpus file
- Upload the data to S3
- Run a training job on SageMaker
- Deploy the model
- Download the model artifact and unpack the word vectors
- Clean up (delete the model endpoint)
Run through the notebook and read the instructions.
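When the training job finishes, the notebook downloads the model artifact from S3 and unpacks the word vectors. For BlazingText in word2vec mode the unpacked archive contains the vectors in the standard word2vec text format (a header line "<vocab_size> <dim>", then one word and its components per line). A minimal parsing sketch under that assumption, using made-up values:

```python
import numpy as np

def parse_word2vec_text(lines):
    """Parse word vectors in the standard word2vec text format:
    first line '<vocab_size> <dim>', then '<word> <v1> <v2> ...'."""
    header = lines[0].split()
    vocab_size, dim = int(header[0]), int(header[1])
    vectors = {}
    for line in lines[1:]:
        parts = line.rstrip().split(" ")
        vectors[parts[0]] = np.array(parts[1:], dtype=np.float32)
    return vocab_size, dim, vectors

# Tiny illustrative input (not real BlazingText output):
sample = ["2 3", "king 0.1 0.2 0.3", "queen 0.4 0.5 0.6"]
vocab_size, dim, vecs = parse_word2vec_text(sample)
```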
In this notebook we will run through the snippets of code. We will build a headline classifier model that classifies newspaper headlines into 4 classes. We will build a deep learning model using the Keras interface with the MXNet backend (and use the word embeddings we previously built as input to our model). We will run the training locally (on the notebook instance) to evaluate performance.
The general steps we will run through are:
- Configure the notebook
- Download the NewsAggregator dataset
- Upload the data to S3
- Run the training job locally
- Move to the next notebook
Run through the notebook and read the instructions.
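Inside the notebook, the raw headlines have to be turned into fixed-length integer sequences before they can feed a Keras embedding layer. A minimal sketch of that preprocessing step (the vocabulary and padding scheme here are illustrative assumptions, not the notebook's exact code):

```python
import numpy as np

def encode_headlines(headlines, vocab, maxlen=20):
    """Turn raw headlines into fixed-length integer sequences,
    a typical preprocessing step before a Keras embedding layer."""
    encoded = []
    for text in headlines:
        # Map each token to its vocabulary id; 0 is pad/out-of-vocab.
        ids = [vocab.get(tok, 0) for tok in text.lower().split()][:maxlen]
        ids += [0] * (maxlen - len(ids))   # right-pad to maxlen
        encoded.append(ids)
    return np.array(encoded, dtype=np.int64)

# Illustrative vocabulary, not built from the real dataset:
vocab = {"fed": 1, "raises": 2, "rates": 3}
X = encode_headlines(["Fed raises rates again"], vocab, maxlen=6)
```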
In this notebook we will run through the snippets of code. We will build a headline classifier model that classifies newspaper headlines into 4 classes. We will build a deep learning model using the Keras interface with the MXNet backend (and use the word embeddings we previously built as input to our model). We will package the MXNet code into a training script, run the training on SageMaker, and evaluate performance. Finally, we will deploy our model as a RESTful API.
The general steps we will run through are:
- Configure the notebook
- Upload the data to S3
- Run a training job on SageMaker
- Deploy the model on SageMaker
- Clean up (delete the model endpoint)
Run through the notebook and read the instructions.
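Once deployed, the model sits behind an HTTPS endpoint that can be invoked from any AWS SDK. A hedged sketch of calling it with boto3 (the endpoint name, payload format, and vocabulary here are assumptions for illustration, not the notebook's exact contract):

```python
import json

def build_payload(headline, vocab, maxlen=20):
    """Serialize one headline into the JSON body we assume the
    deployed endpoint accepts (format is illustrative)."""
    ids = [vocab.get(tok, 0) for tok in headline.lower().split()][:maxlen]
    ids += [0] * (maxlen - len(ids))
    return json.dumps({"instances": [ids]})

def classify(headline, vocab, endpoint_name, runtime):
    """Invoke a deployed SageMaker endpoint; `runtime` is a boto3
    'sagemaker-runtime' client (requires AWS credentials to call)."""
    response = runtime.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="application/json",
        Body=build_payload(headline, vocab),
    )
    return json.loads(response["Body"].read())

payload = build_payload("fed raises rates",
                        {"fed": 1, "raises": 2, "rates": 3}, maxlen=4)
```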
- Invoke a model endpoint deployed by Amazon SageMaker using API Gateway and AWS Lambda for additional functionality: https://aws.amazon.com/blogs/machine-learning/call-an-amazon-sagemaker-model-endpoint-using-amazon-api-gateway-and-aws-lambda/
- Analyze the results of your model responses to real-time data (for this, swap the Comprehend API for your SageMaker endpoint API): https://aws.amazon.com/blogs/machine-learning/build-a-social-media-dashboard-using-machine-learning-and-bi-services/
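The API Gateway pattern in the first link puts a small Lambda function in front of the SageMaker endpoint. A minimal handler sketch following that pattern (the endpoint name is an illustrative assumption; the `runtime` parameter is added so the sketch can be exercised without AWS credentials):

```python
import json

def lambda_handler(event, context, runtime=None):
    """Minimal Lambda handler: forward the HTTP request body to a
    SageMaker endpoint and return its prediction."""
    if runtime is None:
        # boto3 is available by default in Lambda's Python runtimes;
        # imported lazily so the sketch can run without it installed.
        import boto3
        runtime = boto3.client("sagemaker-runtime")
    response = runtime.invoke_endpoint(
        EndpointName="headline-classifier",   # assumed endpoint name
        ContentType="application/json",
        Body=event["body"],
    )
    prediction = response["Body"].read().decode("utf-8")
    return {"statusCode": 200, "body": prediction}

# Exercise the handler with a stand-in client (no AWS calls made):
class _FakeBody:
    def read(self):
        return b'{"label": "b"}'

class _FakeRuntime:
    def invoke_endpoint(self, **kwargs):
        return {"Body": _FakeBody()}

result = lambda_handler({"body": "[1, 2, 3]"}, None, runtime=_FakeRuntime())
```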