
Machine Learning in Action

Author: Pedro Paez (GitHub: https://github.com/pedrojpaez/dataweek.git)

In this lab we will go through the entire data science workflow using SageMaker. The objective of this exercise is to build a data science project from scratch and to learn how SageMaker helps accelerate the process of building custom machine learning models and deploying them to production. We will see how to leverage SageMaker's first-party algorithms as well as the high-level SDK for deep learning frameworks.

We will build an end-to-end Natural Language Processing pipeline to classify newspaper headlines into general categories. We will first build word embeddings (vector representations of the English vocabulary) to enrich our model.

Prerequisites:

For this lab you will need to have:

  • A laptop
  • Network connectivity
  • An AWS account
  • Basic Python scripting experience
  • Basic knowledge of Data Science workflow

Preferred knowledge:

  • Basic knowledge of containers
  • Basic knowledge of deep learning

Part 1: Prepare the environment and create a new SageMaker project

  1. Go to the AWS Console in your account.
  2. In the top right corner, select the region N. Virginia.
  3. Search for and click on Amazon SageMaker.
  4. Under Notebook > select Notebook Instances > and click on the "Create Notebook Instance" button (orange button).
  5. Give your project a name under "Notebook instance name".
  6. Select the ml.t2.medium notebook instance type.
  7. Under "Permissions and encryption" > under IAM role > select "Create a new role" in the drop-down menu.
  8. Select "Any S3 bucket" > click on the "Create new role" button.
  9. Finally, click "Create Notebook Instance" and wait until the status is "InService".

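If you prefer to script this setup instead of clicking through the console, roughly the same result can be had with boto3. This is a minimal sketch, assuming the AWS Python SDK is configured with suitable credentials; the instance name and role ARN are placeholders, not values from this lab.

    import boto3

    sm = boto3.client("sagemaker", region_name="us-east-1")

    # Placeholder name and role ARN: use the IAM role created in the steps above
    sm.create_notebook_instance(
        NotebookInstanceName="dataweek-lab",
        InstanceType="ml.t2.medium",
        RoleArn="arn:aws:iam::123456789012:role/MySageMakerRole",
    )

    # Check the status until it reports "InService"
    status = sm.describe_notebook_instance(
        NotebookInstanceName="dataweek-lab"
    )["NotebookInstanceStatus"]
    print(status)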

Clone git repo with workshop material

  1. Select "Open Jupyter". You should see a Jupyter notebook web interface.
  2. Select "New" in the top right corner > Click on "Terminal". A new tab will open with access to the Shell.

enter image description here

  1. You now have shell access to the notebook instance and full control/flexibility over your environment. We will cd (change directory to the Sagemaker home directory). Type from the root directory : cd Sagemaker

  2. We will clone the material for this lab from the git repo : https://github.com/pedrojpaez/dataweek.git

    git clone https://github.com/pedrojpaez/dataweek.git

    enter image description here

  3. Return to previous tab (Jupyter notebook web interface). The dataweek directory should now be available.

enter image description here

dataweek directory

There are 4 elements in the dataweek directory:

  • tf-src: This directory contains the MXNet training script for our document classifier.
  • blazingtext_word2vec_text8.ipynb: Notebook to create word embeddings using the SageMaker first-party algorithm BlazingText. We will use these embeddings as input for our headline classifier to enrich the model.
  • headline-classifier-local.ipynb: Notebook to create the headline classifier using Keras (with MXNet backend) on the local instance.
  • headline-classifier-mxnet.ipynb: Notebook to create the headline classifier leveraging SageMaker training and deployment features. We will use the MXNet high-level SDK to bring our own MXNet code and train and deploy our model.


Run blazingtext_word2vec_text8.ipynb notebook

In this notebook we will work through the code snippets. We will build a word embedding model (vector representations of the English vocabulary) to use as input for our document classification model.

For this notebook we will use the first-party algorithm BlazingText to build our word embeddings, and we will leverage the one-click training and one-click deployment capabilities of SageMaker.

The general steps we will run:

  1. Configure notebook
  2. Download text8 corpus file
  3. Upload data to S3
  4. Run training job on SageMaker
  5. Deploy model
  6. Download model object and unpack word vectors
  7. Clean up (delete model endpoint)

Run through the notebook and read the instructions.
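For orientation, the core SageMaker calls in the notebook boil down to the pattern below. This is a condensed sketch assuming the v1-era SageMaker Python SDK and the session's default bucket; the hyperparameter values and key prefix are illustrative, and the notebook itself is the authoritative version.

    import sagemaker
    from sagemaker.amazon.amazon_estimator import get_image_uri

    sess = sagemaker.Session()
    role = sagemaker.get_execution_role()
    bucket = sess.default_bucket()
    prefix = "blazingtext/text8"  # illustrative key prefix

    # Resolve the BlazingText container image for the current region
    container = get_image_uri(sess.boto_region_name, "blazingtext")

    # Configure the training job; instance type and count are illustrative
    bt = sagemaker.estimator.Estimator(
        container,
        role,
        train_instance_count=1,
        train_instance_type="ml.c4.2xlarge",
        output_path="s3://{}/{}/output".format(bucket, prefix),
        sagemaker_session=sess,
    )

    # word2vec hyperparameters: skip-gram over the text8 corpus
    bt.set_hyperparameters(mode="skipgram", epochs=5, min_count=5,
                           vector_dim=100, window_size=5)

    # One-click training against the corpus uploaded to S3
    bt.fit({"train": "s3://{}/{}/train".format(bucket, prefix)})

    # One-click deployment to a real-time endpoint
    predictor = bt.deploy(initial_instance_count=1,
                          instance_type="ml.m4.xlarge")

    # ...use the endpoint, then clean up
    predictor.delete_endpoint()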

Run headline-classifier-local.ipynb notebook

In this notebook we will work through the code snippets. We will build a headline classifier model that classifies newspaper headlines into 4 classes. We will build a deep learning model using the Keras interface with the MXNet backend (and use the word embeddings we previously built as input to our model). We will run the training locally (on the notebook instance) to evaluate performance.

The general steps we will run:

  1. Configure notebook
  2. Download NewsAggregator datasets
  3. Upload data to S3
  4. Run training job locally
  5. Move to the next notebook

Run through the notebook and read the instructions.
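The heart of the local notebook is a small Keras model that consumes the pretrained embeddings. The sketch below shows the general shape, assuming Keras 2 conventions; the vocabulary size, sequence length, and layer sizes are assumptions, not values taken from the notebook.

    import numpy as np
    from keras.models import Sequential
    from keras.layers import Embedding, Conv1D, GlobalMaxPooling1D, Dense

    # Assumed dimensions: vocabulary size, embedding width, headline length
    VOCAB_SIZE, EMBED_DIM, MAX_LEN, NUM_CLASSES = 20000, 100, 40, 4

    # In the real notebook, row i of this matrix holds the BlazingText
    # word vector for vocabulary index i; zeros stand in here
    embedding_matrix = np.zeros((VOCAB_SIZE, EMBED_DIM))

    model = Sequential([
        # Pretrained embeddings as fixed weights; set trainable=True to fine-tune
        Embedding(VOCAB_SIZE, EMBED_DIM, weights=[embedding_matrix],
                  input_length=MAX_LEN, trainable=False),
        Conv1D(128, 3, activation="relu"),
        GlobalMaxPooling1D(),
        Dense(64, activation="relu"),
        Dense(NUM_CLASSES, activation="softmax"),  # one of 4 headline classes
    ])

    model.compile(loss="categorical_crossentropy", optimizer="adam",
                  metrics=["accuracy"])
    # model.fit(x_train, y_train, epochs=5, batch_size=128,
    #           validation_data=(x_val, y_val))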

Run headline-classifier-mxnet.ipynb notebook

In this notebook we will work through the code snippets. We will build a headline classifier model that classifies newspaper headlines into 4 classes. We will build a deep learning model using the Keras interface with the MXNet backend (and use the word embeddings we previously built as input to our model). This time we will package the MXNet code into a training script, run the training on SageMaker, and evaluate performance. Finally we will deploy our model as a RESTful API.

The general steps we will run:

  1. Configure notebook
  2. Upload data to S3
  3. Run training job on SageMaker
  4. Deploy model on SageMaker
  5. Clean up (delete model endpoint)

Run through the notebook and read the instructions.
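With the training code packaged as a script, the SageMaker SDK reduces training and deployment to a few calls. A minimal sketch, assuming the v1-era MXNet estimator API; the entry point name, hyperparameters, and S3 path are placeholders, not values from this repo.

    import sagemaker
    from sagemaker.mxnet import MXNet

    role = sagemaker.get_execution_role()

    # Wrap the training script from the tf-src directory in an estimator;
    # the script name and hyperparameters here are placeholders
    estimator = MXNet(
        entry_point="train.py",
        source_dir="tf-src",
        role=role,
        train_instance_count=1,
        train_instance_type="ml.p2.xlarge",
        framework_version="1.2.1",
        py_version="py3",
        hyperparameters={"epochs": 5, "batch_size": 128},
    )

    # Launch the managed training job against the data uploaded to S3
    estimator.fit({"train": "s3://your-bucket/headlines/train"})

    # Deploy the trained model behind a real-time HTTPS endpoint
    predictor = estimator.deploy(initial_instance_count=1,
                                 instance_type="ml.m4.xlarge")

    # ...call predictor.predict(...) on new headlines, then clean up
    predictor.delete_endpoint()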

Things to try at home

  • Invoke a model endpoint deployed by Amazon SageMaker using API Gateway and AWS Lambda for additional functionality: https://aws.amazon.com/blogs/machine-learning/call-an-amazon-sagemaker-model-endpoint-using-amazon-api-gateway-and-aws-lambda/

  • Analyze the results of your model's responses to real-time data (for this, swap in your SageMaker endpoint API for the Comprehend API): https://aws.amazon.com/blogs/machine-learning/build-a-social-media-dashboard-using-machine-learning-and-bi-services/
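The first post boils down to a Lambda function that forwards requests from API Gateway to the SageMaker runtime. A minimal sketch; the endpoint name and payload format are assumptions, so adapt them to your deployed model.

    import json
    import boto3

    # Client for invoking deployed SageMaker endpoints
    runtime = boto3.client("sagemaker-runtime")

    # Placeholder endpoint name: use the one shown in the SageMaker console
    ENDPOINT_NAME = "headline-classifier"

    def lambda_handler(event, context):
        # API Gateway proxies the request body through as a string
        headline = json.loads(event["body"])["headline"]

        response = runtime.invoke_endpoint(
            EndpointName=ENDPOINT_NAME,
            ContentType="application/json",
            Body=json.dumps([headline]),
        )
        prediction = json.loads(response["Body"].read())

        return {"statusCode": 200, "body": json.dumps(prediction)}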
