ashleshk / practical-data-science-on-the-aws-cloud-specialization

The @DeepLearning.AI Practical Data Science Specialization brings together these disciplines using purpose-built ML tools in the AWS cloud. It helped me develop the practical skills to deploy data science projects effectively and overcome challenges at each step of the ML workflow using Amazon SageMaker.

Practical-Data-Science-on-the-AWS-Cloud-Specialization

  • learning from @DeepLearning.AI

pathway of this Course

About this Specialization

  • The Practical Data Science Specialization brings together these disciplines using purpose-built ML tools in the AWS cloud. It helps you develop the practical skills to effectively deploy your data science projects and overcome challenges at each step of the ML workflow using Amazon SageMaker.
  • This Specialization is designed for data-focused developers, scientists, and analysts familiar with the Python and SQL programming languages who want to learn how to build, train, and deploy scalable, end-to-end ML pipelines - both automated and human-in-the-loop - in the AWS cloud.

There are 3 courses

1. Analyze Datasets and Train ML Models using AutoML

  • In the first course of the Practical Data Science Specialization, you will learn foundational concepts for exploratory data analysis (EDA), automated machine learning (AutoML), and text classification algorithms. With Amazon SageMaker Clarify and Amazon SageMaker Data Wrangler, you will analyze a dataset for statistical bias, transform the dataset into machine-readable features, and select the most important features to train a multi-class text classifier.
  • You will then perform automated machine learning (AutoML) to automatically train, tune, and deploy the best text-classification algorithm for the given dataset using Amazon SageMaker Autopilot. Next, you will work with Amazon SageMaker BlazingText, a highly optimized and scalable implementation of the popular FastText algorithm, to train a text classifier with very little code.
  • Structure
    • Week 1: Explore the Use Case and Analyze the Dataset
      • Ingest, explore, and visualize a product review dataset for multi-class text classification.
    • Week 2: Data Bias and Feature Importance
      • Determine the most important features in a dataset and detect statistical biases.
    • Week 3: Use Automated Machine Learning to train a Text Classifier
      • Inspect and compare models generated with automated machine learning (AutoML).
    • Week 4: Built-in algorithms
      • Train a text classifier with BlazingText and deploy the classifier as a real-time inference endpoint to serve predictions.
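
As a rough illustration of the data preparation behind the BlazingText step: in supervised mode, BlazingText reads one example per line, with the class given as a word prefixed by `__label__` followed by space-separated tokens. The sketch below is not taken from the course notebooks. The helper name `to_blazingtext_line`, the sample reviews, and the -1/0/1 sentiment labels are assumptions made for the example, and the tokenizer is a deliberately simple regular expression.

```python
import re


def to_blazingtext_line(label, text):
    """Format one example as '__label__<label> <tokens>' (hypothetical helper)."""
    # Lowercase the text, then split it into words and standalone punctuation
    # marks so that every token ends up separated by a single space.
    tokens = re.findall(r"[a-z0-9']+|[^\sa-z0-9']", text.lower())
    return f"__label__{label} " + " ".join(tokens)


# Made-up reviews with assumed sentiment labels: -1 negative, 0 neutral, 1 positive.
reviews = [
    (1, "Love this dress, fits perfectly!"),
    (-1, "Poor quality. Returned it."),
]

lines = [to_blazingtext_line(label, text) for label, text in reviews]
for line in lines:
    print(line)
```

Each resulting line would be written to a training file, which is then uploaded to Amazon S3 for the training job to read.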

2. Build, Train, and Deploy ML Pipelines using BERT

  • In the second course of the Practical Data Science Specialization, you will learn to automate a natural language processing task by building an end-to-end machine learning pipeline using Hugging Face’s highly optimized implementation of the state-of-the-art BERT algorithm with Amazon SageMaker Pipelines. Your pipeline will first transform the dataset into BERT-readable features and store the features in the Amazon SageMaker Feature Store. It will then fine-tune a text classification model on the dataset using a Hugging Face pre-trained model, which has learned to understand human language from millions of Wikipedia documents. Finally, your pipeline will evaluate the model’s accuracy and only deploy the model if the accuracy exceeds a given threshold.
  • Practical data science is geared towards handling massive datasets that do not fit on your local hardware and could originate from multiple sources. One of the biggest benefits of developing and running data science projects in the cloud is the agility and elasticity that the cloud offers to scale up and out at minimal cost.
  • Structure
    • Week 1: Feature Engineering and Feature Store
      • Transform a raw text dataset into machine learning features and store features in a feature store.
    • Week 2: Train, Debug, and Profile a Machine Learning Model
      • Fine-tune, debug, and profile a pre-trained BERT model.
    • Week 3: Deploy End-To-End Machine Learning pipelines
      • Orchestrate ML workflows and track model lineage and artifacts in an end-to-end machine learning pipeline.
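
The final stage of the pipeline described above, deploying only when accuracy exceeds a threshold, can be sketched in plain Python. This is a minimal illustration rather than the course's pipeline code: the function names `accuracy` and `should_deploy`, the default threshold of 0.8, and the sample labels are all assumptions. In SageMaker Pipelines the same gate would normally be expressed as a condition step in the pipeline definition, not as a local function.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def should_deploy(y_true, y_pred, min_accuracy=0.8):
    """Return True only if accuracy strictly exceeds the threshold (assumed 0.8)."""
    return accuracy(y_true, y_pred) > min_accuracy


# Made-up evaluation data: 4 of 5 predictions are correct.
y_true = [1, 0, -1, 1, 0]
y_pred = [1, 0, -1, 0, 0]

print("accuracy:", accuracy(y_true, y_pred))
print("deploy at 0.75 threshold:", should_deploy(y_true, y_pred, min_accuracy=0.75))
print("deploy at 0.80 threshold:", should_deploy(y_true, y_pred, min_accuracy=0.80))
```

The comparison is strict, so a model that only matches the threshold is not deployed. Whether to use a strict or inclusive comparison is a design choice to make per project.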

3. Optimize ML Models and Deploy Human-in-the-Loop Pipelines

  • In the third course of the Practical Data Science Specialization, you will learn a series of performance-improvement and cost-reduction techniques to automatically tune model accuracy, compare prediction performance, and generate new training data with human intelligence. After tuning your text classifier using Amazon SageMaker Hyper-parameter Tuning (HPT), you will deploy two model candidates into an A/B test to compare their real-time prediction performance and automatically scale the winning model using Amazon SageMaker Hosting.
  • Lastly, you will set up a human-in-the-loop pipeline to fix misclassified predictions and generate new training data using Amazon Augmented AI and Amazon SageMaker Ground Truth.
  • Structure
    • Week 1: Advanced model training, tuning and evaluation
      • Train, tune, and evaluate models using data-parallel and model-parallel strategies and automatic model tuning.
    • Week 2: Advanced model deployment and monitoring
      • Deploy models with A/B testing, monitor model performance, and detect drift from baseline metrics.
    • Week 3: Data labeling and human-in-the-loop pipelines
      • Label data at scale using private human workforces and build human-in-the-loop pipelines.
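
The human-in-the-loop idea from this course can be illustrated with a small routing sketch: predictions the model is confident about are accepted automatically, the rest go to a reviewer, and the reviewer's corrections become new training examples. Everything here is hypothetical, including the function names, the 0.7 confidence threshold, and the sample predictions. It does not call Amazon Augmented AI or SageMaker Ground Truth, which manage the review workflow and workforce in the actual course.

```python
def route_predictions(predictions, min_confidence=0.7):
    """Split predictions into auto-accepted and human-review lists."""
    auto_accepted, human_review = [], []
    for item in predictions:
        if item["confidence"] >= min_confidence:
            auto_accepted.append(item)
        else:
            human_review.append(item)
    return auto_accepted, human_review


def to_training_examples(human_review, human_labels):
    """Pair each reviewed text with the label a human assigned to it."""
    return [
        {"text": item["text"], "label": human_labels[item["text"]]}
        for item in human_review
        if item["text"] in human_labels
    ]


# Made-up model outputs; labels follow an assumed -1/0/1 sentiment scheme.
predictions = [
    {"text": "great fit", "label": 1, "confidence": 0.95},
    {"text": "it is fine i guess", "label": 1, "confidence": 0.41},
    {"text": "zipper broke", "label": -1, "confidence": 0.88},
]

auto_accepted, human_review = route_predictions(predictions)

# Labels a human reviewer might assign to the low-confidence items.
human_labels = {"it is fine i guess": 0}
new_training_data = to_training_examples(human_review, human_labels)
print(new_training_data)
```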
