In this workshop you will train a machine learning model for a sample use case based on structured data. The workshop is intended for newcomers to machine learning; no prior ML experience is required.
We will start with data preparation using Pandas and then train an XGBoost classification model on our notebook instance. Finally, we will take a first step toward productionizing this model using Amazon SageMaker Training jobs and endpoints.
The first part of this workshop requires a running Jupyter notebook environment. For Lab 3 you will need an AWS account and a JupyterLab environment such as SageMaker Studio.
If you are at an AWS event, follow this link and type in the event hash to get access to an AWS account:
To get started, clone the repository and open 01-Lab-Data-Prep-with-Pandas.ipynb in Jupyter. Make sure you have the latest version of Pandas installed.
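One way to see which Pandas version your environment currently has is sketched below. This is only a minimal check using the Python standard library; the helper name `installed_version` is made up for this example and is not part of the workshop notebooks. It reports the installed version but does not verify that it is the latest release.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed.

    Hypothetical helper for illustration; not part of the workshop code.
    """
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version("pandas")
    if version is None:
        print("pandas is not installed; try: pip install --upgrade pandas")
    else:
        print(f"pandas {version} is installed")
```

If the reported version looks old, running `pip install --upgrade pandas` in a terminal and restarting the notebook kernel should bring it up to date.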
- Open the AWS console
- Select Studio on the left
- Select Launch Studio
- Select Launch App --> Studio
- Open System Terminal
- Clone the repository:
  git clone https://github.com/johanneslanger/ml-immersion-day
- Then open the following notebook using the file browser on the left:
  ml-immersion-day/01-Lab-Data-Prep-with-Pandas.ipynb
- When asked to "Set up notebook environment", make sure to select the Data Science 2.0 image and hit Select.
  Note: loading the notebook and kernel can take a couple of seconds.