This POC was developed to demonstrate how models can be trained and deployed with the Release solution, in which analytics stakeholders use a config-driven architecture to train and deploy models with minimal knowledge of machine learning.
The solution consists of three shell scripts in the deployment directory:
- `build_docker.sh` - builds a local Docker image that is used in the later stages. It accepts a single argument, `container_name`. The Docker image can be tested by running the following commands:
  - `train_local.sh`: Run this with the name of the image and it will run training on the local tree. For example, you can run `$ ./train_local.sh sagemaker-decision-trees`. It will generate a model under the `/test_dir/model` directory. You'll want to modify the directory `test_dir/input/data/...` to be set up with the correct channels and data for your algorithm. Also, you'll want to modify the file `input/config/hyperparameters.json` to have the hyperparameter settings that you want to test (as strings).
  - `serve_local.sh`: Run this with the name of the image once you've trained the model and it should serve the model. For example, you can run `$ ./serve_local.sh sagemaker-decision-trees`. It will run and wait for requests. Simply use the keyboard interrupt to stop it.
  - `predict.sh`: Run this with the name of a payload file and (optionally) the HTTP content type you want. The content type will default to `text/csv`. For example, you can run `$ ./predict.sh payload.csv text/csv`.
- `build_deploy_sagemaker.sh` - builds the Docker image, pushes it to Amazon ECR, and starts a model training job. It accepts three arguments: `container_name`, `data_dir` (the location of the training data to be used for the training job), and `out_file` (the location of the training log file). Once the training job has completed, the script creates the training log file, which contains the stdout of the training job, including the model metrics and the name of the training job.
- `run_batch_prediction_demo.sh` - runs a batch prediction job using the model trained in the previous step (it works only if that model has been trained) and stores the predictions in S3.
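The request that `predict.sh` makes against the locally served model can be sketched in Python as follows. This is an illustrative sketch, not the script's actual implementation: it assumes the container follows the usual SageMaker serving convention of accepting POST requests at `/invocations` on port 8080, and the function names `build_predict_request` and `predict` are hypothetical.

```python
import urllib.request

# Assumption: the local container accepts POST requests at /invocations
# on port 8080, as SageMaker serving containers conventionally do.
DEFAULT_URL = "http://localhost:8080/invocations"


def build_predict_request(payload, content_type="text/csv", url=DEFAULT_URL):
    """Build (but do not send) the POST request for one payload."""
    return urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": content_type},
        method="POST",
    )


def predict(path, content_type="text/csv"):
    """Send the file at `path` to the local server and return the response body."""
    with open(path, "rb") as handle:
        request = build_predict_request(handle.read(), content_type)
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")
```

With `serve_local.sh` running in another terminal, `predict("payload.csv")` would mirror the `text/csv` default of `predict.sh`, and `predict("payload.json", "application/json")` would send a JSON payload.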
- `cd deployment`
- `./build_docker.sh release-demo`
  1. `cd local_test`
  2. `./train_local.sh release-demo` - trains a model in local Docker
  3. `./serve_local.sh release-demo` - starts a REST server on localhost:8080
  4. `./predict.sh test_dir/input/data/test/test.csv text/csv` - runs a prediction on a CSV payload
  5. `./predict.sh test_dir/input/data/test/test.json application/json` - runs a prediction on a JSON payload
- `./build_deploy_sagemaker.sh release-demo data training_output.txt` - deploys the Docker image to ECR and initiates a training job. The output is written to the `training_output.txt` file and contains the model training metrics and the name of the training job.
- `cat training_output.txt` - prints the metrics of the model training
- `./run_batch_prediction_demo.sh output-for-release-demo data training_output.txt` - starts a batch prediction job using the dataset stored in the data directory
- `aws s3 sync s3://sagemaker-us-west-2-552551502186/output-for-release-demo-out predictions` - downloads the predictions of the batch prediction job
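Because a single input file for a batch prediction job may not exceed 6 MB (see the limitations at the end of this document), a larger dataset has to be split before it is placed in the data directory. The following is a minimal sketch of one way to do that. It assumes each line is an independent record with no header row and no multi-line quoted fields, and it interprets 6 MB as 6 * 1024 * 1024 bytes; the function name `split_csv_lines` is hypothetical.

```python
def split_csv_lines(lines, max_bytes=6 * 1024 * 1024):
    """Group CSV lines into chunks whose UTF-8 size stays within max_bytes."""
    chunks, current, size = [], [], 0
    for line in lines:
        line_size = len(line.encode("utf-8"))
        if line_size > max_bytes:
            raise ValueError("a single row exceeds the size limit")
        # Start a new chunk when adding this line would exceed the limit.
        if current and size + line_size > max_bytes:
            chunks.append(current)
            current, size = [], 0
        current.append(line)
        size += line_size
    if current:
        chunks.append(current)
    return chunks
```

Each returned chunk can then be written to its own file, for example by iterating over a file opened with `open(path, newline="")` so that line endings are preserved and counted.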
Prerequisites
- Docker
- AWS command line tools
- Python 3.7 (the catboost package requires Python >= 3.6)
- Install the pip packages from the requirements.txt file
- Batch transform jobs have a file size limitation: a single file cannot be larger than 6 MB.
- A model deployed as a SageMaker endpoint cannot be called with curl/wget unless the request carries an AWS authentication header.
- At this point, the model training parameters are part of the Docker image, although they could instead be passed as model training hyperparameters. One limitation of that approach is how SageMaker passes hyperparameters to the Docker container: only top-level dictionary items are parsed correctly.
- Most of the training job configs (i.e. the S3 bucket for hosting models, training data and predictions, and the EC2 instance configs) are hardcoded.
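One possible way around the hyperparameter limitation noted above is to keep every hyperparameter at the top level and encode each value as a JSON string, then decode it again inside the container. The sketch below illustrates that idea only; it is not how this POC currently works, the function names are hypothetical, and the keys `iterations` and `model` are made up for the example. How values actually arrive in the container depends on how the training job is launched, so this would need to be checked against the real job.

```python
import json


def flatten_hyperparameters(config):
    """Encode each top-level value as a JSON string so nested values survive."""
    return {key: json.dumps(value) for key, value in config.items()}


def restore_hyperparameters(flat):
    """Decode the JSON strings back into Python values inside the container."""
    return {key: json.loads(value) for key, value in flat.items()}


# Hypothetical training config with one nested item.
config = {"iterations": 100, "model": {"depth": 6, "loss": "Logloss"}}
flat = flatten_hyperparameters(config)
restored = restore_hyperparameters(flat)
```

This also matches the local test setup, where the values in `input/config/hyperparameters.json` are expected to be strings.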