An end-to-end (e2e) Vertex AI pipeline for a classification problem using AutoML.
This project is based on Google's demo, which can be found at https://github.com/GoogleCloudPlatform/vertex-ai-samples/blob/main/notebooks/official/pipelines/automl_tabular_classification_beans.ipynb
config: contains the component and pipeline configuration files
components: contains Vertex component Python files
pipelines: contains Vertex pipeline Python files
utils: contains helper functions
-
Clone the repository by running
git clone https://github.com/Rasheed19/vertex-ai-automl-pipeline
-
Navigate to the root folder, i.e., vertex-ai-automl-pipeline, and create a Python virtual environment by running
python3.10 -m venv .venv
-
Activate the virtual environment by running
source .venv/bin/activate
-
Upgrade pip by running
pip install --upgrade pip
-
Install all the required Python libraries by running
pip install -r requirements.txt
-
Download the beans data from https://archive.ics.uci.edu/dataset/602/dry+bean+dataset. Convert it to CSV and upload it to BigQuery.
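The conversion and upload can also be scripted. The following is only a sketch under assumptions: it assumes the downloaded archive contains an Excel file, that pandas, openpyxl, and google-cloud-bigquery are installed, and that the target BigQuery dataset already exists. The function names are hypothetical and are not part of this repository.

```python
def build_table_id(project_id: str, dataset: str, table: str) -> str:
    """Return the fully qualified BigQuery table ID: project.dataset.table."""
    return f"{project_id}.{dataset}.{table}"


def excel_to_csv(xlsx_path: str, csv_path: str) -> None:
    """Convert the downloaded Excel sheet to CSV (assumes an .xlsx download)."""
    import pandas as pd  # third-party; imported here so the module loads without it

    pd.read_excel(xlsx_path).to_csv(csv_path, index=False)


def upload_csv(csv_path: str, table_id: str) -> None:
    """Load a CSV file with a header row into an existing BigQuery dataset."""
    from google.cloud import bigquery  # third-party: google-cloud-bigquery

    client = bigquery.Client(project=table_id.split(".")[0])
    job_config = bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.CSV,
        skip_leading_rows=1,  # the first row holds the column names
        autodetect=True,  # let BigQuery infer the schema
    )
    with open(csv_path, "rb") as source_file:
        job = client.load_table_from_file(
            source_file, table_id, job_config=job_config
        )
    job.result()  # block until the load job finishes (raises on failure)
```

A call might look like excel_to_csv("beans.xlsx", "beans.csv") followed by upload_csv("beans.csv", build_table_id("your-project-id", "your_dataset", "beans")), where the file, dataset, and table names are placeholders.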
-
Create a file named .env in the root folder and store the following variables related to your GCP project:
PROJECT_ID=your-project-id
REGION=your-project-region
BUCKET_URI=gs://your-project-name
SERVICE_ACCOUNT=your-service-account
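To illustrate how these variables might be read and checked before a run, here is a minimal standard-library sketch. It is not the repository's own loader (which may well use a package such as python-dotenv). The function names are hypothetical, and the gs:// check is an assumption based on the example value above.

```python
from pathlib import Path

# Variable names taken from the .env step above.
REQUIRED_VARS = ("PROJECT_ID", "REGION", "BUCKET_URI", "SERVICE_ACCOUNT")


def parse_env_text(text: str) -> dict:
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def load_settings(path: str = ".env") -> dict:
    """Read the .env file and fail early if a required variable is missing."""
    values = parse_env_text(Path(path).read_text(encoding="utf-8"))
    missing = [name for name in REQUIRED_VARS if not values.get(name)]
    if missing:
        raise ValueError(f"Missing variables in {path}: {', '.join(missing)}")
    if not values["BUCKET_URI"].startswith("gs://"):
        raise ValueError("BUCKET_URI should start with gs://")
    return values
```

Failing early on a missing variable avoids submitting a pipeline job that would only fail later on Vertex AI.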
-
Run the following commands in your terminal to configure the pipeline run on Vertex AI (make sure the gcloud CLI is installed on your computer):
-
Log in:
gcloud auth login
-
Configure the login to use your preferred project:
gcloud config set project your-project-id
-
Get and save your user account credentials:
gcloud auth application-default login
-
Grant the pipeline access to your storage bucket:
gsutil iam ch serviceAccount:your-service-account:roles/storage.objectCreator gs://your-project-name
gsutil iam ch user:your-gmail-address:objectCreator gs://your-project-name
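The two gsutil commands above can also be assembled from the values stored in .env. The sketch below only builds and runs the same commands shown in this step. The function names and the user_email parameter are hypothetical, and it assumes gsutil is installed and on your PATH.

```python
import subprocess


def bucket_iam_commands(service_account: str, user_email: str, bucket_uri: str) -> list:
    """Build the two gsutil commands from the step above as argument lists."""
    return [
        [
            "gsutil", "iam", "ch",
            f"serviceAccount:{service_account}:roles/storage.objectCreator",
            bucket_uri,
        ],
        ["gsutil", "iam", "ch", f"user:{user_email}:objectCreator", bucket_uri],
    ]


def grant_bucket_access(service_account: str, user_email: str, bucket_uri: str) -> None:
    """Run each command in order and stop on the first failure."""
    for command in bucket_iam_commands(service_account, user_email, bucket_uri):
        subprocess.run(command, check=True)  # raises CalledProcessError on error
```

Passing the arguments as a list rather than a single shell string avoids quoting problems if a value contains unusual characters.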
-
Then run the pipeline, which trains, registers, and deploys a model to a Vertex AI endpoint, using one of the following commands in your terminal:
-
Run the pipeline with default options
python run.py
-
Run the pipeline with a quality gate that requires a test-set AU ROC of at least 95%. If the model does not meet the threshold, it won't be deployed to an endpoint.
python run.py --min-test-accuracy 0.95
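The quality gate described above amounts to a threshold comparison before deployment. The sketch below shows one way such a check could look. It is not the actual code in run.py: the function names are hypothetical, and treating a missing threshold as "no gate" is an assumption about the default behaviour.

```python
import argparse
from typing import Optional


def build_parser() -> argparse.ArgumentParser:
    """Command-line parser exposing the flag used in the command above."""
    parser = argparse.ArgumentParser(description="Run the pipeline.")
    parser.add_argument(
        "--min-test-accuracy",
        type=float,
        default=None,  # assumption: no threshold means no quality gate
        help="Minimum test-set score required before deployment.",
    )
    return parser


def should_deploy(test_score: float, threshold: Optional[float]) -> bool:
    """Return True when no gate is set or the score meets the threshold."""
    if threshold is None:
        return True
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be between 0 and 1")
    return test_score >= threshold
```

With --min-test-accuracy 0.95, a model scoring 0.97 on the test set would pass the gate, while one scoring 0.90 would be trained and evaluated but not deployed.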