Git Product home page Git Product logo

merlin-on-vertex's Introduction

[WiP] NVIDIA Merlin on Vertex AI

This repository compiles prescriptive guidance and code samples for operationalization of NVIDIA Merlin framework on Google Cloud Vertex AI.

NVIDIA Merlin is and open-source framework for building large-scale deep learning recommender system. Vertex AI is Google Cloud's unified Machine Learning platform to help data scientists and machine learning engineers increase experimentation, deploy faster, and manage models with confidence.

The content in this repository centers around five core scenarios:

We assume that users of this repository have practical experience with both NVIDIA Merlin and Vertex AI. If you are unfamiliar with some of the core concepts or technologies used in this repo we recommend referring to NVIDIA Merlin and Vertex AI documentation.

The dataset used by all samples in this repo is Criteo 1TB Click Logs dataset provided by The Criteo AI Lab.

Architecture Overview

The below figure summarizes a high level architecture of the solution demonstrated in this repo.

NVIDIA Merlin

Large scale data preprocessing

Commercial recommenders are trained on huge datasets, often several hundreds of terabytes in size. At this scale, data preprocessing steps often take much more time than training recommender machine learning models. NVTabular - a core component of Merlin - is a feature engineering and preprocessing library designed to effectively manipulate terabytes of recommender system datasets and significantly reduce data preparation time.

In this repo we demonstrate how to operationalize NVTabular data preprocessing workflows using Vertex AI Pipelines and multi-GPU processing nodes. The repo includes two samples of reusable and customizable data preprocessing pipelines developed using Kubeflow Pipelines SDK and Google Cloud pipeline components.

The first pipeline demonstrates how to process large CSV datasets managed in Google Cloud Storage. The second pipeline digests source data from Google BigQuery

Training large-scale deep learning recommender models

NVIDIA HugeCTR is NVIDIA's GPU-accelerated, highly scalable recommender framework. NVIDIA HugeCTR facilitates implementations of leading deep learning recommender models such as Google's Wide and Deep and, Facebook's DLRM.

The repo includes an example of how to operationalize training and hypertuning of the HugeCTR implementation of the DeepFM model using Vertex AI Training and massively scalable A2 workers.

Deploying and serving deep learning ranking inference pipelines

NVIDIA Triton Inference Server is a cloud and edge inferencing solution optimized for both CPUs and GPUs. Triton supports ensemble models that can be used to implement multi-step inference pipelines.

The repo includes a sample that demonstrates how to create, deploy, and serve a Triton ensemble model using Vertex AI Prediction. The example ensemble implements an inference pipeline that integrates NVTabular data preprocessing workflow with a HugeCTR deep learning model.

End to end MLOps workflow

Design patterns and best practices outlined in the previous sections come together in the last component of the solution - a reference implementation the machine learning pipeline that integrates data preprocessing, training, and deployment into a unified end to end workflow.

Experimentation and development environment

A flexible and powerful experimentation and development environment is critical in recommender system projects. In the environment setup section of this repo we outline steps to configure the environment depicted on the below figure.

NVIDIA Merlin dev

The environment is based on Vertex AI Workbench. Container images based on NVIDIA NGC Merlin training and Merlin inference images are installed as managed notebooks kernels augmenting the standard features of a managed notebook instance that include UI and programmatic interfaces to Google Cloud services.

Repository structure

The core content of the repository comprises a series of Jupyter notebooks and a set of Python modules. The notebooks compile detailed guidance on implementing the solution's components described at a high level in the previous section. The Python modules encapsulate reusable code components that are used in the notebooks and in Vertex AI jobs and pipelines.

Currently, the repo includes the following notebooks:

  1. 00-dataset-management - describes and explores the Criteo dataset, and loads it to BigQuery.
  2. 01-dataset-preprocessing - guidance for large scaled data preprocessing with NVTabular and Vertex Pipelines
  3. 02-model-training-hugectr - guidance for training HugeCTR models with Vertex Training.
  4. 03-model-inference-hugectr - guidance for deploying Trition ensemble models with Vertex Prediction
  5. 04-e2e-pipeline - guidance for building an end-to-end data preprocessing, training, and deployment pipeline

The Python modules are in the src folder:

  • src/pipelines - pipeline and pipeline components definitions
  • src/preprocessing - data preprocessing utility functions and classes
  • src/serving - deployment and serving utility functions and classes
  • src/training - model definitions and training loops

The src folder also includes Dockerfiles for custom container images used by Vertex Pipelines, Vertex Training, and Vertex Prediction. Refer to the notebooks for more detailed information.

Environment setup

This section outlines the steps to configure a GCP environment required to run the code samples in this repo.

Select a Google Cloud project

In the Google CLoud Console, on the project selector page, select or create a Google Cloud project.

Enable the required services

From Cloud Shell, run the following gcloud command to enable the required Cloud APIs:

PROJECT_ID=<YOUR PROJECT ID>
gcloud services enable \
    aiplatform.googleapis.com         \
    bigquery.googleapis.com           \
    bigquerystorage.googleapis.com    \
    cloudapis.googleapis.com          \
    cloudbuild.googleapis.com         \
    compute.googleapis.com            \
    containerregistry.googleapis.com  \
    notebooks.googleapis.com          \
    storage.googleapis.com            \
    --project=${PROJECT_ID}

Creating a staging GCS bucket

The notebooks in the repo require access to a GCS bucket that is used for staging and managing ML artifacts created by the workflows implemented in the notebooks. The bucket should be in the same GCP region as the region you will use to run Vertex AI jobs and pipelines.

From Cloud Shell, run the following command to create the bucket:

REGION=<YOUR REGION>
BUCKET_NAME=<YOUR BUCKET_NAME>

gsutil mb -l $REGION gs://$BUCKET_NAME

Building Merlin development container image

To run the notebooks in this repo you will use a custom container image that will be configured as a Vertex AI Workbench managed notebooks kernel. The image is a based on the NVIDIA NGC Merlin training image augmented with additional packages required to interface with Vertex AI.

From Cloud Shell, run the following command to create the container image:

  1. Get Dockerfile for the Merlin development image:
SRC_REPO=https://github.com/GoogleCloudPlatform/merlin-on-vertex
LOCAL_DIR=merlin-env-setup
kpt pkg get $SRC_REPO/env@main $LOCAL_DIR
cd $LOCAL_DIR
  1. Build and push the development image
PROJECT_ID=merlin-on-vertex # change to your project id.
IMAGE_URI=gcr.io/${PROJECT_ID}/merlin-dev-vertex
gcloud builds submit --timeout "2h" --tag ${IMAGE_URI} . --machine-type=e2-highcpu-8

Creating and configuring an instance of Vertex Workbench managed notebook

This section provides step for provisioning a Vertex AI Workbench managed notebook and configuring a custom kernel based on the image created in the previous step.

  1. Follow the instructions in the Create a managed notebooks instance how-to guide:

Install the samples

After the Vertex Workbench managed notebook is created, peform the following steps:

  1. Click on the OPEN JUPYTERLAB link on the notebook instance.
  2. Click on the New Launcher button, then start a new terminal session.
  3. Clone the repository to your notebook instance:
    git clone https://github.com/GoogleCloudPlatform/nvidia-merlin-on-vertex-aigit
    cd nvidia-merlin-on-vertex-ai
    

merlin-on-vertex's People

Contributors

dimanghaz avatar jarokaz avatar ksalama avatar leiterenato avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.