Blockbuster

WARNING: This project is no longer under active development. Any further improvements are left to customers and their partners.

Initial Setup

Enable APIs

Enable the following APIs & Services in your GCP Project:

You can ignore the prompt to create credentials for each API.

Environment Setup

  • Check the active gcloud configuration to confirm the project ID and region.

    gcloud config list

    If the intended project is not set, set it and authenticate:

    gcloud config set project <project name>
    gcloud auth login
  • Edit setup/set_env.sh and configure with desired properties.

  • Set up the required environment variables in your console session:

    source setup/set_env.sh
  • Set up "Private IP Google Access" on the network in the region to be used. This allows Dataflow to start worker VMs without assigning them external IPs.

    gcloud compute networks subnets update default \
        --region ${GCP_REGION} \
        --enable-private-ip-google-access
  • Create the GCS Bucket and BQ Dataset:

    gsutil ls -L "gs://${GCP_BUCKET}" 2>/dev/null \
        || gsutil mb -c regional -l "${GCP_REGION}" "gs://${GCP_BUCKET}"
    bq --location=${BQ_LOCATION} mk --dataset \
        --description "${THIS_PROJECT} working dataset." \
        ${GCP_PROJECT_ID}:${GCP_BQ_WORKING_DATASET}
  • Create a Cloud Composer environment (This step will take ~40min):

    gcloud composer environments create $GCP_COMPOSER_ENV_NAME \
        --location=$GCP_REGION \
        --disk-size=20 \
        --python-version=3 \
        --image-version="composer-1.16.8-airflow-1.10.15"
  • Update the installed dependencies with requirements.txt (This step will take ~45min):

      gcloud composer environments update $GCP_COMPOSER_ENV_NAME \
        --location=$GCP_REGION \
        --update-pypi-packages-from-file="requirements.txt"
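
Most of the commands above rely on variables exported by setup/set_env.sh. As a minimal sketch (not part of the original setup scripts), a preflight check along these lines can report which variables are still unset before you run the long Composer steps. The helper name `missing_env_vars` is hypothetical, and the variable list is only an assumption collected from the commands in this README; the actual contents of set_env.sh may differ.

```shell
# Hypothetical helper: print the names (space-separated) of the given
# variables that are unset or empty in the current shell session.
missing_env_vars() {
  _missing=""
  for _name in "$@"; do
    # Look up the value of the variable whose name is stored in $_name.
    eval "_value=\${${_name}:-}"
    if [ -z "$_value" ]; then
      _missing="${_missing}${_missing:+ }${_name}"
    fi
  done
  printf '%s\n' "$_missing"
}

# Variable names referenced by the commands in this README (an assumption;
# adjust to match your copy of setup/set_env.sh).
missing="$(missing_env_vars GCP_PROJECT_ID GCP_PROJECT_NUMBER GCP_REGION \
  GCP_BUCKET BQ_LOCATION GCP_BQ_WORKING_DATASET GCP_COMPOSER_ENV_NAME \
  DATA_STORAGE_PROJECT THIS_PROJECT)"
if [ -n "$missing" ]; then
  echo "Unset variables: $missing" >&2
fi
```

Run it after `source setup/set_env.sh`. If it prints nothing, every listed variable has a value.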

Grant service account permissions

  • Create an AutoML Service Account:

      gcloud iam service-accounts create service-${GCP_PROJECT_NUMBER}@gcp-sa-automl.iam.gserviceaccount.com \
          --description="Auto ML service" \
          --display-name="Auto ML service"
    
  • Grant Owner permissions to the default compute service account:

      gcloud projects add-iam-policy-binding $GCP_PROJECT_ID \
        --member="serviceAccount:${GCP_PROJECT_NUMBER}-compute@developer.gserviceaccount.com" \
        --role='roles/owner'
  • Grant the BigQuery Data Editor and AutoML Service Agent roles to the AutoML Tables service account:

      gcloud projects add-iam-policy-binding $DATA_STORAGE_PROJECT \
        --member="serviceAccount:service-${GCP_PROJECT_NUMBER}@gcp-sa-automl.iam.gserviceaccount.com" \
        --role='roles/bigquery.dataEditor'
    
      gcloud projects add-iam-policy-binding $GCP_PROJECT_ID \
        --member="serviceAccount:service-${GCP_PROJECT_NUMBER}@gcp-sa-automl.iam.gserviceaccount.com" \
        --role="roles/automl.serviceAgent"
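
The IAM bindings above all derive their member names from the project number. The following is a small sketch, assuming the standard naming patterns for the AutoML service agent and the default Compute Engine service account, that builds those addresses in one place so they can be echoed and checked before granting roles. The function names are hypothetical.

```shell
# Hypothetical helpers: build the service account emails used in the
# IAM bindings from a numeric project number.
automl_service_agent() {
  printf 'service-%s@gcp-sa-automl.iam.gserviceaccount.com\n' "$1"
}

default_compute_service_account() {
  printf '%s-compute@developer.gserviceaccount.com\n' "$1"
}

# Example (requires gcloud, so it is left commented out here):
#   GCP_PROJECT_NUMBER="$(gcloud projects describe "$GCP_PROJECT_ID" \
#     --format='value(projectNumber)')"
#   automl_service_agent "$GCP_PROJECT_NUMBER"
#   default_compute_service_account "$GCP_PROJECT_NUMBER"
```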
    

Setup Cloud Composer Variables

  • If kubectl is not installed, run the following command:

    sudo apt-get install kubectl
  • Copy the generated variables.json file to the environment and import it:

    gcloud composer environments storage data import \
      --environment=$GCP_COMPOSER_ENV_NAME \
      --location=$GCP_REGION \
      --source="setup/variables.json"
    gcloud composer environments run $GCP_COMPOSER_ENV_NAME \
      --location $GCP_REGION \
      variables  -- --i /home/airflow/gcs/data/variables.json
  • Upload the features to the Composer environment (modify the features.json file first, if needed):

    gcloud composer environments storage data import \
      --environment=$GCP_COMPOSER_ENV_NAME \
      --location=$GCP_REGION \
      --source="setup/features.json"
    gcloud composer environments run $GCP_COMPOSER_ENV_NAME \
      --location $GCP_REGION \
      variables -- --i /home/airflow/gcs/data/features.json
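
Because each import round-trips through the Composer environment, it can save time to sanity-check the JSON files locally first. The sketch below assumes nothing about the files beyond their paths in this README. The helper name `json_files_ok` is hypothetical, and the syntax check only runs if `python3` happens to be on the PATH.

```shell
# Hypothetical helper: succeed only if every given file exists, is
# non-empty, and (when python3 is available) parses as JSON.
json_files_ok() {
  for _file in "$@"; do
    if [ ! -s "$_file" ]; then
      echo "Missing or empty: $_file" >&2
      return 1
    fi
    if command -v python3 >/dev/null 2>&1; then
      # json.tool exits non-zero when the file is not valid JSON.
      if ! python3 -m json.tool "$_file" >/dev/null 2>&1; then
        echo "Not valid JSON: $_file" >&2
        return 1
      fi
    fi
  done
  return 0
}

# Example:
#   json_files_ok setup/variables.json setup/features.json \
#     && echo "Files look usable; proceed with the imports."
```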

Generate ML Windowing Pipeline Templates

  • In a working directory on your local machine, clone the "Cloud for Marketing" git repository and initiate the build (This step will take ~5 min):

      git clone https://github.com/GoogleCloudPlatform/cloud-for-marketing.git
      cd cloud-for-marketing/marketing-analytics/predicting/ml-data-windowing-pipeline/ && \
      gcloud builds submit \
        --config=cloud_build.json \
        --substitutions=_BUCKET_NAME=${GCP_BUCKET} && \
      cd ../../../../
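
One caveat with the `cd ... && ... && cd ../../../../` chain: if the build fails, the final `cd` never runs and your shell stays inside the cloned repository. A possible alternative, sketched here with a hypothetical helper named `run_in_dir`, is to run the command in a subshell so the calling shell's working directory is never changed.

```shell
# Hypothetical helper: run a command inside a directory without changing
# the caller's working directory, whether the command succeeds or fails.
run_in_dir() {
  _dir="$1"
  shift
  # The parentheses start a subshell, so the cd is local to it.
  ( cd "$_dir" && "$@" )
}

# Example with the build step from this README (requires gcloud, so it is
# left commented out here):
#   run_in_dir cloud-for-marketing/marketing-analytics/predicting/ml-data-windowing-pipeline \
#     gcloud builds submit \
#       --config=cloud_build.json \
#       --substitutions=_BUCKET_NAME=${GCP_BUCKET}
```

The exit status of `run_in_dir` is that of the command it ran, so it can still be chained with `&&`.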

Update DAGs

  • Get the GCS location associated with your Composer instance:

    export BB_DAG_BUCKET=$(
      gcloud composer environments describe $GCP_COMPOSER_ENV_NAME \
      --location $GCP_REGION \
      --format "value(config.dagGcsPrefix)")
  • Copy the contents of the dags folder to that location:

    gsutil -m cp -r ./dags/* ${BB_DAG_BUCKET}
  • Verify that the DAGs loaded correctly. Run the following:

      gcloud composer environments run $GCP_COMPOSER_ENV_NAME \
        --location $GCP_REGION \
        list_dags

    If you completed all of the above steps correctly, the result will include:

        -------------------------------------------------------------------
        DAGS
        -------------------------------------------------------------------
        0_BB_Prepare_Source
        0_BB_Prepare_Source.prepare_source_data
        1_BB_Analysis
        1_BB_Analysis.analyze
        2_BB_Preprocess
        2_BB_Preprocess.preprocess
        3_BB_Data_load_and_train
        3_BB_Data_load_and_train.load_data
        3_BB_Data_load_and_train.train_model
        4_BB_Predict_and_activate
        4_BB_Predict_and_activate.activate_ga
        4_BB_Predict_and_activate.analyze
        4_BB_Predict_and_activate.batch_predict
        4_BB_Predict_and_activate.cleanup_gcs
        4_BB_Predict_and_activate.prepare_source_data
        4_BB_Predict_and_activate.preprocess
  • Open the Airflow UI. You can find the URL on the Cloud Composer page in GCP, or run the following to get it (note that it may take 1-2 minutes for the UI to show the new DAGs):

      gcloud composer environments describe $GCP_COMPOSER_ENV_NAME \
        --location $GCP_REGION \
        --format "value(config.airflowUri)"
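
Rather than comparing the `list_dags` output by eye, the check can be scripted. The sketch below is not part of the original project: the helper name `missing_dags` is hypothetical, and it assumes the output contains one DAG id per line, as in the listing above.

```shell
# Hypothetical helper: read list_dags output on stdin and print the
# expected DAG ids (given as arguments) that do not appear in it.
missing_dags() {
  # Read stdin once, trimming leading and trailing whitespace per line.
  _listing="$(sed 's/^[[:space:]]*//; s/[[:space:]]*$//')"
  _missing=""
  for _dag in "$@"; do
    # -F: fixed string, -x: match the whole line, -q: no output.
    if ! printf '%s\n' "$_listing" | grep -Fqx -- "$_dag"; then
      _missing="${_missing}${_missing:+ }${_dag}"
    fi
  done
  printf '%s\n' "$_missing"
}

# Example (requires gcloud, so it is left commented out here):
#   gcloud composer environments run $GCP_COMPOSER_ENV_NAME \
#     --location $GCP_REGION list_dags \
#     | missing_dags 0_BB_Prepare_Source 1_BB_Analysis 2_BB_Preprocess \
#         3_BB_Data_load_and_train 4_BB_Predict_and_activate
```

An empty result means every expected top-level DAG was found. Any names printed are DAGs that have not loaded yet.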

Contributors

gunjankathuria, vermavineet-google
