The goal of this toolkit is to jumpstart the process of creating propensity models in BQML. The toolkit provides templated SQL queries to generate feature sets and models.
While there are pre-built models natively available in GA360, creating a custom model gives you more freedom to customize and to predict specific events or behaviors that are not captured by native ML models. This provides a starter template for building your own predictive model utilizing the BQ Export.
This is not meant to be an exhaustive solution or to cover every use case. No two models are the same, so you should tweak it with your own timelines and business case in mind.
For more on BQML, see the documentation.
The `sql` folder contains sample scripts, some of which are utilized in `create_and_run_queries.py`.
Most importantly, `create_daily_feature_set.sql` creates generalizable features at the user level, which are then used for predicting future behavior.
The Python script included will automate the process of reading in the base feature set and model training SQL files, passing in the necessary parameters, running the queries in BigQuery, and saving the results.
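The parameter substitution at the heart of that workflow can be sketched roughly as follows. Note that the template text, function name, and placeholder names below are illustrative, not the repo's actual files; the real script reads its templates from the `sql` folder.

```python
# Minimal sketch of filling parameters into a templated SQL query,
# assuming placeholders like {project_id} and {ga_dataset}.

def render_query(template: str, project_id: str, ga_dataset: str) -> str:
    """Substitute project/dataset placeholders in a SQL template."""
    return template.format(project_id=project_id, ga_dataset=ga_dataset)

template = """
SELECT fullVisitorId, SUM(totals.visits) AS visits
FROM `{project_id}.{ga_dataset}.ga_sessions_*`
GROUP BY fullVisitorId
"""

query = render_query(template, "my-project", "my_ga_dataset")
# The rendered query string could then be submitted with the BigQuery
# client library, e.g. google.cloud.bigquery.Client().query(query).
```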
This repo solely utilizes data generated by the GA360 BigQuery Export. If you are a GA360 customer but do not have the export implemented, see this page for instructions on setting it up. Alternatively, you can use publicly available data by following the directions here.
Because this code was made to work with any GA360 implementation, no site-specific behavior is captured. Similarly, it relies on eCommerce transactions as the behavior of interest (also known as our label, i.e. the behavior we are trying to predict), which may not be the ideal use case for all marketers. Because of this, it's recommended that you review the label to ensure it reflects the behavior you are trying to predict.
Here are some suggested additions that can be made to the feature set:
- Custom dimensions and metrics
- Hit-level data (what pages were visited?)
- Events, if captured correctly
BQML also supports a growing number of ML model types using SQL-like syntax. For models used in production, it is suggested that the user tune BQML parameters and/or test different BQML model types to optimize performance. These use cases commonly have severely unbalanced classes, so it is also recommended to incorporate class weights.
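As an illustrative sketch of what class weighting means here, the snippet below derives weights for a heavily unbalanced binary label (e.g. converters vs. non-converters) using the common `n_samples / (n_classes * n_class_samples)` heuristic. The function name and counts are hypothetical; weights like these could then be supplied to a BQML model via its class-weight option.

```python
# Illustrative sketch, not part of the repo: compute balanced class
# weights so the rare class (converters) counts more during training.

def balanced_class_weights(label_counts: dict) -> dict:
    """Weight each class inversely to its frequency:
    total_samples / (n_classes * samples_in_class)."""
    total = sum(label_counts.values())
    n_classes = len(label_counts)
    return {label: total / (n_classes * count)
            for label, count in label_counts.items()}

# Suppose only 2% of users converted (label 1):
weights = balanced_class_weights({0: 9800, 1: 200})
# The rare class receives a much larger weight than the common class.
```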
If you want to run on Cloud Shell, click the button below to deploy. You will need to run `gcloud auth login` to make sure your credentials are up to date.
If you want to run in your own environment, you will need to set up authentication. Follow the steps outlined here to create a service account and save it in your environment. Then you will need to add these credentials as an environment variable.
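For example, the Google Cloud client libraries pick up the service-account key from the `GOOGLE_APPLICATION_CREDENTIALS` environment variable. The key file path below is illustrative; use wherever you saved your own key.

```shell
# Point the Google Cloud client libraries at your service-account key.
# The path is an example; substitute the location of your downloaded key.
export GOOGLE_APPLICATION_CREDENTIALS="$HOME/keys/bqml-sa-key.json"
```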
To run, you need a few pieces of information:
- `project_id`: the ID of your GCP project
- `ga_dataset`: the BigQuery dataset where you have stored your GA export data
Then, run this command in Cloud Shell: `python3 create_and_run_queries.py [project_id] [ga_dataset]`. Note that this will default to saving the feature set and model in the same dataset as your GA360 BQ Export.
After training a model, a common next step is pushing user scores back into GA360, where you can create custom audiences to target. For easy implementation, check out Project Modem to create an automated pipeline for activating the data from your model.