The goal of this toolkit is to jumpstart the process of creating propensity models in BQML. The toolkit provides templated SQL queries to generate feature sets and models.
While there are pre-built models natively available in GA360, creating a custom model gives you more freedom to customize and to predict specific events or behaviors that are not captured by native ML models. This provides a starter template for building your own predictive model utilizing the BQ Export.
This is not meant to be an exhaustive solution or to cover every use case. No two models are the same, so you should tweak it with your own timelines and business case in mind.
For more on BQML, see the documentation.
The `sql` folder contains sample scripts, some of which are utilized in `create_and_run_queries.py`.
Most importantly, `create_daily_feature_set.sql` creates generalizable features at the user level, which are then used for predicting future behavior.
The Python script included will automate the process of reading in the base feature set and model training SQL files, passing in the necessary parameters, running the queries in BigQuery, and saving the results.
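The parameter substitution at the heart of that workflow can be sketched roughly as follows. Note that the template text, function name, and placeholder names below are illustrative, not the repo's actual files; the real script reads its templates from the `sql` folder.

```python
# Minimal sketch of filling parameters into a templated SQL query,
# assuming placeholders like {project_id} and {ga_dataset}.

def render_query(template: str, project_id: str, ga_dataset: str) -> str:
    """Substitute project/dataset placeholders in a SQL template."""
    return template.format(project_id=project_id, ga_dataset=ga_dataset)

template = """
SELECT fullVisitorId, SUM(totals.visits) AS visits
FROM `{project_id}.{ga_dataset}.ga_sessions_*`
GROUP BY fullVisitorId
"""

query = render_query(template, "my-project", "my_ga_dataset")
# The rendered query string could then be submitted with the BigQuery
# client library, e.g. google.cloud.bigquery.Client().query(query).
```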
This repo solely utilizes data generated by the GA360 BigQuery Export. If you are a GA360 customer but do not have the export implemented, see this page for instructions on setting it up. Alternatively, you can use publicly available data by following the directions here.
Because this code was made to work with any GA360 implementation, no site-specific behavior is captured. Similarly, it relies on eCommerce transactions as the behavior of interest (also known as our label, i.e. the behavior we are trying to predict), which may not be the ideal use case for all marketers. Because of this, it's recommended that you review the label to ensure it reflects the behavior you are trying to predict.
Here are some suggested additions that can be made to the feature set:
- Custom dimensions and metrics
- Hit-level data (what pages were visited?)
- Events, if captured correctly
BQML also supports a growing number of ML model types using SQL-like syntax. For models used in production, it is suggested that the user tune BQML parameters and/or test different BQML model types to optimize performance. These use cases commonly have severely unbalanced classes, so it is also recommended to incorporate class weights.
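As an illustrative sketch of what class weighting means here, the snippet below derives weights for a heavily unbalanced binary label (e.g. converters vs. non-converters) using the common `n_samples / (n_classes * n_class_samples)` heuristic. The function name and counts are hypothetical; weights like these could then be supplied to a BQML model via its class-weight option.

```python
# Illustrative sketch, not part of the repo: compute balanced class
# weights so the rare class (converters) counts more during training.

def balanced_class_weights(label_counts: dict) -> dict:
    """Weight each class inversely to its frequency:
    total_samples / (n_classes * samples_in_class)."""
    total = sum(label_counts.values())
    n_classes = len(label_counts)
    return {label: total / (n_classes * count)
            for label, count in label_counts.items()}

# Suppose only 2% of users converted (label 1):
weights = balanced_class_weights({0: 9800, 1: 200})
# The rare class receives a much larger weight than the common class.
```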
If you want to run on Cloud Shell, click the button below to deploy. You will need to run `gcloud auth login` to make sure your credentials are up to date.
If you want to run in your own environment, you will need to set up authentication. Follow the steps outlined here to create a service account and save it in your environment. Then you will need to add these credentials as an environment variable.
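For example, the Google Cloud client libraries pick up the service-account key from the `GOOGLE_APPLICATION_CREDENTIALS` environment variable. The key file path below is illustrative; use wherever you saved your own key.

```shell
# Point the Google Cloud client libraries at your service-account key.
# The path is an example; substitute the location of your downloaded key.
export GOOGLE_APPLICATION_CREDENTIALS="$HOME/keys/bqml-sa-key.json"
```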
To run, you need a few pieces of information:
- `project_id`: the ID of your GCP project
- `ga_dataset`: the BigQuery dataset where you have stored your GA export data
Then, run this command in Cloud Shell: `python3 create_and_run_queries.py [project_id] [ga_dataset]`. Note that this will default to saving the feature set and model in the same dataset as your GA360 BQ Export.
After training a model, a common next step is pushing user scores back into GA360, where you can create custom audiences to target. For easy implementation, check out Project Modem to create an automated pipeline for activating the data from your model.