ga360-bqml-toolkit's Introduction

BQML Toolkit

The goal of this toolkit is to jumpstart the process of creating propensity models in BQML. The toolkit provides templated SQL queries to generate feature sets and models.

While GA360 offers pre-built models natively, a custom model gives you more room for customization and lets you predict specific events or behaviors that the native ML models do not capture. This toolkit provides a starter template for building your own predictive model using the BigQuery (BQ) Export.

This is not meant to be an exhaustive solution or to cover every use case. No two models are the same, so you should tweak the queries with your own timelines and business case in mind.

For more on BQML, see the documentation.

Components

Queries

The sql folder contains sample scripts, some of which are utilized in create_and_run_queries.py. Most importantly, create_daily_feature_set.sql creates generalizable features at the user-level, which are then used for predicting future behavior.

Python script

The Python script included will automate the process of reading in the base feature set and model training SQL files, passing in the necessary parameters, running the queries in BigQuery, and saving the results.
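The flow described above can be sketched roughly as follows. This is a minimal illustration, not the actual contents of create_and_run_queries.py: it assumes the SQL files use `{name}`-style placeholders (the real templates may use a different convention), that the `google-cloud-bigquery` client library is installed, and the helper names `render_query`, `load_query`, and `run_query` are hypothetical.

```python
from pathlib import Path


def render_query(template: str, **params: str) -> str:
    """Fill {name}-style placeholders in a SQL template (assumed convention)."""
    return template.format(**params)


def load_query(path: str, **params: str) -> str:
    """Read a SQL file from disk and substitute its parameters."""
    return render_query(Path(path).read_text(), **params)


def run_query(sql: str, project_id: str):
    """Run a query in BigQuery and block until it finishes."""
    # Imported lazily so the rendering helpers work without the client library.
    from google.cloud import bigquery

    client = bigquery.Client(project=project_id)
    return client.query(sql).result()


if __name__ == "__main__":
    # Inline template standing in for a file such as create_daily_feature_set.sql.
    template = "SELECT fullVisitorId FROM `{project_id}.{ga_dataset}.ga_sessions_*`"
    print(render_query(template, project_id="my-project", ga_dataset="my_dataset"))
```

Note that `str.format` treats every brace as a placeholder, so a template containing literal braces would need them doubled.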

Installation

Pull in GA360 data

This repo solely utilizes data generated by the GA360 BigQuery Export. If you are a GA360 customer but do not have the export implemented, see this page for instructions on setting it up. Alternatively, you can use publicly available data by following the directions here.

Modify the queries (optional)

Because this code was made to work with any GA360 implementation, no site-specific behavior is captured. Similarly, it relies on eCommerce transactions as the behavior of interest (also known as our label, or the behavior we are trying to predict), which may not be the ideal use case for all marketers. Because of this, it's recommended that you review the label to ensure it reflects the behavior you are trying to predict.
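As a rough illustration of what reviewing the label could look like, the sketch below builds a user-level label query with a swappable condition. It assumes the standard GA360 export columns `fullVisitorId` and `totals.transactions`; the function name `build_label_query` and the `label` alias are hypothetical and may not match the toolkit's own SQL.

```python
# Default condition: the session contains at least one eCommerce transaction.
TRANSACTION_CONDITION = "IFNULL(totals.transactions, 0) > 0"


def build_label_query(table: str, condition: str = TRANSACTION_CONDITION) -> str:
    """Return SQL giving each user a 0/1 label.

    The label is 1 if any of the user's sessions satisfies `condition`.
    """
    return (
        "SELECT\n"
        "  fullVisitorId,\n"
        f"  MAX(IF({condition}, 1, 0)) AS label\n"
        f"FROM `{table}`\n"
        "GROUP BY fullVisitorId"
    )


if __name__ == "__main__":
    print(build_label_query("my-project.my_dataset.ga_sessions_*"))
```

To predict a different behavior, pass another session-level condition, for example one that checks for a specific event among the session's hits.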

Here are some suggested additions that can be made to the feature set:

  • Custom dimensions and metrics
  • Hit-level data (what pages were visited?)
  • Events, if captured correctly
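The additions listed above could be expressed along these lines. This is a sketch under assumptions: it relies on the standard GA360 export fields (`customDimensions`, `hits`, `hits.page.pagePath`, `hits.eventInfo.eventCategory`), the helper names are hypothetical, and the dimension index, path pattern, and aliases in the usage example are made up. Values are interpolated directly into the SQL, so they are assumed to be trusted.

```python
def custom_dimension(index: int, alias: str) -> str:
    """Session-scoped custom dimension; one arbitrary value is kept per user."""
    return (
        "ANY_VALUE((SELECT value FROM UNNEST(customDimensions) "
        f"WHERE index = {int(index)})) AS {alias}"
    )


def page_views(path_regex: str, alias: str) -> str:
    """Count of pageview hits whose path matches a regular expression."""
    return (
        "SUM((SELECT COUNT(*) FROM UNNEST(hits) AS h WHERE h.type = 'PAGE' "
        f"AND REGEXP_CONTAINS(h.page.pagePath, r'{path_regex}'))) AS {alias}"
    )


def event_count(category: str, alias: str) -> str:
    """Count of event hits in a given event category."""
    return (
        "SUM((SELECT COUNT(*) FROM UNNEST(hits) AS h WHERE h.type = 'EVENT' "
        f"AND h.eventInfo.eventCategory = '{category}')) AS {alias}"
    )


def build_feature_query(table: str, features: list) -> str:
    """Combine feature expressions into one user-level query."""
    columns = ",\n  ".join(["fullVisitorId"] + features)
    return f"SELECT\n  {columns}\nFROM `{table}`\nGROUP BY fullVisitorId"


if __name__ == "__main__":
    print(build_feature_query(
        "my-project.my_dataset.ga_sessions_*",
        [
            custom_dimension(3, "member_tier"),
            page_views("^/checkout", "checkout_views"),
            event_count("video", "video_events"),
        ],
    ))
```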

BQML also supports a growing number of ML model types, all created with SQL-like syntax. For models used in production, it is suggested that you adjust the BQML parameters and/or test different BQML model types to optimize performance. Classes are commonly severely unbalanced for these use cases, so it is also recommended to incorporate class weights.
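To make the class-weight suggestion concrete, here is a small sketch. The first helper shows the general idea of inverse-frequency weighting (BQML's own automatic weighting may differ in detail); the second builds a `CREATE MODEL` statement that turns on BQML's `auto_class_weights` option. The function names, the `label` column, and the presence of a `fullVisitorId` column in the feature table are assumptions for illustration.

```python
from collections import Counter


def inverse_frequency_weights(labels):
    """Weight each class by total / (number_of_classes * class_count)."""
    counts = Counter(labels)
    total = len(labels)
    return {cls: total / (len(counts) * n) for cls, n in counts.items()}


def create_model_sql(model: str, feature_table: str, auto_class_weights: bool = True) -> str:
    """Build a logistic regression CREATE MODEL statement for BQML."""
    flag = "TRUE" if auto_class_weights else "FALSE"
    return (
        f"CREATE OR REPLACE MODEL `{model}`\n"
        "OPTIONS (\n"
        "  model_type = 'logistic_reg',\n"
        "  input_label_cols = ['label'],\n"
        f"  auto_class_weights = {flag}\n"
        ") AS\n"
        # The user ID is excluded so it is not treated as a feature.
        f"SELECT * EXCEPT (fullVisitorId) FROM `{feature_table}`"
    )


if __name__ == "__main__":
    # 95 non-converters and 5 converters: the rare class gets the larger weight.
    print(inverse_frequency_weights([0] * 95 + [1] * 5))
    print(create_model_sql("my-project.my_dataset.my_model",
                           "my-project.my_dataset.features"))
```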

Running the Python script

If you want to run on Cloud Shell, click the "Run on Google Cloud" button to deploy. You will need to run gcloud init and log in to make sure your credentials are up to date.

If you want to run in your own environment, you will need to set up authentication. Follow the steps outlined here to create a service account and save its key file in your environment. Then add the path to these credentials as an environment variable.
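Before running anything, it can help to confirm the credentials are visible to Python. The sketch below assumes the key file is exposed through the `GOOGLE_APPLICATION_CREDENTIALS` environment variable, which is what the Google Cloud client libraries look for; the function name `check_credentials` is hypothetical.

```python
import os


def check_credentials(env=None) -> str:
    """Return the service account key path, or raise if it is not usable."""
    env = os.environ if env is None else env
    path = env.get("GOOGLE_APPLICATION_CREDENTIALS")
    if not path:
        raise RuntimeError("GOOGLE_APPLICATION_CREDENTIALS is not set")
    if not os.path.isfile(path):
        raise FileNotFoundError(f"Key file not found: {path}")
    return path


if __name__ == "__main__":
    try:
        print("Using credentials at", check_credentials())
    except (RuntimeError, FileNotFoundError) as err:
        print("Credentials problem:", err)
```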

To run, you need a few pieces of information:

  • project_id: the ID of your GCP project
  • ga_dataset: the BigQuery dataset where you have stored your GA export data

Then, run this command in Cloud Shell: python3 create_and_run_queries.py [project_id] [ga_dataset]. Note that this will default to saving the feature set and model in the same dataset as your GA360 BQ Export.
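For readers adapting the script, the command-line handling could look something like the sketch below. It is not taken from create_and_run_queries.py: the use of `argparse` and the helper names `parse_args` and `output_dataset` are assumptions, and only the two positional arguments and the default output location come from the description above.

```python
import argparse


def parse_args(argv=None):
    """Parse the two positional arguments described above."""
    parser = argparse.ArgumentParser(
        description="Build the feature set and train a BQML model."
    )
    parser.add_argument("project_id", help="ID of your GCP project")
    parser.add_argument("ga_dataset", help="BigQuery dataset holding the GA export")
    return parser.parse_args(argv)


def output_dataset(args) -> str:
    """The feature set and model default to the GA export dataset."""
    return args.ga_dataset


if __name__ == "__main__":
    # Example values passed explicitly; a real script would call parse_args().
    args = parse_args(["my-project", "my_dataset"])
    print(f"Writing results to {args.project_id}.{output_dataset(args)}")
```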

What next?

After training a model, a common next step is pushing user scores back into GA360, where you can create custom audiences to target. For easy implementation, check out Project Modem to create an automated pipeline for activating the data from your model.
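Producing the user scores themselves is typically a single `ML.PREDICT` query. The sketch below builds one; it assumes a classification model trained on an integer 0/1 column named `label` and a feature table that carries `fullVisitorId`, and the function name `build_scoring_query` and the `propensity` alias are hypothetical.

```python
def build_scoring_query(model: str, feature_table: str, label_col: str = "label") -> str:
    """Build SQL returning each user's predicted probability of the positive class."""
    return (
        "SELECT\n"
        "  fullVisitorId,\n"
        # ML.PREDICT returns an array of (label, prob) structs per row.
        f"  (SELECT p.prob FROM UNNEST(predicted_{label_col}_probs) AS p\n"
        "   WHERE p.label = 1) AS propensity\n"
        f"FROM ML.PREDICT(MODEL `{model}`, (SELECT * FROM `{feature_table}`))"
    )


if __name__ == "__main__":
    print(build_scoring_query("my-project.my_dataset.my_model",
                              "my-project.my_dataset.features"))
```

The resulting table of user IDs and scores is the kind of output an activation pipeline would then push back into GA360.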
