A demo of how to use K8s CronJobs to automate DBT pipelines.
You'll find here all you need to run a simple DBT project on Kubernetes with BigQuery as a SQL backend.
We use several tools to run this project.
❗ Mandatory tools
- `python` and `poetry` to generate and upload some fake data, and to run DBT locally if you want
- `gcloud` CLI to interact with GCP. Installation guide here
- `kubectl` to interact with your Kubernetes cluster. Installation with `gcloud` here
- `helm` to install and manage Kubernetes resources. Installation guide here
- `terraform` to deploy and manage the infrastructure. Installation guide here
- `docker` to build and push the DBT image to GCP Artifact Registry. Installation guide here
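To confirm everything is installed, a quick sanity check (plain shell, nothing project-specific):

```bash
python --version && poetry --version
gcloud version
kubectl version --client
helm version
terraform version
docker --version
```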
ℹ️ Optional tools
- `k9s`, a TUI to visualize, explore and interact with the cluster
- `kubens` to set a default namespace when running `kubectl` commands
- `kubectx` to handle multiple contexts with `kubectl`
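For example (a sketch; the context and namespace names depend on your setup):

```bash
# List available contexts, then switch to your GKE cluster's context
kubectx
kubectx <your-gke-context>
# Set a default namespace so you can drop -n from kubectl commands
kubens <your-namespace>
```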
The very first step is to set up a new GCP project. Once this is done, you can start deploying your infrastructure.
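As a sketch, project creation with `gcloud` could look like this (the project ID is a placeholder, and you'll still need to link a billing account):

```bash
gcloud auth login
gcloud projects create my-kube-bq-jobs   # hypothetical project ID
gcloud config set project my-kube-bq-jobs
# Terraform and DBT use application-default credentials locally
gcloud auth application-default login
```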
We use Terraform to deploy and manage the infrastructure: everything can be done without clicking through the GCP Console.
Follow the README in the `terraform` folder.
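The workflow is the standard Terraform one; from inside that folder, something like:

```bash
cd terraform
terraform init    # download providers, set up the state backend
terraform plan    # review the resources that will be created
terraform apply   # create the GCP resources
```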
There are 2 scripts in the `data` folder that generate some fake data and upload it to GCS. You can have a look at the data they generate by inspecting the 2 `.ndjson` files.
Make sure you have installed the project with `poetry install` and just run:

```bash
poetry run python data/data_faker.py
poetry run python data/storage_load.py
```
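Once the scripts have run, you can peek at the generated files, e.g.:

```bash
head -n 3 data/*.ndjson
```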
You can run DBT locally if you want to check that DBT can correctly run your SQL queries on BigQuery.
You'll need to define your `profiles.yml` so that DBT can connect to and query BigQuery. The default profile name is `kube_bq_jobs`. You can find an example in the ConfigMap here.
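With the profile in place, a quick local check could look like this (assuming DBT is installed as a project dependency, so it is available through `poetry run`):

```bash
# Validate the connection defined in profiles.yml
poetry run dbt debug
# Run the models against BigQuery
poetry run dbt run
```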
Follow the README in the `kubernetes/helm` folder.
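As an illustration of what the deployment boils down to (the release and namespace names here are hypothetical, and `.` assumes the folder is the chart root; the folder's README has the real values):

```bash
cd kubernetes/helm
helm install dbt-cronjob . --namespace dbt --create-namespace
# Check that the CronJob exists, and trigger a one-off run to test it
kubectl get cronjobs --namespace dbt
kubectl create job --from=cronjob/dbt-cronjob dbt-manual-run --namespace dbt
```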