dbt_fundamentals

I'm learning how to use dbt with BigQuery so I can apply that knowledge wherever we end up working. It seems like a good DWH interface tool to know, and allows me to solidify concepts of testing in data ops.

Sia's Set Up Steps Notes

My notes for how I went from never having used dbt to setting up my first project. This was just as I went. Coming back to update with more details and links. A lot of this was helped by dbt's setting up docs page, but there were some assumptions made about how to navigate and wanted to break it down even more than it was on the tutorial (adding links after)

Objective

You have a data warehouse and some queries to run to generate information
You want to bring in the data from the warehouse, transform the data, and test to see that you did it correctly and made the correct assumptions based on agreed upon "rules".
DBT allows you to do the second thing in a streamlined abstracted layer without having to make too many custom scripts in your org.
I'm learning this because I've only ever done this manually or with a bunch of custom files :clown: 😄

BigQuery

set up a cloud GCP account, three month free trial
create a cloud project (keep in mind the project id)
create a user who is going to be accessing big query as an admin
create a credential for the user you created and save the keys as a JSON
download the JSON and save it in the path of profiles where dbt is installed (hold on, just download and wait, see below)

DBT CLI

go to terminal or the code environment
I created a venv so it is all self contained
in the venv, pip install dbt-bigquery because it will allow you to connect your DWH big query data credentials to dbt
once installed, run dbt --version to make sure everything you want is installed
if all is set (you should see dbt-core and dbt-bigquery), then its time to initiate the project
run dbt init jaffle_shop to create a starter for the project (git repo notes to be added after)
once done, update the profiles.yml file with your credentials for the jaffle_shop profile (this file is generated when you run init but you need to add credentials)

DBT x Google BigQuery

for each project, there will be a profile with the name of the project and credentials to warehouse it is using
the dbt profile yml has the name of the project, the general .dbt profiles contains all the profiles and their settings so that's how they're connected
remember the json file we set up and downloaded?. Take that and save it in the .dbt user folder on your computer

Now we update credentials. When updating credentials in the profiles.yml file, first you want to copy this chunk below in under the jaffle_shop branch (so go one branch in).

  ```yml
  target: dev
  outputs:
      dev:
      type: bigquery
      method: service-account
      keyfile: /Users/your_pc_or_mac_user/.dbt/dbt-learn-cred.json # replace this with the full path to your keyfile
      project: dbt-learn-fundamentals # Replace this with your project id
      dataset: dbt_sia # Replace this with dbt_your_name, e.g. dbt_bob or your schema name, this was an example from the dbt tutorial
      threads: 1
      timeout_seconds: 300
      location: US
      priority: interactive
  ```

After adding and saving this with updates, go back to the terminal (of course in your venv if you make one) and run dbt debug. The goal is for all checks to pass. If not, something is wrong.
- The first time I got a fail, it was because I hadn't installed dbt-bigquery, fun right? :clown:
- The second time I got a fail, the path I was running dbt debug from was outside the project folder. i.e. if you want to debug the set up of jaffle_shop, make sure you navigate into that directory :clown: :clown:

Models on models

Since the dbt-profile file is configured to have dbt compiler look in models folder, if we add a model, it should be in this directory (or any sub directories)
- You however need to add a tree for the subdirectory in the yml file under models, and specify materialization settings
In the customers.sql file, nothing needs to change because dbt knows where to look
After this, you can dbt run with --full-refresh if you get an update error
- to run specific models, you can use the path argument for the staging models dbt run -s path:models/staging --full-refresh
- to run specific models with just a model, you can use dbt run --models model-name or dbt run -s model-file.sql
ref function allows you to specify model to select from within a different model like we do in the customers.sql file

Testing and Documentation

added a yml file under models directory for defining models, columns, and tests to run
there are four tests that come with dbt: unique, not null, accepted values, and relationships for foreign keys
after defining and adding these, run dbt test to see them run
you can also document the models by adding field descriptions in that same yml file
- you don't have to have one yml file for all tests, you can do one per package, or one per model or per subdirectory. Your choice
- after adding these fields, you can run dbt docs generate to generate documentation. On CLI, you also need to run dbt docs serve so that you can see it on a local server page. On dbt cloud, you'll have this after its generated

Resources shared by dbt on starter setup files

Try running the following commands:

Learn more about dbt in the docs
Check out Discourse for commonly asked questions and answers
Join the chat on Slack for live discussions and support
Find dbt events near you
Check out the blog for the latest news on dbt's development and best practices

siawayforward / dbt_about_it Goto Github PK

dbt_about_it's Introduction

dbt_fundamentals

Sia's Set Up Steps Notes

Objective

BigQuery

DBT CLI

DBT x Google BigQuery

Models on models

Testing and Documentation

Resources shared by dbt on starter setup files

dbt_about_it's People

Contributors

Watchers

Forkers

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent