siawayforward / dbt_about_it

I'm learning how to use dbt with BigQuery so I can apply that knowledge wherever we end up working. It seems like a good DWH interface tool to know for data transformation and testing, and allows me to solidify concepts of testing in data ops.

dbt_fundamentals

Sia's Set Up Steps Notes

My notes on how I went from never having used dbt to setting up my first project, written as I went. I'll come back to update them with more details and links. dbt's setup docs page helped with a lot of this, but it made some assumptions about how to navigate, so I wanted to break it down further than the tutorial does (links to be added later).

Objective

  • You have a data warehouse and some queries to run to generate information
  • You want to bring in the data from the warehouse, transform the data, and test to see that you did it correctly and made the correct assumptions based on agreed upon "rules".
  • dbt lets you handle the second part (transforming and testing) in a streamlined, abstracted layer, without writing lots of custom scripts across your org.
  • I'm learning this because I've only ever done this manually or with a bunch of custom files :clown: 😄

BigQuery

  • set up a Google Cloud (GCP) account; there's a three-month free trial
  • create a cloud project (note the project ID, you'll need it later)
  • create a service account that dbt will use to access BigQuery, with BigQuery admin rights
  • create a key for that service account, choosing JSON as the key type
  • download the JSON key file and hold onto it for now; it goes in the same folder as your dbt profiles (see below)

dbt CLI

  • open a terminal or your code environment
  • I created a venv so everything is self-contained
  • in the venv, run pip install dbt-bigquery; this is the adapter that lets dbt connect to your BigQuery warehouse using your credentials
  • once installed, run dbt --version to confirm everything you need is installed
  • if all is set (you should see dbt-core and dbt-bigquery), it's time to initialize the project
  • run dbt init jaffle_shop to create a starter project (git repo notes to be added later)
  • once done, update the profiles.yml file with your credentials for the jaffle_shop profile (init generates this file, but you need to add the credentials yourself)

dbt x Google BigQuery

  • each project has a profile, named after the project, that holds the credentials for the warehouse it uses

  • the project's dbt_project.yml names the profile it uses, and the profiles.yml in your .dbt folder contains all profiles and their settings; the matching name is how the two are connected

  • remember the JSON key file we downloaded? Save it in the .dbt folder in your user (home) directory

  • Now update the credentials. In profiles.yml, paste the chunk below under the jaffle_shop key (indented one level in).

      ```yml
      target: dev
      outputs:
        dev:
          type: bigquery
          method: service-account
          keyfile: /Users/your_pc_or_mac_user/.dbt/dbt-learn-cred.json # replace this with the full path to your keyfile
          project: dbt-learn-fundamentals # replace this with your GCP project id
          dataset: dbt_sia # replace this with dbt_your_name (e.g. dbt_bob) or your schema name; this was an example from the dbt tutorial
          threads: 1
          timeout_seconds: 300
          location: US
          priority: interactive
      ```
    
  • After saving these updates, go back to the terminal (inside your venv, if you made one) and run dbt debug. The goal is for all checks to pass; if any fail, something is misconfigured.

    • The first time I got a fail, it was because I hadn't installed dbt-bigquery, fun right? :clown:
    • The second time I got a fail, I was running dbt debug from outside the project folder. If you want to debug the jaffle_shop setup, make sure you navigate into that directory first :clown: :clown:
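To make the project/profile link concrete, here is a minimal sketch assuming dbt 1.x and the names used in these notes. It shows two separate files in one block, separated by `---`: the `profile:` value in the project's dbt_project.yml has to match a top-level key in ~/.dbt/profiles.yml. The `version` value is illustrative, and the profile is trimmed to the core BigQuery settings.

```yml
# File 1: dbt_project.yml, inside the jaffle_shop project folder
name: 'jaffle_shop'
version: '1.0.0'          # illustrative value
config-version: 2
profile: 'jaffle_shop'    # must match the top-level key in ~/.dbt/profiles.yml

---
# File 2: ~/.dbt/profiles.yml, shared by every dbt project on this machine
jaffle_shop:              # the profile name referenced by dbt_project.yml
  target: dev             # which entry under outputs dbt uses by default
  outputs:
    dev:
      type: bigquery
      method: service-account
      keyfile: /Users/your_pc_or_mac_user/.dbt/dbt-learn-cred.json  # full path to your JSON key
      project: dbt-learn-fundamentals   # your GCP project id
      dataset: dbt_sia                  # your dev dataset (schema)
      threads: 1
```

If dbt debug reports that it can't find a profile, a mismatch between these two names is one of the first things to check.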

Models on models

  • Since dbt_project.yml tells dbt to look for models in the models folder, any new model should go in that directory (or one of its subdirectories)
    • To give a subdirectory its own settings, such as materialization, add a tree for it under models in dbt_project.yml
  • The customers.sql file itself doesn't need to change, because dbt already knows where to look
  • After this, run dbt run; add --full-refresh if you get an error updating an existing model
    • to run only the staging models, use the path selector: dbt run -s path:models/staging --full-refresh
    • to run a single model, use dbt run --models model-name or dbt run -s model-file.sql
  • The ref function lets you select from one model inside another, like we do in the customers.sql file
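A sketch of the subdirectory config tree described in these notes, assuming dbt 1.x, where `model-paths` tells dbt which folder holds models (older versions called it `source-paths`). The key directly under `models:` must be the project's `name:`. The `marts` folder is hypothetical, included only to show a different materialization.

```yml
# dbt_project.yml (excerpt)
name: 'jaffle_shop'
model-paths: ["models"]        # dbt looks for model .sql files here, including subfolders

models:
  jaffle_shop:                 # must match the project name above
    +materialized: view        # project-wide default
    staging:                   # applies to everything in models/staging/
      +materialized: view
    marts:                     # hypothetical models/marts/ folder
      +materialized: table
```

With this tree in place, dbt run -s path:models/staging builds only the staging models, as views.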

Testing and Documentation

  • add a .yml file under the models directory that defines models, their columns, and the tests to run on them
  • dbt comes with four generic tests: unique, not_null, accepted_values, and relationships (for foreign keys)
  • after defining these, run dbt test to run them
  • you can also document the models by adding descriptions for models and fields in that same .yml file
    • you don't need a single .yml file for all tests; you can use one per package, per model, or per subdirectory. Your choice
    • after adding descriptions, run dbt docs generate to build the documentation. With the CLI, you also need to run dbt docs serve to view it on a local web server. In dbt Cloud, the docs are available once they're generated
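A sketch of such a .yml file covering all four built-in tests plus descriptions. Only the `customers` model comes from these notes; the `stg_orders` model, the column names, and the accepted status values are hypothetical placeholders.

```yml
# models/schema.yml (any .yml file under models/ works)
version: 2

models:
  - name: customers
    description: Customer-level model built in customers.sql.
    columns:
      - name: customer_id                # hypothetical column name
        description: Primary key of the customers model.
        tests:
          - unique
          - not_null

  - name: stg_orders                     # hypothetical staging model
    description: Staged orders, one row per order (assumed grain).
    columns:
      - name: order_id
        tests:
          - unique
          - not_null
      - name: status
        tests:
          - accepted_values:
              values: ['placed', 'shipped', 'completed', 'returned']   # placeholder values
      - name: customer_id
        tests:
          - relationships:               # foreign key: every order must point at a known customer
              to: ref('customers')
              field: customer_id
```

Running dbt test executes every test listed here, and dbt docs generate picks up the description fields.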

Resources shared by dbt in the starter project files

Try running the following commands:

  • dbt run
  • dbt test

Resources:

  • Learn more about dbt in the docs
  • Check out Discourse for commonly asked questions and answers
  • Join the chat on Slack for live discussions and support
  • Find dbt events near you
  • Check out the blog for the latest news on dbt's development and best practices
