Git Product home page Git Product logo

analytics-engineering's Introduction

analytics-engineering

Building a Dimensional Data Warehouse using dbt and BigQuery.
Alt text

Navigation

This repository contains the Use Case of a Dimensional DWH build using dbt and bigquery, it contains two folders:

  1. The folder DL_NORTHWIND contains the dataset to model on dbt

  2. the other folder is the name assigned to the dbt project, in this case NT_DWH

dbt installation

  1. Install pyenv to manage Python versions and virtual environments: https://github.com/pyenv/pyenv

  2. Create a new Python environment for the dbt and its big query adapter:

     pyenv virtualenv dbt-bigquery
    
  3. Install dbt-core and dbt-bigquery:

     pip install --upgrade pip
     pip install dbt-core
     pip install dbt-bigquery
    

    and check the version:

     dbt --version
    

    that should be the output message:
    Alt text

  4. Initialize the dbt project and assign it a valid name. In this use case, the name is "NT_DWH" as per Northwind Traders Data Warehouse:

    • run dbt init NT_DWH
    • choose bigquery as database type
    • choose oauth as authenthication method
    • add your GCP project_id
  5. Now execute dbt debug and complete the configuration of your profiles.yml with the connections to the Database

     ~/.dbt/profiles.yml
         my-bigquery-db:
             target: dev
             outputs:
                 dev:
                 type: bigquery
                 method: oauth
                 project: [GCP project id]
                 dataset: [the name of your dbt dataset] # You can also use "schema" here
                 threads: [1 or more, number of parallelism in the tasks execution]
                 <optional_config>: <value>
    
  6. Setup the gcp oauth using gcloud: https://docs.getdbt.com/docs/core/connect-data-platform/bigquery-setup#local-oauth-gcloud-setup:

     gcloud auth application-default login \
     --scopes=https://www.googleapis.com/auth/bigquery,\
     https://www.googleapis.com/auth/drive.readonly,\
     https://www.googleapis.com/auth/iam.test
    
  7. If you work with multiple warehouses you need to tell dbt which profiles.yml to use. There are several ways to do that, for more info check the official documentation: https://docs.getdbt.com/docs/core/connect-data-platform/connection-profiles.
    One possibility is to use the the --profiles-dir flag every time you execute a dbt command, followed by 'path/to/directory'.

That's how a well configured dbt project, looks like:

Alt text

analytics-engineering's People

Contributors

domenicoboris89 avatar

Watchers

 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.