TPC-DS on BigQuery

Credits: Most scripts are adapted from the Fivetran DW Benchmark to suit our particular use case.

Steps:

  1. Copy dsdgen to the GCS bucket location specified in the bootstrap script
  2. Create a high-CPU VM, e.g. with 16 vCPUs
  3. Clone this repository
    git clone $REPO_URL
  4. Give all script files executable permission
    chmod +x *.sh
  5. Run bootstrap.sh
    1. This pulls the dsdgen binary
    2. Installs Cloud Storage FUSE (gcsfuse), which is used to mount the GCS bucket as a local folder
  6. Run data_gen.sh
    Usage:
    ./data_gen.sh $CPU $SCALE
    1. This is responsible for generating data
    2. $CPU denotes the degree of parallelism and must be > 1
    3. $SCALE denotes the scale of the data to be generated
    4. This creates and mounts a GCS bucket and writes the data to it
    5. NOTE: Ensure that $CPU is close to the number of CPUs in the VM for efficient parallel generation
  7. Run load_data.sh
    Usage:
    ./load_data.sh $SCALE
    1. This is responsible for loading data from the GCS bucket written by data_gen.sh into BigQuery
    2. $SCALE denotes the scale of the data to be loaded into BigQuery
    3. NOTE: Before running this step, ensure that the data has been generated and is present in the appropriate GCS bucket
  8. Run benchmark.sh
    Usage:
    ./benchmark.sh $SCALE
    1. This is responsible for running the TPC-DS queries and measuring query execution time
    2. Generates a CSV file in the results folder containing each query's start_time and end_time
    3. Saves query statistics in the same directory
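
The last three steps can be chained from a small wrapper. The following is a minimal sketch only and is not part of this repository: the function names valid_cpu, valid_scale and run_pipeline are hypothetical, and it assumes $SCALE is a positive integer. It checks the arguments (including the "$CPU must be > 1" rule) before calling the scripts in order, and stops at the first script that fails.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for illustration; they are not shipped with the repository.

# Succeeds only if the argument is an integer greater than 1,
# mirroring the "$CPU must be > 1" requirement of data_gen.sh.
valid_cpu() {
  case "$1" in
    ''|*[!0-9]*) return 1 ;;   # empty or contains a non-digit
  esac
  [ "$1" -gt 1 ]
}

# Succeeds only if the argument is a positive integer.
# Assumption: the scale is passed as a whole number.
valid_scale() {
  case "$1" in
    ''|*[!0-9]*) return 1 ;;
  esac
  [ "$1" -gt 0 ]
}

# Runs data generation, loading and benchmarking in order.
# Returns 2 on invalid arguments; otherwise returns the status of
# the first failing script, or 0 if all three succeed.
run_pipeline() {
  local cpu="$1" scale="$2"
  valid_cpu "$cpu" || { echo "CPU must be an integer > 1" >&2; return 2; }
  valid_scale "$scale" || { echo "SCALE must be a positive integer" >&2; return 2; }
  ./data_gen.sh "$cpu" "$scale" &&
    ./load_data.sh "$scale" &&
    ./benchmark.sh "$scale"
}
```

From the repository root, after bootstrap.sh has completed, it could be called as run_pipeline "$(nproc)" 100, where nproc reports the number of CPUs on the VM and 100 is only an example scale.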
