

TPC-H benchmark tools

This repository facilitates the use of the TPC-H benchmark (or, more precisely, the TPC-H benchmark data and individual queries) for DBMS-related work in and around the MonetDB in-memory DBMS.

Specifically, the repository comprises:

  • A script that automates building and running the data generator, creating a MonetDB database with the appropriate schema and constraints, and loading the generated data into that database.
  • A set of query files, one for each of the benchmark's 22 queries, with typical choices for the non-fixed values (e.g. green for [COLOR] in Q9).
  • A set of expected result files, one for each query.
  • Miscellaneous additional scripts and SQL queries that may be useful.
  • dbgen, the TPC's data generation utility for TPC-H. It is included in this repository as a submodule, linked to eyalroz/tpch-dbgen, where it is maintained separately (with several fixes relative to the TPC's own version). Note: you have to perform a recursive clone (git clone --recursive git@github.com:eyalroz/tpch-tools.git) to also get the contents of this submodule.

The repository does not contain:

  • The specifications of the TPC-H benchmark. These are available from the TPC website, where you can also directly download the benchmark specification PDF.
  • Source code or compiled binaries for actually executing the TPC-H benchmark, i.e. randomly sending variants of the 22 queries to a server. The qgen utility, in the dbgen/ subdirectory, can generate these random queries, but it won't open a connection and throw them at your DBMS using the appropriate line protocol.

Currently, only MonetDB is supported as the DBMS into which the data is loaded, though this may expand in the future.

Requirements

  • The Bourne Again Shell - bash
  • Various typical Unix-ish command-line tools: unzip, sed, echo and so on.
  • MonetDB installed and running
  • Enough disk space for the data you want (for SF 1 you're likely to need about 2 GB for the generated pre-loading data and the loaded data together; this drops to about 1 GB once you delete the pre-loading data)
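If you want to verify the tool requirements before running anything, a small preflight check like the following may help. It is a hypothetical sketch, not part of the repository: unzip and sed come from the list above, while monetdbd and mclient are assumed to be the MonetDB daemon and client binaries you expect on your PATH.

```shell
#!/usr/bin/env bash
# Hypothetical preflight check (not part of tpch-tools).
# Assumption: monetdbd and mclient are the MonetDB binaries you expect on PATH;
# unzip and sed are taken from the requirements list.

# Print the names of any tools not found on PATH; return non-zero if any are missing.
missing_tools() {
    local tool
    local missing=()
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || missing+=("$tool")
    done
    if (( ${#missing[@]} > 0 )); then
        echo "missing: ${missing[*]}"
        return 1
    fi
    return 0
}

missing_tools bash unzip sed monetdbd mclient || echo "Install the missing tools before continuing."
# Disk space is not checked here; `df -h .` shows what is free in the current directory.
```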

Getting started

  1. Make sure you have a MonetDB 'Database Farm' set up (see the MonetDB tutorial if you're not sure how to do that).
  2. Run ./scripts/setup-tpch-db to generate and load data for a given scale factor; it accepts the following command-line options:
Options:
-r, --recreate              If the TPC-H database exists, recreate it, dropping
                            all existing data. (If this option is unset, the
                            database must not already exist.)
-s, --scale-factor FACTOR   The amount of test data to generate, in GB
-G, --use-generated         Use previously-generated table load files (in the
                            data generation directory) instead of re-generating
                            them using the dbgen utility
-l, --log-file FILENAME     Name of the file to log output into
-d, --db-name NAME          Name of the database holding TPC-H test data
                            within the DB farm
-f, --db-farm PATH          Filesystem path for the root directory of the DB farm
                            with the generated DB
-p, --platform PLATFORM     Platform for which to try building the data
                            generation utility (one of ATT DOS HP IBM ICL MVS SGI
                            SUN U2200 VMS LINUX WIN32 MAC)
-P, --port NUMBER           Network port on the local host on which the server
                            for the DB farm listens
-D, --data-gen-dir PATH     Directory in which to generate the TPC-H table data
-k, --keep-raw-tables       Keep the raw data generated by the tool outside of
                            the DBMS
-v, --verbose               Be verbose


  3. Run scripts/run_benchmark_queries.sh -v as a sanity check, to make sure your results look like the expected answers. (You can also diff the results you get with scripts/run_benchmark_queries.sh -w against the reference results in expected_results/.)
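The getting-started steps above can be strung together roughly as follows. This is a sketch under several assumptions that are not from the README: the farm path, database name and scale factor are placeholders; the farm is set up with MonetDB's `monetdbd create` and `monetdbd start`; the scripts are run from the repository root; and scripts/run_benchmark_queries.sh may need further options if you use a non-default database name (its defaults are not verified here). By default the sketch only prints the commands; set DRY_RUN=0 to execute them.

```shell
#!/usr/bin/env bash
# Hypothetical end-to-end sketch of the getting-started steps (not part of tpch-tools).
# FARM, DB_NAME and SF are placeholder values; override them via the environment.
set -euo pipefail

FARM="${FARM:-${HOME:-/tmp}/tpch-farm}"   # hypothetical farm location
DB_NAME="${DB_NAME:-tpch}"                # hypothetical database name
SF="${SF:-1}"                             # scale factor, roughly GB of generated data

# Print the command (shell-quoted) when DRY_RUN=1, the default; execute it otherwise.
run() {
    if [[ "${DRY_RUN:-1}" == 1 ]]; then
        printf '%q ' "$@"
        echo
    else
        "$@"
    fi
}

main() {
    # Step 1: make sure a database farm exists and its daemon is running.
    # (monetdbd start may fail if the daemon is already running; skip it in that case.)
    [[ -d "$FARM" ]] || run monetdbd create "$FARM"
    run monetdbd start "$FARM"
    # Step 2: generate the data for the chosen scale factor and load it.
    run ./scripts/setup-tpch-db --scale-factor "$SF" --db-farm "$FARM" --db-name "$DB_NAME" --verbose
    # Step 3: sanity-check the query results.
    run ./scripts/run_benchmark_queries.sh -v
}

main
```

Printing by default makes it easy to review the exact setup-tpch-db invocation before it drops or creates anything in your farm.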

Questions? Requests? Feedback? Bugs?

Feel free to open an issue or write me.

Contributors

eyalroz, zingdle


tpch-tools's Issues

bugs in scripts/setup-tpch-db

  1. In setup-tpch-db, line 229:
    [[ ! -d "$data_generation_dir" ]] || die "Was requested ...."
    should be changed to
    [[ -d "$data_generation_dir" ]] || die "Was requested ...."
    The logic is: if the data directory doesn't exist, then the "die" statement should be executed.

  2. In setup-tpch-db, line 400:
    [[ -e $data_generation_dir ]] || die "..."
    When there is no file named "$data_generation_dir", the condition [[ -e $data_generation_dir ]] is false and the subsequent "die" statement is executed, which is not wanted.
    The condition should instead be [[ ! -e "$data_generation_dir" ]].
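The `[[ test ]] || die` idiom runs `die` only when the test is false, which is why both reported checks were inverted. A minimal sketch of the two corrected checks follows; `die`, the function names and the messages are hypothetical stand-ins, not the script's actual code.

```shell
#!/usr/bin/env bash
# Sketch of the `[[ test ]] || die` idiom with the two corrected checks from the issue.
# `die`, the function names and the messages are hypothetical, for illustration only.

die() {
    echo "error: $*" >&2
    exit 1
}

# Corrected check 1: `die` runs only when the directory does NOT exist.
require_existing_dir() {
    [[ -d "$1" ]] || die "$1 is not an existing directory"
}

# Corrected check 2: `die` runs only when something already exists at the path.
require_absent_path() {
    [[ ! -e "$1" ]] || die "$1 already exists"
}

# Each call runs in a subshell, so the `exit` inside `die` ends only that subshell.
tmp=$(mktemp -d)
( require_existing_dir "$tmp" )         && echo "check 1 passes: $tmp exists"
( require_existing_dir "$tmp/missing" ) || echo "check 1 dies: $tmp/missing is absent"
( require_absent_path "$tmp/missing" )  && echo "check 2 passes: nothing at $tmp/missing"
( require_absent_path "$tmp" )          || echo "check 2 dies: $tmp already exists"
rmdir "$tmp"
```

With the original tests, each check would die in exactly the cases where it should pass.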

fatal: clone of 'git@github.com:eyalroz/tpch-dbgen.git' into submodule path 'dbgen' failed

Cannot clone submodule on a machine without my SSH keys added.

$ git submodule update --init
Cloning into 'dbgen'...
Warning: Permanently added the RSA host key for IP address '13.229.188.59' to the list of known hosts.
Permission denied (publickey).
fatal: Could not read from remote repository.

Please make sure you have the correct access rights
and the repository exists.
fatal: clone of 'git@github.com:eyalroz/tpch-dbgen.git' into submodule path 'dbgen' failed
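One possible workaround, which is not from the original thread, is to have git fetch `git@github.com:` URLs over anonymous HTTPS using git's `url.<base>.insteadOf` setting. This sketch assumes the submodule repository is publicly readable at the rewritten https://github.com/eyalroz/tpch-dbgen.git address.

```shell
#!/usr/bin/env bash
# Workaround sketch (not from the issue thread): rewrite SSH GitHub URLs to HTTPS.
# Assumption: the repository is publicly readable at the rewritten HTTPS address.
set -euo pipefail

# Preview the rewrite: --get-url applies insteadOf rules and exits without contacting the remote.
git -c url."https://github.com/".insteadOf="git@github.com:" \
    ls-remote --get-url git@github.com:eyalroz/tpch-dbgen.git

# To apply it for real, make the rule persistent for your user and retry the submodule:
#   git config --global url."https://github.com/".insteadOf "git@github.com:"
#   git submodule update --init
```

Note that a global `insteadOf` rule also rewrites SSH URLs you push to, so you may want to remove it once the submodule has been fetched.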
