
tpch's People

Contributors

arturgajowy, cberner, dain, electrum, findepi, hashhar, losipiuk, martint, sopel39, vlsi, wendigo, winio94

tpch's Issues

How to use airlift-tpch

Hello developers,
I was searching for a Java utility to generate TPC-H data and found your code. Can you tell me how to use it as an API? I looked for a README file but didn't find one.

Thanks and Regards

Prashant

toString() method on all TpchEntity

Hi,

First of all: great project. I'm currently looking into using the code to create a distributed TPC-H data generator.
I'm pretty confident that your code is suitable for that. Most likely I'll contribute some stuff back, if that's okay with you.

One question (that could lead to a first contribution from my side): Is there a particular reason why the classes implementing TpchEntity don't also use the toLine() method for toString()?

It's very handy when using a debugger and for "system out" debugging.
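To make the proposal concrete, here is a minimal sketch of what delegating toString() to toLine() could look like. The Entity interface and RegionRow class are hypothetical stand-ins written for this example (only the toLine() method name comes from the project), and the pipe-delimited format is an assumption.

```java
public class ToStringSketch {
    // Hypothetical stand-in for TpchEntity; only toLine() is taken from the issue text.
    interface Entity {
        String toLine();
    }

    // Hypothetical entity for illustration; the fields and the line format are assumptions.
    static final class RegionRow implements Entity {
        private final long regionKey;
        private final String name;

        RegionRow(long regionKey, String name) {
            this.regionKey = regionKey;
            this.name = name;
        }

        @Override
        public String toLine() {
            // Pipe-delimited rendering, assumed here for the sake of the example.
            return regionKey + "|" + name + "|";
        }

        @Override
        public String toString() {
            // Delegate so debuggers and System.out show the same text as toLine().
            return toLine();
        }
    }

    public static void main(String[] args) {
        System.out.println(new RegionRow(0, "AFRICA"));
    }
}
```

Note that an interface default method cannot override toString(), so each entity class (or a shared abstract base class) would need its own override.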

Line Item count is off by a little bit

Running this code:

    // Requires com.google.common.collect.Iterables (Guava) and io.airlift.tpch.TpchTable.
    public void generate() throws Exception {
        double scale = 10;

        // Count the rows each generator produces (part 1 of 1, i.e. the whole table).
        System.out.println("Line Item:     " + Iterables.size(TpchTable.LINE_ITEM.createGenerator(scale, 1, 1)));
        System.out.println("Nation:        " + Iterables.size(TpchTable.NATION.createGenerator(scale, 1, 1)));
        System.out.println("Region:        " + Iterables.size(TpchTable.REGION.createGenerator(scale, 1, 1)));
        System.out.println("Part:          " + Iterables.size(TpchTable.PART.createGenerator(scale, 1, 1)));
        System.out.println("Customer:      " + Iterables.size(TpchTable.CUSTOMER.createGenerator(scale, 1, 1)));
        System.out.println("Orders:        " + Iterables.size(TpchTable.ORDERS.createGenerator(scale, 1, 1)));
        System.out.println("Part Supplier: " + Iterables.size(TpchTable.PART_SUPPLIER.createGenerator(scale, 1, 1)));
        System.out.println("Supplier:      " + Iterables.size(TpchTable.SUPPLIER.createGenerator(scale, 1, 1)));
    }

With scale == 10, this yields:

Line Item:     59986052
Nation:        25
Region:        5
Part:          2000000
Customer:      1500000
Orders:        15000000
Part Supplier: 8000000
Supplier:      100000

scale == 1

Line Item:     6001215
Nation:        25
Region:        5
Part:          200000
Customer:      150000
Orders:        1500000
Part Supplier: 800000
Supplier:      10000

scale == 0.1

Line Item:     600572
Nation:        25
Region:        5
Part:          20000
Customer:      15000
Orders:        150000
Part Supplier: 80000
Supplier:      1000

While all the other counts match the spec perfectly, the Line Item count is off by a little bit (and for scale == 10 there are actually fewer lines than the spec requires).
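To quantify "a little bit", the following sketch computes how far each reported Line Item count is from a nominal cardinality. It assumes the nominal figure is 6,000,000 rows per unit of scale factor; that baseline is an assumption made for this example, not something stated in the report above.

```java
public class LineItemDelta {
    // Assumption: nominal LINEITEM cardinality of 6,000,000 rows per unit of scale factor.
    static final long NOMINAL_ROWS_PER_SCALE = 6_000_000L;

    // Difference between an observed row count and the nominal count for the given scale.
    static long delta(double scale, long observed) {
        long nominal = Math.round(scale * NOMINAL_ROWS_PER_SCALE);
        return observed - nominal;
    }

    public static void main(String[] args) {
        // Scale factors and Line Item counts copied from the output above.
        double[] scales = {10, 1, 0.1};
        long[] observed = {59_986_052L, 6_001_215L, 600_572L};
        for (int i = 0; i < scales.length; i++) {
            System.out.printf("scale=%s observed=%d delta=%+d%n",
                    scales[i], observed[i], delta(scales[i], observed[i]));
        }
    }
}
```

Under that assumption the deltas are -13948 for scale 10, +1215 for scale 1, and +572 for scale 0.1, so the deviation stays well below 1% of the table in every case.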

Column names do not obey the TPC-H specification

The TPC-H specification uses a prefix for each column name. For example, the NATION columns are named:

NATION Table Layout
Column Name    Datatype Requirements      Comment
N_NATIONKEY    identifier                 25 nations are populated
N_NAME         fixed text, size 25
N_REGIONKEY    identifier                 Foreign Key to R_REGIONKEY
N_COMMENT      variable text, size 152
Primary Key: N_NATIONKEY

airlift tpch defines nation columns as:

    NATION_KEY("nationkey", TpchColumnTypes.IDENTIFIER) {
        public long getIdentifier(Nation nation) {
            return nation.getNationKey();
        }
    },
    NAME("name", TpchColumnTypes.varchar(25L)) {
        public String getString(Nation nation) {
            return nation.getName();
        }
    },
    REGION_KEY("regionkey", TpchColumnTypes.IDENTIFIER) {
        public long getIdentifier(Nation nation) {
            return nation.getRegionKey();
        }
    },
    COMMENT("comment", TpchColumnTypes.varchar(152L)) {
        public String getString(Nation nation) {
            return nation.getComment();
        }
    };

Note the missing n_ prefix in the column names.

As a result, generated TPC-H queries cannot simply be executed in Presto; all the column names have to be modified first.
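As a workaround sketch, the prefixed names can be derived from the simplified ones by prepending the per-table prefix. The class and method names below are hypothetical and not part of airlift-tpch; the table keys are also an assumption chosen for this example. The prefixes themselves follow the table layouts in the specification.

```java
import java.util.Map;

public class SpecColumnNames {
    // Per-table column prefixes as used in the TPC-H table layouts.
    // The lowercase table keys are an assumption made for this sketch.
    static final Map<String, String> PREFIXES = Map.of(
            "nation", "n_",
            "region", "r_",
            "part", "p_",
            "supplier", "s_",
            "partsupp", "ps_",
            "customer", "c_",
            "orders", "o_",
            "lineitem", "l_");

    // Turns a simplified name such as "nationkey" into the prefixed form "n_nationkey".
    static String specName(String table, String simplifiedColumn) {
        String prefix = PREFIXES.get(table);
        if (prefix == null) {
            throw new IllegalArgumentException("unknown table: " + table);
        }
        return prefix + simplifiedColumn;
    }

    public static void main(String[] args) {
        for (String column : new String[] {"nationkey", "name", "regionkey", "comment"}) {
            System.out.println(specName("nation", column));
        }
    }
}
```

A mapping like this could be used to alias columns in a view, so that unmodified TPC-H queries resolve against the prefixed names.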

Specification file: http://cs.fit.edu/~pbernhar/teaching/databases/tpch.pdf

Is it possible for this project to support tpc-bih?

As the title says: is it possible for this project to support TPC-BiH, as described in "TPC-BiH: A Benchmark for Bitemporal Databases"? The BiH benchmark adds columns related to the temporal features of SQL:2011 to the original TPC-H tables.

TPC-H standard column names

Was wondering why the decision was made to not use the column names that match the TPC-H specification? This means that all queries that match the TPC-H standard (and run everywhere else) need to be rewritten to work with this (which is a bummer). Thanks.

Expose TPC-H table statistics

For a clean implementation of the statistics interface in Presto, it would be nice if airlift-tpch clearly exposed table statistics:

  • row count
  • data size for individual columns

E.g. via methods in io.airlift.tpch.TpchTable.
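A rough sketch of the two statistics requested above, computed by a single pass over already-rendered rows. Everything here is hypothetical: the Stats holder, the collect method, and the choice to measure data size as UTF-8 bytes of the string form are assumptions for illustration, not an existing airlift-tpch API.

```java
import java.nio.charset.StandardCharsets;
import java.util.List;

public class TableStatsSketch {
    // Hypothetical holder for the two requested statistics.
    static final class Stats {
        final long rowCount;
        final long[] columnBytes;

        Stats(long rowCount, long[] columnBytes) {
            this.rowCount = rowCount;
            this.columnBytes = columnBytes;
        }
    }

    // Counts rows and sums the UTF-8 size of every value, per column.
    // Assumes each row has at least columnCount values.
    static Stats collect(Iterable<String[]> rows, int columnCount) {
        long rowCount = 0;
        long[] columnBytes = new long[columnCount];
        for (String[] row : rows) {
            rowCount++;
            for (int i = 0; i < columnCount; i++) {
                columnBytes[i] += row[i].getBytes(StandardCharsets.UTF_8).length;
            }
        }
        return new Stats(rowCount, columnBytes);
    }

    public static void main(String[] args) {
        List<String[]> rows = List.of(
                new String[] {"0", "AFRICA"},
                new String[] {"1", "AMERICA"});
        Stats stats = collect(rows, 2);
        System.out.println(stats.rowCount + " rows, column sizes: "
                + stats.columnBytes[0] + ", " + stats.columnBytes[1]);
    }
}
```

Since the generated data is deterministic for a given scale factor, such values could in principle be precomputed once per table and scale rather than recomputed on every call.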
