deepdb-public's People

Contributors

bhilprecht

deepdb-public's Issues

Non tree graphs

Hi there,

Thank you for the code. I am wondering whether this can be used with non-tree schemas such as JOB (full IMDB)?

Thanks,

Cardinality Estimation Anomalies for ORDERKEY and PARTKEY in TPC-H Dataset (SF=1)

I've been working with the TPC-H dataset (Scale Factor 1) in DeepDB and noticed an unusual pattern in cardinality estimation (CE). Specifically, when querying numerical columns with limited distinct values such as ORDERKEY and PARTKEY in the LINEITEM table (total records = 6001215), the system's predictions come out either as exactly one or as multiples of the inverse of the sampling rate (e.g. CE results were 1, 6, 12, 18, ... when samples_per_spn = 1000000 1000000 1000000 1000000 1000000).
This occurs even after listing these columns under the no_compression section of the schema file to avoid compression effects. I'd appreciate any guidance or recommendations to mitigate this issue.
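
The reported values are consistent with plain sample scaling: with 6001215 rows and 1000000 samples, each sampled row stands for roughly 6 rows, so an estimate built from a count of matching sample rows lands on a multiple of 6. The sketch below shows only that arithmetic and is not taken from DeepDB's code; the helper name `scaled_estimate`, the rounding, and the floor at one are assumptions made for illustration.

```python
TOTAL_ROWS = 6001215   # LINEITEM rows at SF=1, as reported above
SAMPLE_SIZE = 1000000  # samples_per_spn value from the report


def scaled_estimate(matching_sample_rows, total_rows=TOTAL_ROWS,
                    sample_size=SAMPLE_SIZE):
    """Scale a count of matching sample rows up to the full table.

    Hypothetical helper: assumes the estimate is count * (total / sample),
    rounded to an integer and never reported below 1.
    """
    scale = total_rows / sample_size  # inverse of the sampling rate
    return max(1, round(matching_sample_rows * scale))


# Estimates for 0, 1, 2 and 3 matching rows in the sample.
estimates = [scaled_estimate(k) for k in range(4)]
print(estimates)  # [1, 6, 12, 18]
```

Under these assumptions, a finer-grained estimate for a high-cardinality key would need either a larger sample or a model that does not reduce to counting sampled rows.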

Problem during reproducing SSB experiment's results

Hi,
Thank you for the great work! I am reproducing the results for the SSB dataset. Due to limited storage, I generated SSB (SF=10) with the unified SSB generator (https://github.com/eyalroz/ssb-dbgen). However, after I trained the model and ran the evaluation, I got the following error:

2022-12-22 05:50:28,693 [INFO ]  Evaluating AQP query 4: select d_year, p_brand1, sum(lo_revenue) from lineorder, dwdate, part, supplier where lo_orderdate = d_datekey and lo_partkey = p_partkey and lo_suppkey = s_suppkey and p_brand1 in ('MFGR#2221','MFGR#2222','MFGR#2223','MFGR#2224','MFGR#2225','MFGR#2226','MFGR#2227','MFGR#2228') and s_region = 'ASIA' group by d_year, p_brand1 order by d_year, p_brand1;
Traceback (most recent call last):
  File "/Users/jw/Desktop/deepdb-public/maqp.py", line 234, in <module>
    evaluate_aqp_queries(args.ensemble_location, args.query_file_location, args.target_path, schema,
  File "/Users/jw/Desktop/deepdb-public/evaluation/aqp_evaluation.py", line 127, in evaluate_aqp_queries
    confidence_intervals, aqp_result = spn_ensemble.evaluate_query(query, rdc_spn_selection=rdc_spn_selection,
  File "/Users/jw/Desktop/deepdb-public/ensemble_compilation/spn_ensemble.py", line 746, in evaluate_query
    group_bys_scopes, result_tuples, result_tuples_translated = self._evaluate_group_by_spn_ensembles(query)
  File "/Users/jw/Desktop/deepdb-public/ensemble_compilation/spn_ensemble.py", line 677, in _evaluate_group_by_spn_ensembles
    group_bys_scope, temporary_results, temporary_results_translated = spn.evaluate_group_by_combinations(
  File "/Users/jw/Desktop/deepdb-public/aqp_spn/aqp_spn.py", line 251, in evaluate_group_by_combinations
    range_conditions = self._parse_conditions(range_conditions)
  File "/Users/jw/Desktop/deepdb-public/aqp_spn/aqp_spn.py", line 409, in _parse_conditions
    assert len(matching_cols) == 1 or len(matching_fd_cols) == 1, "Found multiple or no matching columns"
AssertionError: Found multiple or no matching columns

For query 4 (Q4), I printed both lists and got:

matching_fd_cols: []
matching_cols: []

The first 3 queries ran smoothly, but I am stuck on this one because of this error. How can I deal with it? Thank you for your help!
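
The assertion in the traceback passes only when exactly one regular column or exactly one functional-dependency column matches the condition, so two empty lists trigger it. The sketch below reproduces that condition in isolation as a possible way to check which column fails to match. It is not DeepDB's code: the helper name `check_condition_column`, the exact-name comparison, and the example column list are all assumptions.

```python
def check_condition_column(condition_column, spn_columns, fd_columns):
    """Return the columns matching a condition, or fail like the traceback.

    Hypothetical helper: exact name comparison is an assumption made for
    this sketch, not DeepDB's actual matching logic.
    """
    matching_cols = [c for c in spn_columns if c == condition_column]
    matching_fd_cols = [c for c in fd_columns if c == condition_column]
    # Same condition and message as the assertion shown in the traceback.
    assert len(matching_cols) == 1 or len(matching_fd_cols) == 1, \
        "Found multiple or no matching columns"
    return matching_cols, matching_fd_cols


# Illustrative column list that lacks the column used in the condition.
spn_columns = ["part.p_partkey", "part.p_category"]

try:
    check_condition_column("part.p_brand1", spn_columns, [])
    outcome = "matched"
except AssertionError as err:
    outcome = str(err)

print(outcome)
```

If the column names in the generated SSB files or the schema definition differ from those used in the query file, a check of this kind would come back empty for both lists, which matches the printed output above.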

tpcds-single table cardinality estimation

I wonder if it is possible to do single-table (single SPN) cardinality estimation without modifying the code in the SPNensemble class. I tried with a single-table toy dataset and it ran into a lot of errors. I assume you tested the code on a single table first before making it work on multiple tables. Could you let me know how to run estimation on a single table?
Thanks a lot.
