datamanagementlab / deepdb-public

Implementation of DeepDB: Learn from Data, not from Queries!

License: MIT License

Python 94.52% C++ 1.69% TSQL 3.80%

machine-learning learned-database sum-product-networks cardinality-estimation approximate-query-processing learned-database-components

deepdb-public's Issues

Non tree graphs

Hi there,

Thank you for the code. Can it be used with non-tree schemas such as JOB (full IMDB)?

Thanks,

Cardinality Estimation Anomalies for ORDERKEY and PARTKEY in TPC-H Dataset (SF=1)

I've been working with the TPC-H dataset (scale factor 1) in DeepDB and noticed an unusual pattern in cardinality estimation (CE). When querying numerical columns with limited distinct values, such as ORDERKEY and PARTKEY in the LINEITEM table (total records = 6001215), the predictions come out as either exactly one or a multiple of the inverse of the sampling rate. For example, the CE results were 1, 6, 12, 18, ... when samples_per_spn = 1000000 1000000 1000000 1000000 1000000.
This occurs even after listing these columns under the no_compression section of the schema file to avoid compression effects. I'd appreciate any guidance or recommendations to mitigate this issue.
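
The reported pattern is consistent with plain sample scale-up arithmetic: with 1,000,000 sampled rows from a 6,001,215-row table, each sampled row stands for about 6 rows, so a count derived from k matching sample rows lands near 6k. The sketch below illustrates only that arithmetic. The function name and the floor of one are assumptions made for illustration; this is not DeepDB's actual code.

```python
def scaled_estimate(matching_sample_rows, total_rows=6_001_215, sample_size=1_000_000):
    """Scale a count observed in a sample up to the full table.

    Hypothetical helper: the floor of 1 is an assumption that mirrors the
    'exactly one' estimates described in the issue.
    """
    scale = total_rows / sample_size  # inverse of the sampling rate, about 6.0
    return max(1, round(matching_sample_rows * scale))


# Estimates for 0, 1, 2 and 3 matching sample rows.
estimates = [scaled_estimate(k) for k in range(4)]
print(estimates)  # [1, 6, 12, 18]
```

If this reading is right, the granularity comes from the sample size relative to the table, which would explain why the no_compression setting alone does not change it.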

What does the 'no_compression' attribute mean?

Hi,

I want to add new datasets. What does the no_compression attribute mean, and why do we need to set all attributes to no_compression by default?

Best,
Kangfei

Problem reproducing the SSB experiment results

Hi,
Thank you for the great work! I am reproducing the results for the SSB dataset. Due to limited storage, I generated SSB (SF=10) with the unified SSB generator (https://github.com/eyalroz/ssb-dbgen). However, after I trained the model and ran the evaluation, I got the following error:

2022-12-22 05:50:28,693 [INFO ]  Evaluating AQP query 4: select d_year, p_brand1, sum(lo_revenue) from lineorder, dwdate, part, supplier where lo_orderdate = d_datekey and lo_partkey = p_partkey and lo_suppkey = s_suppkey and p_brand1 in ('MFGR#2221','MFGR#2222','MFGR#2223','MFGR#2224','MFGR#2225','MFGR#2226','MFGR#2227','MFGR#2228') and s_region = 'ASIA' group by d_year, p_brand1 order by d_year, p_brand1;
Traceback (most recent call last):
  File "/Users/jw/Desktop/deepdb-public/maqp.py", line 234, in <module>
    evaluate_aqp_queries(args.ensemble_location, args.query_file_location, args.target_path, schema,
  File "/Users/jw/Desktop/deepdb-public/evaluation/aqp_evaluation.py", line 127, in evaluate_aqp_queries
    confidence_intervals, aqp_result = spn_ensemble.evaluate_query(query, rdc_spn_selection=rdc_spn_selection,
  File "/Users/jw/Desktop/deepdb-public/ensemble_compilation/spn_ensemble.py", line 746, in evaluate_query
    group_bys_scopes, result_tuples, result_tuples_translated = self._evaluate_group_by_spn_ensembles(query)
  File "/Users/jw/Desktop/deepdb-public/ensemble_compilation/spn_ensemble.py", line 677, in _evaluate_group_by_spn_ensembles
    group_bys_scope, temporary_results, temporary_results_translated = spn.evaluate_group_by_combinations(
  File "/Users/jw/Desktop/deepdb-public/aqp_spn/aqp_spn.py", line 251, in evaluate_group_by_combinations
    range_conditions = self._parse_conditions(range_conditions)
  File "/Users/jw/Desktop/deepdb-public/aqp_spn/aqp_spn.py", line 409, in _parse_conditions
    assert len(matching_cols) == 1 or len(matching_fd_cols) == 1, "Found multiple or no matching columns"
AssertionError: Found multiple or no matching columns

For query 4 (Q4), I printed both lists and got:

matching_fd_cols: []
matching_cols: []

The first 3 queries ran smoothly, but I am stuck on this one because of this error. How can I deal with it? Thank you for your help!
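
The assertion in the traceback fails when neither matching_cols nor matching_fd_cols has exactly one entry, and the printed values show both lists empty for the failing condition. One way to narrow this down is to compare the columns referenced in the query with the columns the model was learned on. The sketch below is a standalone diagnostic, not DeepDB's code: the helper name, the `table.column` naming, and the example column list are all assumptions.

```python
def unmatched_condition_columns(model_columns, condition_columns):
    """Return condition columns that match zero or several model columns.

    Hypothetical helper: matching is case-insensitive on full
    'table.column' names, which is an assumption about the naming scheme.
    """
    normalized = [c.strip().lower() for c in model_columns]
    problems = {}
    for cond in condition_columns:
        hits = normalized.count(cond.strip().lower())
        if hits != 1:
            problems[cond] = hits  # 0 = missing, >1 = ambiguous
    return problems


# Assumed example: the learned model has no column for part.p_brand1.
model_columns = ["lineorder.lo_revenue", "dwdate.d_year", "supplier.s_region"]
condition_columns = ["part.p_brand1", "supplier.s_region"]
problems = unmatched_condition_columns(model_columns, condition_columns)
print(problems)
```

A column reported with zero hits would point to a mismatch between the generated schema (for example, table or column names produced by a different SSB generator) and the names used in the query file.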

Learning a Partitioning Advisor for Cloud Databases

Hello Professor, I am very interested in database partitioning, especially in combination with reinforcement learning. Would it be possible to share the code for this paper?

TPC-DS single-table cardinality estimation

I wonder if it is possible to do single-table (single-SPN) cardinality estimation without modifying the code in the SPNensemble class. I tried with a single-table toy dataset and ran into a lot of errors. I assume you tested the code on a single table before making it work on multiple tables. Could you let me know how to run estimation on a single table?
Thanks a lot.
