datamanagementlab / deepdb-public
Implementation of DeepDB: Learn from Data, not from Queries!
License: MIT License
Hi there,
Thank you for the code. I am wondering whether it can be used with non-tree schemas such as JOB (full IMDB)?
Thanks,
I've been working with the TPC-H dataset (scale factor 1) in DeepDB and noticed an unusual pattern in cardinality estimation (CE). Specifically, when querying numerical columns with limited distinct values, such as ORDERKEY and PARTKEY in the LINEITEM table (total records = 6001215), the system's predictions come out either as exactly one or as multiples of the inverse of the sampling rate (e.g. the CE results were 1, 6, 12, 18, ... when samples_per_spn = 1000000 1000000 1000000 1000000 1000000).
This occurs even after listing these columns under the no_compression section of the schema file to avoid compression effects. I'd appreciate any guidance or recommendations to mitigate this issue.
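The reported values are consistent with plain sample scaling, which can be checked with a few lines of arithmetic. The following is a minimal sketch, not DeepDB's code: it assumes an estimate is formed by counting matching rows in a uniform sample and multiplying by total_rows / sample_size, and that results are clamped to a minimum of 1. The helper name `scaled_estimate` and the clamping are assumptions made for illustration.

```python
TOTAL_ROWS = 6001215   # LINEITEM row count reported above
SAMPLE_SIZE = 1000000  # samples_per_spn value reported above


def scaled_estimate(matching_sample_rows: int,
                    total_rows: int = TOTAL_ROWS,
                    sample_size: int = SAMPLE_SIZE) -> int:
    """Scale a match count seen in a sample up to the full table.

    Hypothetical helper: the clamp to a minimum of 1 is an assumption,
    not something confirmed for DeepDB.
    """
    scale = total_rows / sample_size  # inverse sampling rate, about 6.0
    return max(1, round(matching_sample_rows * scale))


# Estimates for 0, 1, 2 and 3 matching sample rows.
estimates = [scaled_estimate(k) for k in range(4)]
print(estimates)  # [1, 6, 12, 18]
```

Under these assumptions each matching sample row stands for roughly six table rows, so the output can only move in steps of about six, regardless of how the column is stored.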
Hi,
I want to add new datasets. What does the no_compression attribute mean? Why do we need to set all attributes to no_compression by default?
Best,
Kangfei
Hi,
Thank you for the great work! I am reproducing the results for the SSB dataset. Due to limited storage, I generated SSB (SF=10) with the unified SSB generator (https://github.com/eyalroz/ssb-dbgen). However, after I trained the model and ran the evaluation, I got the following error:
2022-12-22 05:50:28,693 [INFO ] Evaluating AQP query 4: select d_year, p_brand1, sum(lo_revenue) from lineorder, dwdate, part, supplier where lo_orderdate = d_datekey and lo_partkey = p_partkey and lo_suppkey = s_suppkey and p_brand1 in ('MFGR#2221','MFGR#2222','MFGR#2223','MFGR#2224','MFGR#2225','MFGR#2226','MFGR#2227','MFGR#2228') and s_region = 'ASIA' group by d_year, p_brand1 order by d_year, p_brand1;
Traceback (most recent call last):
  File "/Users/jw/Desktop/deepdb-public/maqp.py", line 234, in <module>
    evaluate_aqp_queries(args.ensemble_location, args.query_file_location, args.target_path, schema,
  File "/Users/jw/Desktop/deepdb-public/evaluation/aqp_evaluation.py", line 127, in evaluate_aqp_queries
    confidence_intervals, aqp_result = spn_ensemble.evaluate_query(query, rdc_spn_selection=rdc_spn_selection,
  File "/Users/jw/Desktop/deepdb-public/ensemble_compilation/spn_ensemble.py", line 746, in evaluate_query
    group_bys_scopes, result_tuples, result_tuples_translated = self._evaluate_group_by_spn_ensembles(query)
  File "/Users/jw/Desktop/deepdb-public/ensemble_compilation/spn_ensemble.py", line 677, in _evaluate_group_by_spn_ensembles
    group_bys_scope, temporary_results, temporary_results_translated = spn.evaluate_group_by_combinations(
  File "/Users/jw/Desktop/deepdb-public/aqp_spn/aqp_spn.py", line 251, in evaluate_group_by_combinations
    range_conditions = self._parse_conditions(range_conditions)
  File "/Users/jw/Desktop/deepdb-public/aqp_spn/aqp_spn.py", line 409, in _parse_conditions
    assert len(matching_cols) == 1 or len(matching_fd_cols) == 1, "Found multiple or no matching columns"
AssertionError: Found multiple or no matching columns
For query 4 (Q4), I printed both lists and saw:
matching_fd_cols: []
matching_cols: []
The first 3 queries ran smoothly, but I am stuck on this one because of this error. How can I deal with it? Thank you for your help!
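The assertion in the traceback passes only when exactly one column matches in at least one of the two lists, so two empty lists always fail it. A minimal sketch of a check that could be run on the query attributes before evaluation follows. It assumes, without confirmation from the source, that a condition is matched by column name against a list of `table.column` strings. The helper `find_matches` and the example column list are hypothetical and are not taken from DeepDB.

```python
from __future__ import annotations


def find_matches(attribute: str, column_names: list[str]) -> list[str]:
    """Return the known columns whose name equals `attribute`.

    Hypothetical matching rule: compare against the part after the last
    dot, so 'part.p_brand1' would match 'p_brand1'.
    """
    return [c for c in column_names if c.rsplit(".", 1)[-1] == attribute]


def passes_assertion(matching_cols: list[str],
                     matching_fd_cols: list[str]) -> bool:
    # Same condition as the assert shown in the traceback.
    return len(matching_cols) == 1 or len(matching_fd_cols) == 1


# Made-up column list for illustration only; it deliberately lacks p_brand1.
columns = ["lineorder.lo_revenue", "dwdate.d_year", "supplier.s_region"]

ok = find_matches("s_region", columns)
missing = find_matches("p_brand1", columns)
print(ok, passes_assertion(ok, []))            # one match, check passes
print(missing, passes_assertion(missing, []))  # no match, check fails
```

Running a check of this kind for each attribute in the WHERE and GROUP BY clauses would show which attribute has no counterpart among the columns the model was trained on.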
Hello Professor, I am very interested in database partitioning, especially in combination with reinforcement learning. Would it be possible for you to share the code for this paper?
I wonder whether it is possible to do single-table (single-SPN) cardinality estimation without modifying the code in the SPNensemble class. I tried a single-table toy dataset and ran into a lot of errors. I assume you tested the code on a single table before making it work on multiple tables. Could you let me know how to run estimation on a single table?
Thanks a lot.