Comments (6)
@Dandandan I am working on a fix for this issue. I will create a PR this week for your approval.
Currently, Join Predicates which are part of OR Expressions are not handling properly in planner.rs. I handle this scenario and ensure that Join Predicates are handled properly. With this fix, Q19 would run faster without any change in Query. Thanks
from datafusion.
@alamb I am wondering if you think it would be the solution to introduce this boolean optimization (Distributivity of and over or):
(p and q1) OR (p and q2) should be replaced by p AND (q1 or q2)
p in this example could be our join predicate.
After this optimization, the physical planner should be able to see the predicate in the filter.
from datafusion.
@Dandandan -- yes, I think the "classic" thing to do is a "predicate rewrite" pass that rearranges predicates for further downstream operations
The goal is basically to get the predicate into a form of
good_predicate1
AND good_predicate2
AND ...
Where good_predicate
means the predicate has special support in the execution engine.
Since OR is not typically handled specially, rewrites to AND are helpful
Rewrite 1: (p and q1) OR (p and q2) OR (p and ..) ==> p AND (q1 or q2)
Another common rewrite I have seen is
(col1 = A) OR (col1 = B) OR (col1 = C) ==> col1 IN (A, B, C)
Which then the execution engine can treat like a single column predicate (push down to scan) and build a hash table for (A, B, C)
and do fast filtering.
Shall I file an issue to track this kind of rewrtite?
from datafusion.
Cool, thanks @alamb !
I guess there are also some more opportunities for rewriting in the above example:
l_shipmode in (‘AIR’, ‘AIR REG’)
and and l_shipinstruct = ‘DELIVER IN PERSON
are repeated and could benefit from being simplified and then pushed down.
p_container in ( ‘SM CASE’, ‘SM BOX’, ‘SM PACK’, ‘SM PKG’)
, p_container in (‘MED BAG’, ‘MED BOX’, ‘MED PKG’, ‘MED PACK’)
,p_container in ( ‘LG CASE’, ‘LG BOX’, ‘LG PACK’, ‘LG PKG’)
could be adding a filter with all the items.
p_size between 1 and 15
, p_size between 1 and 10
, p_size between 1 and 5
could generate a filter with the least restrictive filter p_size between 1 and 15
Shall I file an issue to track this kind of rewrtite?
Would be nice to have some tracking issue 👍
from datafusion.
Filed #217 to track the idea
from datafusion.
Thanks @DhamoPS -- note that @xudong963 implemented #2858 which may have helped
from datafusion.
Related Issues (20)
- `OuterColumnReference` Expr --> String Support HOT 1
- Add an example of how to use the SQL parser/unparser API HOT 2
- Support Substrait VirtualTables
- Discussion: make it easier for specify SQL --> function translation HOT 3
- Create a DataFusion blog HOT 2
- `analysis.rs` bounds check panic HOT 2
- AggregateUDF expression API design HOT 1
- Example for building an external index for parquet files
- `array_slice` can't correctly handle NULL parameters or some edge cases
- Add an example of how to convert LogicalPlan to/from SQL Strings
- Cast from string to date with "/" HOT 2
- error: this arithmetic operation will overflow (on i386)
- REmove workaround for `COUNT(*)` in subquery decorrelation code
- Make SQL strings generated from `Expr`s "prettier" HOT 5
- Dynamic schema for custom TableProvider
- ScalarValue serialization does not handle nested dictionary values
- Using `Expr::field` panics HOT 3
- Improve signature of `get_field` is function
- UserDefinedLogicalNode::from_template does not return a Result<...> >
- UserDefindedLogicalNode::from_template does not return a Result<...>. HOT 3
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from datafusion.