Git Product home page Git Product logo

Comments (6)

DhamoPS avatar DhamoPS commented on May 24, 2024 2

@Dandandan I am working on a fix for this issue. I will create a PR this week for your approval.
Currently, Join Predicates which are part of OR Expressions are not handling properly in planner.rs. I handle this scenario and ensure that Join Predicates are handled properly. With this fix, Q19 would run faster without any change in Query. Thanks

from datafusion.

Dandandan avatar Dandandan commented on May 24, 2024

@alamb I am wondering if you think it would be the solution to introduce this boolean optimization (Distributivity of and over or):
(p and q1) OR (p and q2) should be replaced by p AND (q1 or q2)
p in this example could be our join predicate.

After this optimization, the physical planner should be able to see the predicate in the filter.

from datafusion.

alamb avatar alamb commented on May 24, 2024

@Dandandan -- yes, I think the "classic" thing to do is a "predicate rewrite" pass that rearranges predicates for further downstream operations

The goal is basically to get the predicate into a form of

good_predicate1 AND good_predicate2 AND ...

Where good_predicate means the predicate has special support in the execution engine.

Since OR is not typically handled specially, rewrites to AND are helpful

Rewrite 1: (p and q1) OR (p and q2) OR (p and ..) ==> p AND (q1 or q2)

Another common rewrite I have seen is
(col1 = A) OR (col1 = B) OR (col1 = C) ==> col1 IN (A, B, C)

Which then the execution engine can treat like a single column predicate (push down to scan) and build a hash table for (A, B, C) and do fast filtering.

Shall I file an issue to track this kind of rewrtite?

from datafusion.

Dandandan avatar Dandandan commented on May 24, 2024

Cool, thanks @alamb !

I guess there are also some more opportunities for rewriting in the above example:

l_shipmode in (‘AIR’, ‘AIR REG’) and and l_shipinstruct = ‘DELIVER IN PERSON are repeated and could benefit from being simplified and then pushed down.

p_container in ( ‘SM CASE’, ‘SM BOX’, ‘SM PACK’, ‘SM PKG’), p_container in (‘MED BAG’, ‘MED BOX’, ‘MED PKG’, ‘MED PACK’) ,p_container in ( ‘LG CASE’, ‘LG BOX’, ‘LG PACK’, ‘LG PKG’)

could be adding a filter with all the items.

p_size between 1 and 15 , p_size between 1 and 10, p_size between 1 and 5 could generate a filter with the least restrictive filter p_size between 1 and 15

Shall I file an issue to track this kind of rewrtite?
Would be nice to have some tracking issue 👍

from datafusion.

alamb avatar alamb commented on May 24, 2024

Filed #217 to track the idea

from datafusion.

alamb avatar alamb commented on May 24, 2024

Thanks @DhamoPS -- note that @xudong963 implemented #2858 which may have helped

from datafusion.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.