Git Product home page Git Product logo

Comments (6)

DhamoPS avatar DhamoPS commented on May 22, 2024 2

@Dandandan I am working on a fix for this issue. I will create a PR this week for your approval.
Currently, Join Predicates which are part of OR Expressions are not handling properly in planner.rs. I handle this scenario and ensure that Join Predicates are handled properly. With this fix, Q19 would run faster without any change in Query. Thanks

from arrow-datafusion.

Dandandan avatar Dandandan commented on May 22, 2024

@alamb I am wondering if you think it would be the solution to introduce this boolean optimization (Distributivity of and over or):
(p and q1) OR (p and q2) should be replaced by p AND (q1 or q2)
p in this example could be our join predicate.

After this optimization, the physical planner should be able to see the predicate in the filter.

from arrow-datafusion.

alamb avatar alamb commented on May 22, 2024

@Dandandan -- yes, I think the "classic" thing to do is a "predicate rewrite" pass that rearranges predicates for further downstream operations

The goal is basically to get the predicate into a form of

good_predicate1 AND good_predicate2 AND ...

Where good_predicate means the predicate has special support in the execution engine.

Since OR is not typically handled specially, rewrites to AND are helpful

Rewrite 1: (p and q1) OR (p and q2) OR (p and ..) ==> p AND (q1 or q2)

Another common rewrite I have seen is
(col1 = A) OR (col1 = B) OR (col1 = C) ==> col1 IN (A, B, C)

Which then the execution engine can treat like a single column predicate (push down to scan) and build a hash table for (A, B, C) and do fast filtering.

Shall I file an issue to track this kind of rewrtite?

from arrow-datafusion.

Dandandan avatar Dandandan commented on May 22, 2024

Cool, thanks @alamb !

I guess there are also some more opportunities for rewriting in the above example:

l_shipmode in (‘AIR’, ‘AIR REG’) and and l_shipinstruct = ‘DELIVER IN PERSON are repeated and could benefit from being simplified and then pushed down.

p_container in ( ‘SM CASE’, ‘SM BOX’, ‘SM PACK’, ‘SM PKG’), p_container in (‘MED BAG’, ‘MED BOX’, ‘MED PKG’, ‘MED PACK’) ,p_container in ( ‘LG CASE’, ‘LG BOX’, ‘LG PACK’, ‘LG PKG’)

could be adding a filter with all the items.

p_size between 1 and 15 , p_size between 1 and 10, p_size between 1 and 5 could generate a filter with the least restrictive filter p_size between 1 and 15

Shall I file an issue to track this kind of rewrtite?
Would be nice to have some tracking issue 👍

from arrow-datafusion.

alamb avatar alamb commented on May 22, 2024

Filed #217 to track the idea

from arrow-datafusion.

alamb avatar alamb commented on May 22, 2024

Thanks @DhamoPS -- note that @xudong963 implemented #2858 which may have helped

from arrow-datafusion.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.