Git Product home page Git Product logo

Comments (46)

alamb avatar alamb commented on July 1, 2024 4

I filed tickets for the other optimizer passes and linked them to the description of this ticket

from arrow-datafusion.

alamb avatar alamb commented on July 1, 2024 3

Update here is that I am hacking on getting TypeCoercion (#10039) to avoid clones. It is turning out to be tricker than others as type coercion actually updates the types of the plans, so when rewriting expressions I need to find some way to re-calculate the schemas correctly. I am continuing to hack on that as @peter-toth works on CSE (#10067) and @Lordworms is cranking on some of the other passes

from arrow-datafusion.

alamb avatar alamb commented on July 1, 2024 2

I have two idea before deep diving to the code

I think 2 sounds interesting

Another think I was thinking was something like LogicalPlan::rewrite(mut self, rewriter)

I think that along with Arc::try_unwrap could be used to minimize the places where cloning was actually needed

Maybe we can prototype with

impl OptimizerRule {
...

  /// does this rule support rewriting owned plans?
  fn supports_owned(&self) -> bool { return false }

  /// if supports_owned returns true, calls try_optimize_owned
  fn try_optimize_owned(
        &self,
        plan: LogicalPlan,
        config: &dyn OptimizerConfig
    ) -> Result<Transformed<LogicalPlan>, DataFusionError> {
      not_implemented_err!("try_optimized_owned is not implemented for this rule")
    }
...

And then play around with the code that calls try_optimize here https://github.com/apache/arrow-datafusion/blob/f4107d47bb4c0260d301294ddfc7c67d96574636/datafusion/optimizer/src/optimizer.rs#L360-L368 to try and use the try_optimize_owned API (without having to rewrite all the optimizers) for SimplifyExprs

If we could show a significant performance improvement for the sql_planner benchmarks then I think it would be worth spending time reworking the other APIs

from arrow-datafusion.

alamb avatar alamb commented on July 1, 2024 2

so I don't understand why the former one is better, I guess they have the similar performance.

I was having trouble actually implementing the required TreeNode API for &mut LogicalPlan -- I can't remember the actual compiler error. I don't think they will differ in performance

I think a LogicalPlan tree is made of Arc links between the nodes currently. So I wonder if obtaining mutable references to a LogicalPlan node's children is possible at all without cloning the children first?

Yes, I was able to do so (see the horrible code here to do it):
https://github.com/apache/arrow-datafusion/pull/9708/files#diff-f69e720234d9da6cb3a4a178d3a5575fcd34f68191335a8c2465a172e8c4e6f1R498-R538

I also verified that it was actually unwrapping Arcs most of the time (not cloning)

(I.e. can we implement this todo https://github.com/apache/arrow-datafusion/pull/9780/files#diff-9619441d9605f143a911319cea75ae5192e6c5b17acfcbc17a3c73a9e32a8e61R138 effectively?)

I tried to implement this API and it was not going well. However, implementing a TreeNodeMutator (similar to TreeNodeRewriter but that took a mut &T rather than T) seems to be more promsising.

I am traveling this week so I won't have as much time as I would like to work on this, but I hope to push some more progress to #9780 soon

from arrow-datafusion.

alamb avatar alamb commented on July 1, 2024 2

An update here is that with a few PRs in flight, some of the planning benchmarks go 50% faster (see details on #9954)

It will take a while to get these PRs in (there are a few chained up) but the "don't copy the plan so many times" approach seems to be paying off very well. Having the TreeNode API to build off is super helpful

from arrow-datafusion.

alamb avatar alamb commented on July 1, 2024 2

Wow, lots of issues to work on 😀 🚀

Indeed - thankfully they are largely mechanical like #10066

I personally think #10067 is going to be worth 20% faster planning itself (but I need to work through the details)

from arrow-datafusion.

peter-toth avatar peter-toth commented on July 1, 2024 2

@alamb, @jayzhan211 this is another low hanging fruit to refactor UnwrapCastInComparison: #10087

from arrow-datafusion.

jayzhan211 avatar jayzhan211 commented on July 1, 2024 1

I analyze the sql_planner again and find that exprlist_to_fields and calc_func_dependencies_for_project are the two most time spending func.

so I think #9595 might be an important step for the optimization here.

https://github.com/apache/arrow-datafusion/blob/ad8d552b9f150c3c066b0764e84f72b667a649ff/datafusion/expr/src/logical_plan/plan.rs#L1804-L1813

from arrow-datafusion.

alamb avatar alamb commented on July 1, 2024 1

I played around with some ideas last night and it is looking promising (though not yet done). I put my draft here #9708. I hope to try and work on it more over the next few days, but I am pretty busy preparing for meetups / presentations / papers for next week. It may be a few more days

from arrow-datafusion.

jayzhan211 avatar jayzhan211 commented on July 1, 2024 1

I would be surprised if #9708 improves, let me see if I can improve #9658 based on it.

from arrow-datafusion.

alamb avatar alamb commented on July 1, 2024 1

BTW here is a video of how I profile DataFusion: #9577 (comment)

from arrow-datafusion.

alamb avatar alamb commented on July 1, 2024 1

I have added a checklist of items / passes to optimize copies in the PR description under the "Describe the solution you'd like" heading

from arrow-datafusion.

peter-toth avatar peter-toth commented on July 1, 2024 1

@alamb, I've opened a PR to refactor EliminateOuterJoin : #10081

from arrow-datafusion.

alamb avatar alamb commented on July 1, 2024 1

Thanks @Lordworms -- that sounds great. There are now some good examples of the basic pattern (e.g #10081 and #10066)

from arrow-datafusion.

alamb avatar alamb commented on July 1, 2024 1

I also re-profiled the tpcds benchmarks via

cargo bench --bench sql_planner -- tpcds_all

And here is what it shows:
Screenshot 2024-04-23 at 3 48 46 PM

Thus I am going to take a peek to see if there is any low hanging fruit in ApplyFunctionRewrites EliminateCrossJoin, OptimzieProjections and FilterPushdown

from arrow-datafusion.

alamb avatar alamb commented on July 1, 2024 1

take eliminate_duplicated_expr and eliminate_cross_join

FYI @Lordworms I took a look at brief look and they may be somewhat tricky. Of the two, i think elimnate_duplicated_expr is the easier

from arrow-datafusion.

Lordworms avatar Lordworms commented on July 1, 2024 1

take eliminate_duplicated_expr and eliminate_cross_join

FYI @Lordworms I took a look at brief look and they may be somewhat tricky. Of the two, i think elimnate_duplicated_expr is the easier

Yes, actually I have finished the elimnate_duplicated_expr already, I'll merge them two as a single PR after finishing eliminate_cross_join

from arrow-datafusion.

Lordworms avatar Lordworms commented on July 1, 2024 1

If possible two single PRs would be easier to review and merge

sure

from arrow-datafusion.

jayzhan211 avatar jayzhan211 commented on July 1, 2024

I have two idea before deep diving to the code

  1. rewrite to have owned LogicalPlan then we can have owned expression easily. <- lots of code change
  2. lazy clone on rewrite, the spirit similar to copy on write. The idea from simplify() #9304 may help here too or we can rely on Transformed<Expr>
enum SimpliedExpr {
  Simplified(Expr)
  Original
}

fn simply_expr(&self, expr: &Expr) -> SimpliedExpr

if expr is Original, we can avoid clone.

from arrow-datafusion.

jayzhan211 avatar jayzhan211 commented on July 1, 2024

@alamb I think being able to skip failed rules is the main challenge to rewriting the plan with ownership

https://github.com/apache/arrow-datafusion/blob/37253e57beb25f0f1a4412b75421a489c2cb3c6a/datafusion/optimizer/src/optimizer.rs#L325

We need to restore the original plan if the rewrite fails at some point, however, the old plan is consumed and we lose ownership. Giving up ownership too early is not a good idea if we need to restore the original plan.

Instead of Transformed, I think we need to preserve the old data and fail state to mimic Err in Result but with old data

#[derive(Debug, PartialEq, Eq)]
pub enum OptimizedState {
    Yes,
    No,
    Fail,
}

#[derive(Debug)]
pub struct Optimized<T, E = DataFusionError> {
    pub optimzied_data: Option<T>,
    // Used to store the original data if optimized successfully
    pub original_data: T,
    pub optimized_state: OptimizedState,
    // Used to store the error if optimized failed, so we can early return but preserve the original data
    pub error: Option<E>,
}

impl<T, E> Optimized<T, E> {
    pub fn yes(optimzied_data: T, original_data: T) -> Self {
        Self {
            optimzied_data: Some(optimzied_data),
            original_data,
            optimized_state: OptimizedState::Yes,
            error: None,
        }
    }

    pub fn no(original_data: T) -> Self {
        Self {
            optimzied_data: None,
            original_data,
            optimized_state: OptimizedState::No,
            error: None,
        }
    }

    pub fn fail(original_data: T, e: E) -> Self {
        Self {
            optimzied_data: None,
            original_data,
            optimized_state: OptimizedState::Fail,
            error: Some(e),
        }
    }
}

Then, for every optimization, we return Optimized<LogicalPlan> or Optimzied<Expr>

from arrow-datafusion.

alamb avatar alamb commented on July 1, 2024

@alamb I think being able to skip failed rules is the main challenge to rewriting the plan with ownership

This is an excellent point

Instead of Transformed, I think we need to preserve the old data and fail state to mimic Err in Result but with old data

If we need to preserve the old data, there is no choice but to copy the LogicalPlan on each pass 🤔 (well I guess there could be some way to repair the partially rewritten plan, but that sounds very compliated).

Since skip_failed_rules is false by default:

https://github.com/apache/arrow-datafusion/blob/dcfe70987e98da0410146b5e1292ab20f3f118e0/datafusion/common/src/config.rs#L549

Could we maybe only do the copy when skip_failed_rules is enabled? Something like this:

if skip_failed_rules {
  let original_plan = plan.clone();
  let new_plan = optimizer.try_optimize_owned(plan)
   .ok_or_else(|e| original_plan);
} else {
  optimizer.try_optimize_owned(plan)?
}

from arrow-datafusion.

jayzhan211 avatar jayzhan211 commented on July 1, 2024

I think it is possible to preserve unnecessary clones even if we preserve the old plan, and only clone the data for the new expr and new plan, but I agree it is a little complicated, it needs the enum I show above. We can optimize for the default path first, and others later on.

from arrow-datafusion.

alamb avatar alamb commented on July 1, 2024

Let's give it a try!

from arrow-datafusion.

jayzhan211 avatar jayzhan211 commented on July 1, 2024

note: I profile my change and find out the time does not improve at all, I find that the cloned inside expr_list_to_fields may be the core issue.

from arrow-datafusion.

universalmind303 avatar universalmind303 commented on July 1, 2024

I'm curious, It seems like most of the reasoning behind Arc<LogicalPlan> and clone are due to the optimizer stage. Polars uses an intermediate representation ALogicalPlan (arena allocated logical plan) solely for the optimizer stage. I wonder if the same techinque could be applied here.

from arrow-datafusion.

alamb avatar alamb commented on July 1, 2024

I think area allocating is pretty common in various compiler/query optimizer systems because you know the lifetime of a plan doesn't outlive the the optimizer pass (as in you can bound the intermediate results)

I think rust actually is ideally setup to avoid having to arena allocate (which still requires copying stuff to a new plan) (using ownership) -- we just need to regigger how DataFusion is setup to use this. I think we are close

from arrow-datafusion.

jayzhan211 avatar jayzhan211 commented on July 1, 2024

There is one issue that we are not able to call Arc::into_inner for skip-failed-rules path since we hold a clone for restoration.

from arrow-datafusion.

alamb avatar alamb commented on July 1, 2024

There is one issue that we are not able to call Arc::into_inner for skip-failed-rules path since we hold a clone for restoration.

What I did in #9708 which seems to have worked pretty well is to copy the LogicalPlan only if skip_failed_rules is set

Like this

                let prev_plan = if options.optimizer.skip_failed_rules {
                    Some(new_plan.clone())
                } else {
                    None
                };

from arrow-datafusion.

jayzhan211 avatar jayzhan211 commented on July 1, 2024

from arrow-datafusion.

alamb avatar alamb commented on July 1, 2024

But I played around without mut before, maybe there are ways to solve this

Yes, this is indeed the case. The code in #9708 is pretty terrible / hacky to deal with the Arcd inputs, but it does seem to work:

from arrow-datafusion.

alamb avatar alamb commented on July 1, 2024

With some somewhat hacky code in #9708 (comment) I saw a 25% performance improvement.

Given the results I saw in #9708 here is my proposal:

  1. Update the Optimizer implementation / Optimizer APIs to rewrite existing LogicalPlans rather than make new ones
  2. Add the necessary APIs to LogicalPlan
  3. Contemplate eventually switching to use Box<LogicalPlan> instead of Arc<LogicalPlan> to store children in LogicalPlan....

I am not sure which of these APIs would be better (rewrite in place via &mut plan or take ownership (via plan)). I feel like the first one would likely be faster (as it avoids copying at all), but the second is easier to reason about 🤔

impl OptimizerRule {
...
  /// does this rule support rewriting plans rather than copying them?
  fn supports_mut (&self) -> bool { return false }

  /// if supports_mut, returns true, rewrites `plan` in place 
  fn optimize_mut(
        &self,
        plan: &mut LogicalPlan,
        config: &dyn OptimizerConfig
    ) -> Result<Transformed<()>, DataFusionError> {
      not_implemented_err!("try_optimized_owned is not implemented for this rule")
    }
...
impl OptimizerRule {
...

  /// does this rule support rewriting owned plans?
  fn supports_owned(&self) -> bool { return false }

  /// if supports_owned returns true, rewrites the LogicalPlan in returning a newly rewritten plan
  fn try_optimize_owned(
        &self,
        plan: LogicalPlan,
        config: &dyn OptimizerConfig
    ) -> Result<Transformed<LogicalPlan>, DataFusionError> {
      not_implemented_err!("try_optimized_owned is not implemented for this rule")
    }
...

from arrow-datafusion.

jayzhan211 avatar jayzhan211 commented on July 1, 2024

Upd: I realize we need to do the std::mem::take for Box too. 😢
Should we change to box first, so we can easily modify plan without hack?

rewrite in place via &mut plan or take ownership (via plan))

I think we can take ownership by default, and mutable inplace if better? For example, change to muable for rewriting new expressions and new plans.

from arrow-datafusion.

alamb avatar alamb commented on July 1, 2024

Should we change to box first, so we can easily modify plan without hack?

I was exploring keeping the Arc initally for two reasons:

  1. It avoids breaking changes in downstream crates
  2. Until we fix the optimzier so it doens't do so much copying, switching to Box will only make the problem worse (as now all clones will do deep clones of all children.

SO I was thinking we could start with the hacky solution that kept Arc<LogicalPlan> and then we could potentially switch to Box<LogicalPlan> afterwards.

from arrow-datafusion.

alamb avatar alamb commented on July 1, 2024

After playing around with this, I have become convinced what is really needed is a "TreeMutator" type API on TreeNode that lets us rewrite LogicalNodes in place -- if we had that API I think we can make the DataFusion Optimizer screaming fast

from arrow-datafusion.

jayzhan211 avatar jayzhan211 commented on July 1, 2024

After playing around with this, I have become convinced what is really needed is a "TreeMutator" type API on TreeNode that lets us rewrite LogicalNodes in place -- if we had that API I think we can make the DataFusion Optimizer screaming fast

It seems treemutaor is &mut T at the first place, and in your previous design #9708 we have &mut T for rewrite_inputs and rewrite_exprs. The think the latter switch to &mut T when necessary, so I don't understand why the former one is better, I guess they have the similar performance.

from arrow-datafusion.

peter-toth avatar peter-toth commented on July 1, 2024

After playing around with this, I have become convinced what is really needed is a "TreeMutator" type API on TreeNode that lets us rewrite LogicalNodes in place -- if we had that API I think we can make the DataFusion Optimizer screaming fast

I think a LogicalPlan tree is made of Arc links between the nodes currently. So I wonder if obtaining mutable references to a LogicalPlan node's children is possible at all without cloning the children first?
(I.e. can we implement this todo https://github.com/apache/arrow-datafusion/pull/9780/files#diff-9619441d9605f143a911319cea75ae5192e6c5b17acfcbc17a3c73a9e32a8e61R138 effectively?)

from arrow-datafusion.

peter-toth avatar peter-toth commented on July 1, 2024

Before #8891 I ran some experiments in #7942 with different APIs to test if transform/rewrite could in place mutate the nodes. In place mutation indeed seemed very effective, especially on Expr trees: #7942 (comment). It seemed to improve less on LogicalPlan trees: #7942 (comment), but I tested the case when a reference is kept to the original tree so the Arc::try_unwrap trick wouldn't work there.
But then #8891 didn't add that API and kept the TreeNodeRewriter as a node consuming and producing one for the sake of simplicity.

from arrow-datafusion.

alamb avatar alamb commented on July 1, 2024

I have the first PR for this issue here: #9780 that introduces TreeNodeMutator and uses it to avoid some copies. There is more to go but it makes a 10% improvement in planning

from arrow-datafusion.

alamb avatar alamb commented on July 1, 2024

#9999 is now ready for review

from arrow-datafusion.

alamb avatar alamb commented on July 1, 2024

Here is the next PR ready for review: #9948 (10% faster planning)

from arrow-datafusion.

alamb avatar alamb commented on July 1, 2024

Here is a PR that gets (another) 20% speed boost by not copying in Expr simplifier: #9954

from arrow-datafusion.

jayzhan211 avatar jayzhan211 commented on July 1, 2024

I have added a checklist of items / passes to optimize copies in the PR description under the "Describe the solution you'd like" heading

Wow, lots of issues to work on 😀 🚀

from arrow-datafusion.

Lordworms avatar Lordworms commented on July 1, 2024

I am interested in refactoring some of the OptimizeRule, I'll start with some easy optimization rules

from arrow-datafusion.

alamb avatar alamb commented on July 1, 2024

I filed some tickets to track the larger items #10209 #10210 #10211

from arrow-datafusion.

Lordworms avatar Lordworms commented on July 1, 2024

take eliminate_duplicated_expr and eliminate_cross_join

from arrow-datafusion.

alamb avatar alamb commented on July 1, 2024

If possible two single PRs would be easier to review and merge

from arrow-datafusion.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.