Comments (46)
I filed tickets for the other optimizer passes and linked them to the description of this ticket
from arrow-datafusion.
Update here is that I am hacking on getting TypeCoercion
(#10039) to avoid clones. It is turning out to be tricker than others as type coercion actually updates the types of the plans, so when rewriting expressions I need to find some way to re-calculate the schemas correctly. I am continuing to hack on that as @peter-toth works on CSE (#10067) and @Lordworms is cranking on some of the other passes
from arrow-datafusion.
I have two idea before deep diving to the code
I think 2 sounds interesting
Another think I was thinking was something like LogicalPlan::rewrite(mut self, rewriter)
I think that along with Arc::try_unwrap
could be used to minimize the places where cloning was actually needed
Maybe we can prototype with
impl OptimizerRule {
...
/// does this rule support rewriting owned plans?
fn supports_owned(&self) -> bool { return false }
/// if supports_owned returns true, calls try_optimize_owned
fn try_optimize_owned(
&self,
plan: LogicalPlan,
config: &dyn OptimizerConfig
) -> Result<Transformed<LogicalPlan>, DataFusionError> {
not_implemented_err!("try_optimized_owned is not implemented for this rule")
}
...
And then play around with the code that calls try_optimize
here https://github.com/apache/arrow-datafusion/blob/f4107d47bb4c0260d301294ddfc7c67d96574636/datafusion/optimizer/src/optimizer.rs#L360-L368 to try and use the try_optimize_owned
API (without having to rewrite all the optimizers) for SimplifyExprs
If we could show a significant performance improvement for the sql_planner
benchmarks then I think it would be worth spending time reworking the other APIs
from arrow-datafusion.
so I don't understand why the former one is better, I guess they have the similar performance.
I was having trouble actually implementing the required TreeNode API for &mut LogicalPlan
-- I can't remember the actual compiler error. I don't think they will differ in performance
I think a
LogicalPlan
tree is made ofArc
links between the nodes currently. So I wonder if obtaining mutable references to aLogicalPlan
node's children is possible at all without cloning the children first?
Yes, I was able to do so (see the horrible code here to do it):
https://github.com/apache/arrow-datafusion/pull/9708/files#diff-f69e720234d9da6cb3a4a178d3a5575fcd34f68191335a8c2465a172e8c4e6f1R498-R538
I also verified that it was actually unwrapping Arc
s most of the time (not cloning)
(I.e. can we implement this todo https://github.com/apache/arrow-datafusion/pull/9780/files#diff-9619441d9605f143a911319cea75ae5192e6c5b17acfcbc17a3c73a9e32a8e61R138 effectively?)
I tried to implement this API and it was not going well. However, implementing a TreeNodeMutator
(similar to TreeNodeRewriter
but that took a mut &T
rather than T
) seems to be more promsising.
I am traveling this week so I won't have as much time as I would like to work on this, but I hope to push some more progress to #9780 soon
from arrow-datafusion.
An update here is that with a few PRs in flight, some of the planning benchmarks go 50% faster (see details on #9954)
It will take a while to get these PRs in (there are a few chained up) but the "don't copy the plan so many times" approach seems to be paying off very well. Having the TreeNode
API to build off is super helpful
from arrow-datafusion.
Wow, lots of issues to work on 😀 🚀
Indeed - thankfully they are largely mechanical like #10066
I personally think #10067 is going to be worth 20% faster planning itself (but I need to work through the details)
from arrow-datafusion.
@alamb, @jayzhan211 this is another low hanging fruit to refactor UnwrapCastInComparison
: #10087
from arrow-datafusion.
I analyze the sql_planner
again and find that exprlist_to_fields
and calc_func_dependencies_for_project
are the two most time spending func.
so I think #9595 might be an important step for the optimization here.
from arrow-datafusion.
I played around with some ideas last night and it is looking promising (though not yet done). I put my draft here #9708. I hope to try and work on it more over the next few days, but I am pretty busy preparing for meetups / presentations / papers for next week. It may be a few more days
from arrow-datafusion.
I would be surprised if #9708 improves, let me see if I can improve #9658 based on it.
from arrow-datafusion.
BTW here is a video of how I profile DataFusion: #9577 (comment)
from arrow-datafusion.
I have added a checklist of items / passes to optimize copies in the PR description under the "Describe the solution you'd like"
heading
from arrow-datafusion.
@alamb, I've opened a PR to refactor EliminateOuterJoin
: #10081
from arrow-datafusion.
Thanks @Lordworms -- that sounds great. There are now some good examples of the basic pattern (e.g #10081 and #10066)
from arrow-datafusion.
I also re-profiled the tpcds benchmarks via
cargo bench --bench sql_planner -- tpcds_all
Thus I am going to take a peek to see if there is any low hanging fruit in ApplyFunctionRewrites
EliminateCrossJoin
, OptimzieProjections
and FilterPushdown
from arrow-datafusion.
take eliminate_duplicated_expr and eliminate_cross_join
FYI @Lordworms I took a look at brief look and they may be somewhat tricky. Of the two, i think elimnate_duplicated_expr is the easier
from arrow-datafusion.
take eliminate_duplicated_expr and eliminate_cross_join
FYI @Lordworms I took a look at brief look and they may be somewhat tricky. Of the two, i think elimnate_duplicated_expr is the easier
Yes, actually I have finished the elimnate_duplicated_expr already, I'll merge them two as a single PR after finishing eliminate_cross_join
from arrow-datafusion.
If possible two single PRs would be easier to review and merge
sure
from arrow-datafusion.
I have two idea before deep diving to the code
- rewrite to have owned LogicalPlan then we can have owned expression easily. <- lots of code change
- lazy clone on rewrite, the spirit similar to copy on write. The idea from simplify() #9304 may help here too or we can rely on
Transformed<Expr>
enum SimpliedExpr {
Simplified(Expr)
Original
}
fn simply_expr(&self, expr: &Expr) -> SimpliedExpr
if expr is Original, we can avoid clone.
from arrow-datafusion.
@alamb I think being able to skip failed rules is the main challenge to rewriting the plan with ownership
We need to restore the original plan if the rewrite fails at some point, however, the old plan is consumed and we lose ownership. Giving up ownership too early is not a good idea if we need to restore the original plan.
Instead of Transformed, I think we need to preserve the old data and fail state to mimic Err in Result but with old data
#[derive(Debug, PartialEq, Eq)]
pub enum OptimizedState {
Yes,
No,
Fail,
}
#[derive(Debug)]
pub struct Optimized<T, E = DataFusionError> {
pub optimzied_data: Option<T>,
// Used to store the original data if optimized successfully
pub original_data: T,
pub optimized_state: OptimizedState,
// Used to store the error if optimized failed, so we can early return but preserve the original data
pub error: Option<E>,
}
impl<T, E> Optimized<T, E> {
pub fn yes(optimzied_data: T, original_data: T) -> Self {
Self {
optimzied_data: Some(optimzied_data),
original_data,
optimized_state: OptimizedState::Yes,
error: None,
}
}
pub fn no(original_data: T) -> Self {
Self {
optimzied_data: None,
original_data,
optimized_state: OptimizedState::No,
error: None,
}
}
pub fn fail(original_data: T, e: E) -> Self {
Self {
optimzied_data: None,
original_data,
optimized_state: OptimizedState::Fail,
error: Some(e),
}
}
}
Then, for every optimization, we return Optimized<LogicalPlan>
or Optimzied<Expr>
from arrow-datafusion.
@alamb I think being able to skip failed rules is the main challenge to rewriting the plan with ownership
This is an excellent point
Instead of Transformed, I think we need to preserve the old data and fail state to mimic Err in Result but with old data
If we need to preserve the old data, there is no choice but to copy the LogicalPlan
on each pass 🤔 (well I guess there could be some way to repair the partially rewritten plan, but that sounds very compliated).
Since skip_failed_rules
is false
by default:
Could we maybe only do the copy when skip_failed_rules
is enabled? Something like this:
if skip_failed_rules {
let original_plan = plan.clone();
let new_plan = optimizer.try_optimize_owned(plan)
.ok_or_else(|e| original_plan);
} else {
optimizer.try_optimize_owned(plan)?
}
from arrow-datafusion.
I think it is possible to preserve unnecessary clones even if we preserve the old plan, and only clone the data for the new expr and new plan, but I agree it is a little complicated, it needs the enum I show above. We can optimize for the default path first, and others later on.
from arrow-datafusion.
Let's give it a try!
from arrow-datafusion.
note: I profile my change and find out the time does not improve at all, I find that the cloned inside expr_list_to_fields
may be the core issue.
from arrow-datafusion.
I'm curious, It seems like most of the reasoning behind Arc<LogicalPlan> and clone
are due to the optimizer stage. Polars uses an intermediate representation ALogicalPlan
(arena allocated logical plan) solely for the optimizer stage. I wonder if the same techinque could be applied here.
from arrow-datafusion.
I think area allocating is pretty common in various compiler/query optimizer systems because you know the lifetime of a plan doesn't outlive the the optimizer pass (as in you can bound the intermediate results)
I think rust actually is ideally setup to avoid having to arena allocate (which still requires copying stuff to a new plan) (using ownership) -- we just need to regigger how DataFusion is setup to use this. I think we are close
from arrow-datafusion.
There is one issue that we are not able to call Arc::into_inner
for skip-failed-rules path since we hold a clone for restoration.
from arrow-datafusion.
There is one issue that we are not able to call Arc::into_inner for skip-failed-rules path since we hold a clone for restoration.
What I did in #9708 which seems to have worked pretty well is to copy the LogicalPlan only if skip_failed_rules
is set
Like this
let prev_plan = if options.optimizer.skip_failed_rules {
Some(new_plan.clone())
} else {
None
};
from arrow-datafusion.
from arrow-datafusion.
But I played around without mut before, maybe there are ways to solve this
Yes, this is indeed the case. The code in #9708 is pretty terrible / hacky to deal with the Arc
d inputs, but it does seem to work:
from arrow-datafusion.
With some somewhat hacky code in #9708 (comment) I saw a 25% performance improvement.
Given the results I saw in #9708 here is my proposal:
- Update the Optimizer implementation / Optimizer APIs to rewrite existing LogicalPlans rather than make new ones
- Add the necessary APIs to LogicalPlan
- Contemplate eventually switching to use
Box<LogicalPlan>
instead ofArc<LogicalPlan>
to store children in LogicalPlan....
I am not sure which of these APIs would be better (rewrite in place via &mut plan
or take ownership (via plan
)). I feel like the first one would likely be faster (as it avoids copying at all), but the second is easier to reason about 🤔
impl OptimizerRule {
...
/// does this rule support rewriting plans rather than copying them?
fn supports_mut (&self) -> bool { return false }
/// if supports_mut, returns true, rewrites `plan` in place
fn optimize_mut(
&self,
plan: &mut LogicalPlan,
config: &dyn OptimizerConfig
) -> Result<Transformed<()>, DataFusionError> {
not_implemented_err!("try_optimized_owned is not implemented for this rule")
}
...
impl OptimizerRule {
...
/// does this rule support rewriting owned plans?
fn supports_owned(&self) -> bool { return false }
/// if supports_owned returns true, rewrites the LogicalPlan in returning a newly rewritten plan
fn try_optimize_owned(
&self,
plan: LogicalPlan,
config: &dyn OptimizerConfig
) -> Result<Transformed<LogicalPlan>, DataFusionError> {
not_implemented_err!("try_optimized_owned is not implemented for this rule")
}
...
from arrow-datafusion.
Upd: I realize we need to do the std::mem::take for Box too. 😢
Should we change to box first, so we can easily modify plan without hack?
rewrite in place via &mut plan or take ownership (via plan))
I think we can take ownership by default, and mutable inplace if better? For example, change to muable for rewriting new expressions and new plans.
from arrow-datafusion.
Should we change to box first, so we can easily modify plan without hack?
I was exploring keeping the Arc
initally for two reasons:
- It avoids breaking changes in downstream crates
- Until we fix the optimzier so it doens't do so much copying, switching to
Box
will only make the problem worse (as now all clones will do deep clones of all children.
SO I was thinking we could start with the hacky solution that kept Arc<LogicalPlan>
and then we could potentially switch to Box<LogicalPlan>
afterwards.
from arrow-datafusion.
After playing around with this, I have become convinced what is really needed is a "TreeMutator" type API on TreeNode that lets us rewrite LogicalNode
s in place -- if we had that API I think we can make the DataFusion Optimizer screaming fast
from arrow-datafusion.
After playing around with this, I have become convinced what is really needed is a "TreeMutator" type API on TreeNode that lets us rewrite
LogicalNode
s in place -- if we had that API I think we can make the DataFusion Optimizer screaming fast
It seems treemutaor is &mut T
at the first place, and in your previous design #9708 we have &mut T
for rewrite_inputs
and rewrite_exprs
. The think the latter switch to &mut T
when necessary, so I don't understand why the former one is better, I guess they have the similar performance.
from arrow-datafusion.
After playing around with this, I have become convinced what is really needed is a "TreeMutator" type API on TreeNode that lets us rewrite
LogicalNode
s in place -- if we had that API I think we can make the DataFusion Optimizer screaming fast
I think a LogicalPlan
tree is made of Arc
links between the nodes currently. So I wonder if obtaining mutable references to a LogicalPlan
node's children is possible at all without cloning the children first?
(I.e. can we implement this todo https://github.com/apache/arrow-datafusion/pull/9780/files#diff-9619441d9605f143a911319cea75ae5192e6c5b17acfcbc17a3c73a9e32a8e61R138 effectively?)
from arrow-datafusion.
Before #8891 I ran some experiments in #7942 with different APIs to test if transform/rewrite could in place mutate the nodes. In place mutation indeed seemed very effective, especially on Expr
trees: #7942 (comment). It seemed to improve less on LogicalPlan
trees: #7942 (comment), but I tested the case when a reference is kept to the original tree so the Arc::try_unwrap
trick wouldn't work there.
But then #8891 didn't add that API and kept the TreeNodeRewriter
as a node consuming and producing one for the sake of simplicity.
from arrow-datafusion.
I have the first PR for this issue here: #9780 that introduces TreeNodeMutator
and uses it to avoid some copies. There is more to go but it makes a 10% improvement in planning
from arrow-datafusion.
#9999 is now ready for review
from arrow-datafusion.
Here is the next PR ready for review: #9948 (10% faster planning)
from arrow-datafusion.
Here is a PR that gets (another) 20% speed boost by not copying in Expr simplifier: #9954
from arrow-datafusion.
I have added a checklist of items / passes to optimize copies in the PR description under the
"Describe the solution you'd like"
heading
Wow, lots of issues to work on 😀 🚀
from arrow-datafusion.
I am interested in refactoring some of the OptimizeRule, I'll start with some easy optimization rules
from arrow-datafusion.
I filed some tickets to track the larger items #10209 #10210 #10211
from arrow-datafusion.
take eliminate_duplicated_expr and eliminate_cross_join
from arrow-datafusion.
If possible two single PRs would be easier to review and merge
from arrow-datafusion.
Related Issues (20)
- Using `date_bin` with a time zone in a time range that contains daylight savings does not work HOT 3
- binary_op should be handled by sql_expr_to_logical_expr.. This was likely caused by a bug in DataFusion's code HOT 4
- [Epic] A Collection of Sort Based Optimizations HOT 5
- Implement a way to preserve partitioning through `UnionExec` without losing ordering HOT 3
- DataFusion should support casting strings such as "4e7" to decimal HOT 2
- Optimized version of `SortPreservingMerge` that doesn't actually compare sort keys of the key ranges are ordered HOT 15
- Regression with `coalesce` expr_fn function: can't take multiple arguments
- INSERT INTO SQL failing on CSV-backed table HOT 3
- Unify SQL planning for `ORDER BY`, `HAVING`, `DISTINCT`, etc
- Unable to perform lead/lag built in functions on List and Struct data types HOT 1
- Enable `split_file_groups_by_statistics` by default HOT 3
- Avoid inlining non deterministic CTE HOT 2
- Make all SchemaProvider trait APIs async HOT 4
- Document timezone semantics HOT 2
- Schema incorrect after select over aggregate function that returns a different type than the input HOT 5
- clippy failure in main HOT 1
- Document Sort Merge Join algorithm HOT 4
- `LogFunc` simplifier swaps the order of arguments
- Standardize the separator in name HOT 1
- Onyl recompute schema in `TypeCoercion` when necessary
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from arrow-datafusion.