Optimizing `LogicalPlan` with placeholders fails (arrow-datafusion, open issue)

simonvandel commented on June 26, 2024

Optimizing `LogicalPlan` with placeholders fails

from arrow-datafusion.

Comments (9)

appletreeisyellow commented on June 26, 2024

> I am pretty sure from our (InfluxData)'s perspective, the prepared statement use case is not much of a priority (especially compared to making planning overall faster), so @appletreeisyellow likely can't spend a lot of time on this issue (though maybe she feels differently).

+1. I'm aligned with @alamb's comment and will leave this issue open for anyone else who wants to improve it.

appletreeisyellow commented on June 26, 2024

Yes!

alamb commented on June 26, 2024

Thank you @simonvandel -- that makes total sense.

FWIW, the physical optimizer passes that create an execution plan still do non-trivial work even after the `LogicalPlan` is created.

I agree this use case is a reasonable one, where running the optimizer takes non-trivial time compared to query execution time. In fact, this was the original use case for parameterized queries in OLTP engines, where the cost of planning dominated the cost of actually running the query, so reusing a prepared statement was an important optimization.
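To make the "plan once, bind many times" idea concrete, here is a minimal toy sketch. It does not use DataFusion's API: `Planner`, `Plan`, `Expr`, `prepare`, and `bind` are all hypothetical names invented for illustration, and the "optimizer" is just a counter standing in for the expensive step.

```rust
use std::collections::HashMap;

/// Toy expression: a literal or a positional placeholder such as `$1`.
/// (Hypothetical type, not DataFusion's `Expr`.)
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Literal(i64),
    Placeholder(usize), // 1-based index, mirroring `$1`, `$2`, ...
}

/// Toy "optimized plan": a filter `column = value` over one table.
#[derive(Debug, Clone, PartialEq)]
struct Plan {
    table: String,
    column: String,
    value: Expr,
}

/// Caches plans by statement name so the costly step runs once per statement.
struct Planner {
    optimize_calls: usize,
    cache: HashMap<String, Plan>,
}

impl Planner {
    fn new() -> Self {
        Planner { optimize_calls: 0, cache: HashMap::new() }
    }

    /// Returns the cached plan for `name`, building it only on the first call.
    fn prepare(&mut self, name: &str, table: &str, column: &str) -> Plan {
        if let Some(plan) = self.cache.get(name) {
            return plan.clone();
        }
        // Stand-in for parsing and optimizing; a real engine does the heavy work here.
        self.optimize_calls += 1;
        let plan = Plan {
            table: table.to_string(),
            column: column.to_string(),
            value: Expr::Placeholder(1),
        };
        self.cache.insert(name.to_string(), plan.clone());
        plan
    }
}

/// Substitutes parameter values into a prepared plan without re-planning.
fn bind(plan: &Plan, params: &[i64]) -> Result<Plan, String> {
    let value = match plan.value {
        Expr::Placeholder(i) => {
            let v = i
                .checked_sub(1)
                .and_then(|idx| params.get(idx))
                .ok_or_else(|| format!("no value supplied for ${}", i))?;
            Expr::Literal(*v)
        }
        Expr::Literal(v) => Expr::Literal(v),
    };
    Ok(Plan { table: plan.table.clone(), column: plan.column.clone(), value })
}
```

Calling `prepare` twice with the same name bumps `optimize_calls` only once, and each `bind` call is a cheap substitution. The failure discussed in this issue corresponds to the case where the optimizing step itself cannot handle a plan that still contains placeholders, which this toy deliberately sidesteps.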

The use case is much less common in classic analytic systems, where query execution time was often far greater than even tens of milliseconds of planning time.

However, as analytics is pushed everywhere, planning time becomes more important, and I think the fact that the DataFusion optimizer is so slow makes this even more pronounced. Ergo, I think making planning faster via #5637 is very important.

I am pretty sure from our (InfluxData)'s perspective, the prepared statement use case is not much of a priority (especially compared to making planning overall faster), so @appletreeisyellow likely can't spend a lot of time on this issue (though maybe she feels differently).

Thus, what I suggest is that we polish up #9073, which improves the error messages, and then leave this particular ticket open for anyone else who has the use case and wants to improve it.
