Comments (14)
🤔
I believe the core reason for having TableSource
is precisely to avoid the dependencies on SessionState
(so it is possible to plan SQL without having the entire datafusion stack)
from arrow-datafusion.
@alamb I will see if I can try it out in the next few days.
from arrow-datafusion.
@alamb I fully agree, that PR will make what is needed feasible. This will be much easier to do once #11516 is merged, I'll wait until that is done and then work on this again.
from arrow-datafusion.
I think the usecase of passing some sort of state to TableProvider
methods is a good and useful thing
https://docs.rs/datafusion/latest/datafusion/datasource/provider/trait.TableProvider.html
One way is to add a SessionState
parameter as you suggest to all the methods. Given that it is already passed to scan
, this seems a pretty reasonable change to me. While it would be an API change it would be very mechanical to add.
However, it seems like if we ever are going to break apart the core crate more, we'll have to figure out how to split out SessionContext. We have a related discussion here: #11182
Also related slack coversation in ASF slack: https://the-asf.slack.com/archives/C04RJ0C85UZ/p1719691754005389
from arrow-datafusion.
Thanks @alamb , makes very good sense. We can separate the 2 concerns. The only issue I see is that TableSource
also references supports_filters_pushdown
, that is where I got stuck trying to implement that. If we can figure out how to expose SessionState
to TableSource
this won't be hard to do at all. Let me know if I can help. If we implement this we should also make all args as SessionState
(right now execute
uses TaskContext
, they should be the same).
from arrow-datafusion.
@alamb That's right, and therein lies the conundrum... We would need to solve that somehow, that's why I suggested the restructuring. If that were possible I would be submitting a PR to do this 😄
from arrow-datafusion.
When does the TableSource::supports_filter_pushdown
get called I wonder 🤔 Is it possible just to use TableProvider
from arrow-datafusion.
@alamb Good point, I could thought of that but was not sure. I found this: datafusion/expr/src/logical_plan/plan.rs
which calls TableSource.supports_filters_pushdown
but that is only for Display
of a LogicalPlan
(certainly not critical). Further, the TableSource.supports_filters_pushdown
is @deprecated
, so maybe the best thing is to just remove all TableSource.supports_pushdown_filters
altogether? That would be nice and much easier.
If you like that idea I can experiment in my local clone of the project.
from arrow-datafusion.
Further, the TableSource.supports_filters_pushdown is @deprecated, so maybe the best thing is to just remove all TableSource.supports_pushdown_filters altogether? That would be nice and much easier.
If it is deprecated, I think making a PR to remove the deprecated functions would certainly be a good step forward as it would keep things simpler
from arrow-datafusion.
@alamb That would be great. I do wonder though, the TableSource
is used when the LogicalPlan
invokes supports_filters_pushdown
, correct? I don't know enough to say if that is required, or if it is only in a PhysicalPlan
which is all a custom data source cares about.
from arrow-datafusion.
I am not sure either -- we would have to do some more research (potentially try it, for example) to find out
from arrow-datafusion.
@alamb I had some time and tried a very simple workaround, it would be up to you and the team if it is useful.
I added a single arg to: TableProvider.supports_filters_pushdown
:
fn supports_filters_pushdown(
&self,
task_id: &Option<String>,
filters: &[&Expr],
) -> Result<Vec<TableProviderFilterPushDown>> {
Ok(vec![
TableProviderFilterPushDown::Unsupported;
filters.len()
])
}
This only touched 9 files, and given the task_id
I can access the state I have set up. This was far simpler than adding SessionState
, for my use case that wasn't necessary. So while this is different than other places, it does not add more dependencies on the current SessionState
, which I think is a good thing. Note that this could have been TaskContext
but even that is more than I need, the task_id
alone does the trick.
This avoids TableSource
altogether, but I may need it there too.
However, even this was easy to do the issue is that likely remains is who calls supports_filters_pushdown
in a PhysicalPlan
? It see that LogicalPlan
calls it which puts us back to TableSource
. Do you know the calling structure for this? If we need to add it to TableSource
then the same issue remains, where to carry the TaskContext
and the task_id
so it is provided to TableProvider
implementations (which requires the SessionState
or SessionContext
.
I believe we would need to add task_id
here: LogicalPlan::TableScan(scan)
, which is the TableScan
struct. If the task_id
were supported wherever that was created it might work.
If you or someone on the team can help me solve this part of it then this is a very simple change. At least I understand the issue now.
Let me know what you think.
from arrow-datafusion.
After trying to modify LogicalPlan::TableScan
with the extra field, it quickly balloons into needing to change a lot of files (optimizer, etc.). Since scan
works it seems like supports_filters_pushdown
should be feasible too, but I would need help to sort this out. Let's discuss on slack when you can.
from arrow-datafusion.
After trying to modify
LogicalPlan::TableScan
with the extra field, it quickly balloons into needing to change a lot of files (optimizer, etc.). Sincescan
works it seems likesupports_filters_pushdown
should be feasible too, but I would need help to sort this out. Let's discuss on slack when you can.
I think #11516 looks promising as a way to potentially pass SessionState
to the supports_filters_pushdown
API
from arrow-datafusion.
Related Issues (20)
- Support protobuf serialization for `ScalarValue::Utf8View` and `ScalarValue::BinaryView` HOT 1
- Support substrait serialization for `ScalarValue::Utf8View` and `ScalarValue::BinaryView` HOT 3
- Add config flag to convert `Utf8View`/`BinaryView` --> `Utf8` / `Binary` at output HOT 4
- Add the possibility to work with plan clones directly in `inline_table_scan` HOT 3
- Parquet statistics missing when reading `Utf8` as `Utf8View`
- Panic in `substring()` scalar function (SQLancer) HOT 1
- Fuse operations in `equal_rows_arr`
- External sorting not working for (maybe only for string columns??)
- Builder style API for creating `RuntimeEnv` HOT 3
- Unchecked overflow in integer number addition HOT 1
- List available functions (`SHOW FUNCTIONS`)
- Support `map_keys` function for Map type HOT 1
- Support `map_values` for MAP type HOT 1
- `octet_length()` function not working for StringView columns (SQLancer) HOT 1
- Internal error in `regexp_replace()` for some StringView input (SQLancer) HOT 1
- Unparse TableScan with pushdown projection HOT 1
- not support similar to expr run in physical expr?
- Make `RuntimeEnvBuilder` rather than `RuntimeConfig` HOT 1
- Bug: SQLite unparser does not support date extraction functions
- Error aliasing on double unnest on List[Struct] HOT 1
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from arrow-datafusion.