> datafusion writes something like `^4.0.0` in its `Cargo.toml`
Yes, this would be my ideal for the reasons you articulated.
> we release arrow more often and strictly according to semver
This would definitely be something I'd be supportive of, but possibly somewhat tangential to making DataFusion as a library easier to consume. Cargo has a good story for overriding dependencies within a workspace, including indirect dependencies, provided those versions aren't pinned within the libraries. Therefore, if DataFusion were to move to a released version of arrow, it wouldn't preclude users from opting in to newer, potentially unreleased versions of arrow.
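As a rough illustration of that override mechanism, here is a sketch of a consumer's workspace-root Cargo.toml using Cargo's `[patch]` section. It assumes DataFusion declares a non-pinned requirement such as `arrow = "^4.0.0"`. The version numbers are illustrative, and the git URL assumes arrow-rs lives at apache/arrow-rs.

```toml
# Workspace-root Cargo.toml of a hypothetical DataFusion consumer.
[dependencies]
datafusion = "4.0.0"

# [patch] only takes effect in the workspace root, but it applies to the whole
# dependency graph, so it also replaces the arrow that DataFusion pulls in.
# Cargo only uses the patch if the patched arrow's version still satisfies
# DataFusion's requirement (e.g. some 4.x version for "^4.0.0").
[patch.crates-io]
arrow = { git = "https://github.com/apache/arrow-rs" }
```

If DataFusion instead pinned an exact version (say `arrow = "=4.0.0"`), a patched arrow reporting a different version would not match, and Cargo would leave the patch unused.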
However, DataFusion itself would not be able to opt in to unreleased arrow functionality, so if DataFusion changes are frequently coupled with arrow changes, then yes, a more frequent arrow release cycle would possibly be a precondition of moving to a released version of arrow.
> I am not sure whether we could get away with the multiple paths approach, e.g.
I've not come across this approach before. I'd worry that it might be vulnerable to rust-lang/cargo#5478, which would prevent users from opting into newer versions of arrow within their workspaces; IMO that would be unfortunate.
from arrow-datafusion.
@tustvold Does #39 help with this?
It helps, but doesn't solve the underlying issue. If you depend on another crate that in turn depends on arrow and isn't exposed by DataFusion (e.g. arrow-flight), or you want to enable different features from those DataFusion sets, you end up having to replicate DataFusion's exact version pins in all of your other crates.
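To make the duplication concrete, here is a sketch of a consumer manifest while DataFusion pins arrow to a git revision. The angle-bracket strings are placeholders, not real commit hashes, and the repository URLs are assumptions about where the crates live.

```toml
# Hypothetical consumer Cargo.toml while DataFusion pins arrow to a git revision.
[dependencies]
datafusion = { git = "https://github.com/apache/arrow-datafusion", rev = "<datafusion-rev>" }

# arrow-flight is not re-exported by DataFusion, so it has to be declared here.
# Its source (repository and revision) must match DataFusion's arrow pin exactly;
# any other source makes Cargo build a second, incompatible copy of arrow.
arrow-flight = { git = "https://github.com/apache/arrow-rs", rev = "<arrow-rev-pinned-by-datafusion>" }

# Declaring arrow directly (e.g. to enable extra features) needs the same pin again.
arrow = { git = "https://github.com/apache/arrow-rs", rev = "<arrow-rev-pinned-by-datafusion>" }
```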
I think that the general problem is that we pin arrow (and many others) in DataFusion; DataFusion is a library, so it should avoid pinning dependencies.
Instead, it should bracket them, e.g. via `^3.0.0`, so that consumers of the library may use a different version of any of its dependencies, as long as it is compatible, and have Cargo find a dependency version that satisfies both what the consumer wants and what DataFusion requires.
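A minimal sketch of the difference in a library's Cargo.toml; the version numbers and the placeholder revision are illustrative only. Note that in Cargo a bare requirement such as `"3.0.0"` already means `^3.0.0`, i.e. any version >= 3.0.0 and < 4.0.0.

```toml
# Hypothetical excerpt from a library's Cargo.toml.
[dependencies]
# Pinned (what to avoid in a library): only this exact commit satisfies the
# requirement, so a consumer using any other arrow ends up with two copies.
# arrow = { git = "https://github.com/apache/arrow-rs", rev = "<commit-sha>" }

# Bracketed: any arrow release >= 3.0.0 and < 4.0.0 is acceptable, leaving
# Cargo free to choose one 3.x version that also satisfies the consumer.
arrow = "^3.0.0"
```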
As it stands, consumers must use exactly the same version of arrow that DataFusion uses, or Cargo will pick two different arrow versions. This happens because Cargo cannot guarantee that the two arrow versions (the one DataFusion demands and the one the consumer wants) are ABI compatible. As a result, consumers can't pass structs from the arrow version they use to the arrow version DataFusion uses.
Note that in this context a different feature set corresponds to a different version, as cargo has no way of knowing whether a feature will retain ABI compatibility.
So, I think the ask here is:
- datafusion writes something like `^4.0.0` in its `Cargo.toml`
- we release arrow more often and strictly according to semver
Is this the idea, @tustvold?
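Under that proposal, a consumer could ask for a newer compatible arrow release directly. A sketch, assuming DataFusion's own manifest declares `arrow = "^4.0.0"`; the crate versions are illustrative:

```toml
# Hypothetical consumer Cargo.toml.
[dependencies]
# Assumed to declare arrow = "^4.0.0" in its own manifest.
datafusion = "4.0.0"
# The consumer needs something from a later 4.x release. Both requirements
# are satisfied by any arrow >= 4.1.0 and < 5.0.0, so Cargo resolves a single
# arrow version and the consumer's arrow types can be passed to DataFusion.
arrow = "4.1"
```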
I am not sure whether we could get away with the multiple paths approach, e.g.
arrow = { git = "https://github.com/arrow-rs/arrow", version = "^3.0.0" }
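For context, Cargo documents combining a registry version with a git or path source ("multiple locations"): the git source is used for local builds, with its version checked against the requirement, and when the crate is published to a registry such as crates.io only the version requirement is kept. A commented sketch of the line above under that reading; the repository URL here is assumed to be apache/arrow-rs.

```toml
[dependencies]
# Local builds (and consumers depending on DataFusion via git): arrow is
# fetched from the git repository, and Cargo checks that its version
# satisfies "^3.0.0".
# Published to crates.io: Cargo drops the git key, so DataFusion's users see
# plain arrow = "^3.0.0" and resolve arrow from crates.io instead.
arrow = { git = "https://github.com/apache/arrow-rs", version = "^3.0.0" }
```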
> but possibly somewhat tangential to making DataFusion as a library easier to consume
I agree.
My point is that the reason we pin arrow to git hashes is so that we do not have to wait for a new release. So I think that to stop pinning in DataFusion, we need to release arrow more frequently. But I agree that from the consumers' point of view it is not needed, as you can just point to a hash in arrow-rs.
^_^
@alamb this can be closed now
Indeed, thanks @Jimexist. I think this issue was closed in #393.
Related Issues (20)
- `LogFunc` simplifier swaps the order of arguments
- Standardize the separator in name
- Only recompute schema in `TypeCoercion` when necessary
- Better timezone functionalities
- Auto-update mechanism for dataframe test
- Remove `Expr::GetIndexedField` and `GetFieldAccess` and always use function `get_field` for indexing
- Support user defined display for UDF
- Remove DataPtr trait and use Arc::ptr_eq directly
- Sort Merge Join. LeftSemi issues
- Sort Merge Join. LeftAnti issues
- Port aggregate test to sqllogictest
- Move `Covariance` (Population) `covar_pop` to be a UDAF
- For DML plans, `LogicalPlan::schema` returns the input schema instead of output schema
- DataFusion weekly project plan (Andrew Lamb) - May 6, 2024
- Reduce repetition in datafusion::functions using macros
- Support custom SchemaAdapter on ParquetExec
- Use `min_value` and `max_value` on statistics to avoid `ExecutionPlan.execute`
- Make ASF public press release
- Substrait integration doesn't recognize typed functions
- Incorrect results with expression resolution