tustvold commented on May 16, 2024

> datafusion writes something like ^4.0.0 on its cargo.toml

Yes, this would be my ideal for the reasons you articulated.

> we release arrow more often and strictly according to semver

This would definitely be something I'd be supportive of, but possibly somewhat tangential to making DataFusion as a library easier to consume. Cargo has a good story for overriding dependencies within a workspace (via the [patch] section), including indirect dependencies, provided those versions aren't pinned within the libraries. Therefore, if DataFusion were to move to a released version of arrow, it wouldn't preclude users from opting in to newer, potentially unreleased versions of arrow.
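To make the "provided those versions aren't pinned" condition concrete, here is a minimal Rust sketch that models (it does not reproduce) the rule Cargo applies to a `[patch]` override: the patched version is only used if it satisfies the version requirement of every crate that depends on it. The names `Version`, `Req` and `patch_applies` are hypothetical; the sketch assumes every dependent targets the same major version (at least 1) and ignores pre-releases.

```rust
// Simplified model, not Cargo's real code. A consumer's override would look like
// (placeholders, not real values):
//   [patch.crates-io]
//   arrow = { git = "<arrow-rs repository URL>", rev = "<commit>" }

// Derived PartialOrd compares (major, minor, patch) lexicographically.
#[derive(Clone, Copy, Debug, PartialEq, PartialOrd)]
struct Version(u64, u64, u64);

/// A dependent library's requirement on arrow (hypothetical type).
enum Req {
    /// `=X.Y.Z`: an exact pin.
    Exact(Version),
    /// `^X.Y.Z` (a plain "X.Y.Z" also means caret in Cargo); assumes X >= 1.
    Caret(Version),
}

impl Req {
    fn matches(&self, v: Version) -> bool {
        match *self {
            Req::Exact(p) => v == p,
            // ^X.Y.Z accepts >= X.Y.Z and < (X+1).0.0
            Req::Caret(min) => v.0 == min.0 && v >= min,
        }
    }
}

/// The patch can only replace arrow if every dependent accepts the patched version;
/// otherwise Cargo typically leaves the patch unused and warns about it.
fn patch_applies(dependents: &[Req], patched: Version) -> bool {
    dependents.iter().all(|r| r.matches(patched))
}

fn main() {
    let patched = Version(4, 1, 0); // e.g. an unreleased arrow from git
    // DataFusion and arrow-flight both use caret requirements: the patch applies.
    let caret = [Req::Caret(Version(4, 0, 0)), Req::Caret(Version(4, 0, 0))];
    println!("caret: {}", patch_applies(&caret, patched)); // caret: true
    // One library pins =4.0.0: the newer arrow cannot replace it.
    let pinned = [Req::Exact(Version(4, 0, 0)), Req::Caret(Version(4, 0, 0))];
    println!("pinned: {}", patch_applies(&pinned, patched)); // pinned: false
}
```

A single exact pin anywhere in the graph is enough to block the override for everyone.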

However, DataFusion itself would not be able to opt in to unreleased arrow functionality, so if there are frequent DataFusion changes coupled with arrow changes, then yes, a more frequent arrow release cycle would possibly be a precondition for moving to a released version of arrow.

> I am not sure whether we could get away with the multiple paths approach, e.g.

I've not come across this approach before. I'd worry that it might be vulnerable to rust-lang/cargo#5478, which would prevent users from opting into newer versions of arrow within their workspaces, which, in my opinion, would be unfortunate.

from arrow-datafusion.

andygrove commented on May 16, 2024

@tustvold Does #39 help with this?

tustvold commented on May 16, 2024

> @tustvold Does #39 help with this?

It helps, but it doesn't solve the underlying issue. If you depend on another crate that itself depends on arrow and isn't exposed by DataFusion (e.g. arrow-flight), or if you want to enable different features from the ones DataFusion sets, you end up having to replicate DataFusion's exact version pins in all your other crates.

jorgecarleitao commented on May 16, 2024

I think that the general problem is that we pin arrow (and many others) in datafusion; datafusion is a library and it should thus avoid pinning dependencies.

Instead, it should bracket them with a range requirement such as ^3.0.0, so that consumers of the library can use a different version of any of its dependencies as long as it is compatible, and let cargo find a version that satisfies both what the consumer wants and what datafusion requires.
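As an illustration of how a caret requirement brackets versions and how one version can satisfy both sides, here is a minimal Rust sketch. It follows Cargo's documented caret rules (^1.2.3 means >=1.2.3, <2.0.0; ^0.2.3 means >=0.2.3, <0.3.0; ^0.0.3 means >=0.0.3, <0.0.4), but `caret_range`, `satisfies` and `resolve` are hypothetical helpers and the published version list is made up. It is a toy, not Cargo's resolver: it ignores pre-releases, features and backtracking across other crates.

```rust
// Versions as (major, minor, patch); tuples compare lexicographically.
type Version = (u64, u64, u64);

/// Half-open range [lower, upper) accepted by the caret requirement ^v.
fn caret_range(v: Version) -> (Version, Version) {
    let upper = match v {
        (0, 0, p) => (0, 0, p + 1),     // ^0.0.z: only 0.0.z
        (0, m, _) => (0, m + 1, 0),     // ^0.y.z: below 0.(y+1).0
        (maj, _, _) => (maj + 1, 0, 0), // ^x.y.z: below (x+1).0.0
    };
    (v, upper)
}

fn satisfies(v: Version, req: Version) -> bool {
    let (lo, hi) = caret_range(req);
    lo <= v && v < hi
}

/// Highest published version accepted by every caret requirement, if any.
fn resolve(published: &[Version], reqs: &[Version]) -> Option<Version> {
    published
        .iter()
        .copied()
        .filter(|&v| reqs.iter().all(|&r| satisfies(v, r)))
        .max()
}

fn main() {
    let published = [(3, 0, 0), (3, 1, 0), (3, 2, 1), (4, 0, 0)];
    // datafusion asks for ^3.0.0 and the consumer for ^3.1.0: both share 3.2.1.
    println!("{:?}", resolve(&published, &[(3, 0, 0), (3, 1, 0)])); // Some((3, 2, 1))
    // ^3.0.0 vs ^4.0.0: no single version fits, so real Cargo would build
    // two separate copies of arrow (3.2.1 and 4.0.0).
    println!("{:?}", resolve(&published, &[(3, 0, 0), (4, 0, 0)])); // None
}
```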

As it stands, consumers must use exactly the same version of arrow that datafusion uses, or cargo will pull in two different arrow versions. This happens because Cargo cannot guarantee that the two different arrow versions (what datafusion demands and what the consumer wants) are ABI compatible. Consumers can't pass structs from the version of arrow they use to the version of arrow that datafusion uses.

Note that in this context a different feature set corresponds to a different version, as cargo has no way of knowing whether a feature will retain ABI compatibility.
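The practical effect of two arrow copies can be sketched in plain Rust. In this hypothetical example, the modules `arrow_v3` and `arrow_v4` stand in for two copies of arrow in the same build, and `datafusion::field_count` stands in for a DataFusion API compiled against one of them. Rust treats same-named structs from two copies of a crate as unrelated types, just as it does for the two modules here, so the consumer's value cannot be passed across without an explicit conversion.

```rust
// Stand-ins for two copies of arrow in one dependency graph (hypothetical).
mod arrow_v3 {
    pub struct Schema {
        pub fields: Vec<String>,
    }
}
mod arrow_v4 {
    pub struct Schema {
        pub fields: Vec<String>,
    }
}

// Stand-in for a DataFusion API compiled against the 4.x copy.
mod datafusion {
    pub fn field_count(schema: &crate::arrow_v4::Schema) -> usize {
        schema.fields.len()
    }
}

fn main() {
    // The consumer builds a schema with "its" arrow (the 3.x copy).
    let mine = arrow_v3::Schema { fields: vec!["a".into(), "b".into()] };
    // datafusion::field_count(&mine); // error[E0308]: mismatched types
    // The only way across is an explicit conversion between the two types.
    let converted = arrow_v4::Schema { fields: mine.fields };
    println!("{}", datafusion::field_count(&converted)); // 2
}
```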

So, I think the ask here is:

  1. datafusion writes something like ^4.0.0 on its cargo.toml
  2. we release arrow more often and strictly according to semver

Is this the idea, @tustvold?

I am not sure whether we could get away with the multiple paths approach, e.g.

arrow = { git = "https://github.com/arrow-rs/arrow", version = "^3.0.0" }

jorgecarleitao commented on May 16, 2024

> but possibly somewhat tangential to making DataFusion as a library easier to consume

I agree.

My point is that the reason we pin arrow to git hashes is so that we do not have to wait for a new release. So, I think that to stop pinning in DataFusion, we need to release arrow more frequently. But I agree that from the consumers' point of view, more frequent releases are not needed, as you can just point to a hash in arrow-rs ^_^

Jimexist commented on May 16, 2024

@alamb this can be closed now

alamb commented on May 16, 2024

Indeed -- thanks @Jimexist -- this issue was closed by #393, I think.
