Comments (7)

Blizzara commented on August 26, 2024

Cool, I'll take a look in the next few days!

from arrow-datafusion.

waynexia commented on August 26, 2024

Sounds great!! I'm very interested in this project and looking forward to your progress!

alamb commented on August 26, 2024

Thanks for the report @Blizzara -- this would be a great thing to fix

waynexia commented on August 26, 2024

Thanks for taking this, @Blizzara! Just out of curiosity, are you using datafusion-substrait somewhere, and is that where you found this inconsistency?

Blizzara commented on August 26, 2024

Yes! I'm working on using DataFusion to execute Spark dataframes through a Spark -> Substrait -> DataFusion pipeline. The Spark -> Substrait part is a fork (currently closed-source, but I hope to open it too) of "substrait-spark" from https://github.com/apache/incubator-gluten/tree/v1.1.0/substrait/substrait-spark (forked because it's no longer included in Gluten).

EpsilonPrime commented on August 26, 2024

Be wary: Gluten contains a copy of Substrait instead of depending on the main repo. As a result, its Substrait is incompatible with the rest of the ecosystem. That works for them, as they only use Substrait internally, but the other tools are handy sometimes (especially the Substrait Validator).

It may also interest you that a generic Spark to Substrait tool is out there: https://github.com/voltrondata/spark-substrait-gateway

It's mostly there for DuckDB but other backends require some tweaks (DataFusion is one of them). We have a tweak to remove the compound names for DataFusion but would love to see this issue addressed. If you have questions (especially from the Substrait side) feel free to reach out.

Blizzara commented on August 26, 2024

> Be wary, Gluten contains a copy of Substrait instead of depending on the main repo. As a result its Substrait is incompatible with the rest of the ecosystem. That works for them as they only use the Substrait internally but the other tools are handy sometimes (especially the Substrait Validator).

Yup, the substrait-spark submodule was not using Gluten's Substrait but the vanilla one. (The whole thing wasn't really used by Gluten and it's no longer part of the repo, thus the fork).

> It may also interest you that a generic Spark to Substrait tool is out there: https://github.com/voltrondata/spark-substrait-gateway

It does, thanks! Not directly useful as we need a solution in Java, but it's always good to have more references to look at.
