
Comments (7)

Blizzara commented on July 18, 2024

Hmm not sure how you got the NotImplemented error - maybe somehow running a quite old DataFusion? However with the query

SELECT p1.PS_PARTKEY supp_key, p2.PS_PARTKEY cust_key
FROM
    'partsupp' p1,
    'partsupp' p2

I do get the same error as you did originally:

#[tokio::test]
async fn roundtrip_implicit_cross_join() -> Result<()> {
    roundtrip("SELECT p1.a p1_a, p2.a p2_a FROM data p1, data p2").await
}

Error: Plan("Projections require unique expression names but the expression \"data.a\" at position 0 and \"data.a\" at position 1 have the same name. Consider aliasing (\"AS\") one of them.")

This is because Substrait includes aliases for neither tables nor columns, so after the roundtrip both columns come back named data.a. I'm trying to see if I can add that to Substrait, which would make these things easier to support: substrait-io/substrait#648
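To make the failure mode concrete, here is a minimal sketch of the kind of uniqueness check that produces an error like the one above. The function name `check_unique_names` is hypothetical and this is not DataFusion's actual implementation; it only illustrates why two columns that both come back as data.a are rejected once the p1/p2 aliases are gone.

```rust
use std::collections::HashMap;

/// Returns an error if two projected expressions share the same output name.
/// Hypothetical helper for illustration only; not DataFusion's real code.
fn check_unique_names(names: &[&str]) -> Result<(), String> {
    // Maps each name to the first position at which it was seen.
    let mut first_seen: HashMap<&str, usize> = HashMap::new();
    for (pos, &name) in names.iter().enumerate() {
        // `insert` returns the previous value if the name was already present.
        if let Some(prev) = first_seen.insert(name, pos) {
            return Err(format!(
                "expression \"{0}\" at position {1} and \"{0}\" at position {2} have the same name",
                name, prev, pos
            ));
        }
    }
    Ok(())
}
```

With the aliases intact, `check_unique_names(&["p1.a", "p2.a"])` returns `Ok(())`. Once both sides of the join are only known as `data.a`, the same call with `&["data.a", "data.a"]` returns an `Err`.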

from arrow-datafusion.

richtia commented on July 18, 2024

Hmm not sure how you got the NotImplemented error - maybe somehow running a quite old DataFusion? However with the query

Ahhh yeah... I was on an older version.


Blizzara commented on July 18, 2024

same column names with different aliases

Isn't the repro trying to alias different column names (PS_PARTKEY, PS_SUPPKEY) to the same alias (K1)? Why would you want to do that? 😅


richtia commented on July 18, 2024

same column names with different aliases

Isn't the repro trying to alias different column names (PS_PARTKEY, PS_SUPPKEY) to the same alias (K1)? Why would you want to do that? 😅

Ahh... that was my mistake. One of those should be k2. I was trying to get a simpler repro of a much larger query with multiple joins. However, now that I have a more proper query, I am running into a different issue.

This is the query I have now:

SELECT p1.PS_PARTKEY supp_key, p2.PS_PARTKEY cust_key
FROM
    'partsupp' p1,
    'partsupp' p2

And this is the substrait error from that:

DataFusion error: NotImplemented("Unsupported operator: CrossJoin:\\n  SubqueryAlias: p1\\n    TableScan: partsupp projection=[ps_partkey]\\n  SubqueryAlias: p2\\n    TableScan: partsupp projection=[ps_partkey]")')

So the original issue I was hitting was DataFusion trying to run a Substrait plan generated by DuckDB, and the error from that is the same one I put in the description.


alamb commented on July 18, 2024

I added this to the Substrait support epic: #5173


EpsilonPrime commented on July 18, 2024

Given that names don't matter in Substrait (the final names are provided), is the problem solvable within the Substrait consumer for DataFusion? Shouldn't the consumer be able to rename the columns to whatever it wants?

Stepping further back I wonder if the check is needed at all here -- is it trying to prevent extra work or is it trying to prevent confusion on its part later on? It may be designed for the case where the fields are named the same but are from different sources which isn't happening here. Perhaps the check needs to be made more precise?


Blizzara commented on July 18, 2024

Given that names don't matter in Substrait (the final names are provided), is the problem solvable within the Substrait consumer for DataFusion?

As discussed on the Substrait ticket, yes it can be solved, but not in a nice way.

Shouldn't the consumer be able to rename the columns to whatever it wants?

It can. However, given that the user named the columns/tables one way in the original plan, it can be quite confusing if they are named very differently in the plan that is actually executed.

Stepping further back I wonder if the check is needed at all here -- is it trying to prevent extra work or is it trying to prevent confusion on its part later on? It may be designed for the case where the fields are named the same but are from different sources which isn't happening here. Perhaps the check needs to be made more precise?

This plan results in a cross join, so the fields do refer to different sources: the same table, but on different sides of the join. They are therefore different columns.
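As a rough illustration of the "consumer renames the columns" workaround discussed above, the sketch below makes duplicate names unique by appending a numeric suffix. Both the helper name `dedup_names` and the `__N` suffix scheme are assumptions made up for this example, not what the DataFusion Substrait consumer does. It also shows the downside mentioned earlier: the intermediate names no longer match what the user wrote (p1, p2).

```rust
use std::collections::HashMap;

/// Makes each name unique by appending a numeric suffix to repeats.
/// Hypothetical helper; the `__N` suffix scheme is an assumption for illustration.
/// Note: it does not guard against a generated name colliding with an
/// existing one (e.g. an input column literally called "data.a__1").
fn dedup_names(names: &[&str]) -> Vec<String> {
    // Counts how many times each name has been seen so far.
    let mut counts: HashMap<&str, usize> = HashMap::new();
    names
        .iter()
        .map(|&name| {
            let n = counts.entry(name).or_insert(0);
            *n += 1;
            if *n == 1 {
                // First occurrence keeps its original name.
                name.to_string()
            } else {
                // Later occurrences get a suffix: data.a__1, data.a__2, ...
                format!("{}__{}", name, *n - 1)
            }
        })
        .collect()
}
```

For the cross join in this issue, `dedup_names(&["data.a", "data.a"])` yields `data.a` and `data.a__1`. The projection would then be accepted, but the plan shows `data.a__1` where the user originally wrote `p2.a`.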

