Comments (31)

colinmjj commented on April 28, 2024

Can you share the Spark UI for stage I/O?

from firestorm.

xunxunmimi5577 commented on April 28, 2024

I can't upload pictures from my company.

xunxunmimi5577 commented on April 28, 2024

Compared with native Spark, Shuffle Write has the same amount of data, but Firestorm reads very little data during Shuffle Read. The Tasks: Succeeded/Total column in the Spark UI shows only one task with Firestorm, whereas native Spark shows 5000 tasks executed successfully.

colinmjj commented on April 28, 2024

How about the result? Is it the same as the result with native Spark?
We passed a result comparison on 1 TB of data, but haven't done this with 10 TB of data.

xunxunmimi5577 commented on April 28, 2024

Uploading IMG_20220615_092133.jpg…

xunxunmimi5577 commented on April 28, 2024

Uploading IMG_20220615_092112.jpg…

xunxunmimi5577 commented on April 28, 2024

> How about the result? Is it the same as the result with native Spark? We passed a result comparison on 1 TB of data, but haven't done this with 10 TB of data.

I need to confirm this, because we modified the SQL and did not collect the results.

xunxunmimi5577 commented on April 28, 2024

Does Firestorm record partition lengths in MapStatus?

jerqi commented on April 28, 2024

We record the lengths; AQE needs these metrics.

xunxunmimi5577 commented on April 28, 2024

jerqi commented on April 28, 2024

Could you give me more detailed information?

xunxunmimi5577 commented on April 28, 2024

Doesn't Firestorm for Spark 2 support AQE? I saw that the implementation of the stop() method in RssShuffleWriter (Spark 2) seems to fill partitionLengths with a dummy value.

xunxunmimi5577 commented on April 28, 2024

However, Spark 2 does support the configuration spark.sql.adaptive.enabled. If it is as mentioned above, does that mean spark.sql.adaptive.enabled can't be set to true?

jerqi commented on April 28, 2024

> Doesn't Firestorm for Spark 2 support AQE? I saw that the implementation of the stop() method in RssShuffleWriter (Spark 2) seems to fill partitionLengths with a dummy value.

Spark 2 doesn't support AQE.

jerqi commented on April 28, 2024

Open source Spark 2 doesn't support AQE either.

xunxunmimi5577 commented on April 28, 2024

> > Doesn't Firestorm for Spark 2 support AQE? I saw that the implementation of the stop() method in RssShuffleWriter (Spark 2) seems to fill partitionLengths with a dummy value.

> Spark 2 doesn't support AQE.

But if I set spark.sql.adaptive.enabled=true, I get the wrong result.

jerqi commented on April 28, 2024

https://spark.apache.org/releases/spark-release-3-0-0.html
AQE is a Spark 3.0 feature.

xunxunmimi5577 commented on April 28, 2024

As far as I know, Spark 2 can also use the configuration spark.sql.adaptive.enabled.

xunxunmimi5577 commented on April 28, 2024

In that case, the ExchangeCoordinator.doEstimationIfNecessary() method needs mapOutputStatistics to determine the number of post-shuffle partitions.

colinmjj commented on April 28, 2024

@xunxunmimi5577 For RSS + Spark 2, AQE is not supported by the current implementation. This feature was announced in Spark 3, so there is no plan to support AQE with Spark 2.

jerqi commented on April 28, 2024

It's not an available feature in Spark 2. Maybe some configurations were added first, but the implementation isn't complete.

xunxunmimi5577 commented on April 28, 2024

If I use Spark 2 + Firestorm + spark.sql.adaptive.enabled=true, partitionStartIndices from ExchangeCoordinator will be [0,200) instead of [0,1), [1,2), ..., so the shuffle reader will only read partition 0. This is the phenomenon described in my issue: there were supposed to be 200 tasks to execute, but only one was executed.
I think users should at least be warned about this.
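
To illustrate why placeholder partition lengths can collapse 200 partitions into one, here is a minimal Java sketch of size-based coalescing. It is a simplified stand-in and not Spark's ExchangeCoordinator code: the class name CoalesceSketch, the method startIndices, the 64 MiB target, and the use of zeros as the dummy lengths are all assumptions made for the example. The thread does not state which placeholder value Firestorm writes; any value far below the target would behave the same way here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical, simplified model of size-based partition coalescing.
final class CoalesceSketch {
    private CoalesceSketch() {}

    // Greedily packs consecutive pre-shuffle partitions into one post-shuffle
    // partition until adding the next one would exceed targetBytes.
    // Returns the start index of each post-shuffle partition.
    static List<Integer> startIndices(long[] bytesByPartition, long targetBytes) {
        List<Integer> starts = new ArrayList<>();
        starts.add(0);
        long current = 0L;
        for (int i = 0; i < bytesByPartition.length; i++) {
            long next = bytesByPartition[i];
            if (i > 0 && current + next > targetBytes) {
                starts.add(i);   // begin a new post-shuffle partition at i
                current = next;
            } else {
                current += next; // keep packing into the current one
            }
        }
        return starts;
    }

    public static void main(String[] args) {
        long target = 64L * 1024 * 1024;  // illustrative target size (assumption)
        long[] dummy = new long[200];     // placeholder lengths, assumed to be zeros
        long[] real = new long[200];
        Arrays.fill(real, target);        // every partition already at the target

        System.out.println("dummy lengths: " + startIndices(dummy, target).size() + " post-shuffle partition(s)");
        System.out.println("real lengths:  " + startIndices(real, target).size() + " post-shuffle partition(s)");
    }
}
```

With placeholder lengths the sketch yields a single start index, which corresponds to one task covering the range [0,200). A reader that only fetches the first partition of that range would then return far less data than was written.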

jerqi commented on April 28, 2024

> If I use Spark 2 + Firestorm + spark.sql.adaptive.enabled=true, partitionStartIndices from ExchangeCoordinator will be [0,200) instead of [0,1), [1,2), ..., so the shuffle reader will only read partition 0. This is the phenomenon described in my issue: there were supposed to be 200 tasks to execute, but only one was executed. I think users should at least be warned about this.

OK, we can check whether ADAPTIVE_EXECUTION_ENABLED is enabled in RssShuffleManager. If it is, we can throw an illegal argument exception. Would you like to contribute it?
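
A minimal sketch of such a check, written as standalone Java so it runs without Spark on the classpath. The class AqeGuard and the method checkAqeDisabled are hypothetical names, and a plain Map stands in for the Spark configuration; this is not Firestorm's actual code. The only real identifier is the configuration key spark.sql.adaptive.enabled, which defaults to false in Spark 2.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a parameter check in the shuffle manager.
final class AqeGuard {
    // Real Spark configuration key for adaptive query execution.
    static final String ADAPTIVE_KEY = "spark.sql.adaptive.enabled";

    private AqeGuard() {}

    // Throws if AQE is switched on. A missing key is treated as "false",
    // which matches the Spark 2 default.
    static void checkAqeDisabled(Map<String, String> conf) {
        boolean enabled = Boolean.parseBoolean(conf.getOrDefault(ADAPTIVE_KEY, "false"));
        if (enabled) {
            throw new IllegalArgumentException(
                "Spark 2 with RSS does not support AQE; set " + ADAPTIVE_KEY + "=false");
        }
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put(ADAPTIVE_KEY, "true");
        try {
            checkAqeDisabled(conf);
        } catch (IllegalArgumentException e) {
            System.out.println("Rejected: " + e.getMessage());
        }
    }
}
```

Failing fast at startup turns a silent wrong result into an explicit configuration error, which is the behaviour asked for in this thread.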

colinmjj commented on April 28, 2024

@xunxunmimi5577 Thanks for reporting this. I think this unsupported case should be described in the README.

xunxunmimi5577 commented on April 28, 2024

> > If I use Spark 2 + Firestorm + spark.sql.adaptive.enabled=true, partitionStartIndices from ExchangeCoordinator will be [0,200) instead of [0,1), [1,2), ..., so the shuffle reader will only read partition 0. This is the phenomenon described in my issue: there were supposed to be 200 tasks to execute, but only one was executed. I think users should at least be warned about this.

> OK, we can check whether ADAPTIVE_EXECUTION_ENABLED is enabled in RssShuffleManager. If it is, we can throw an illegal argument exception. Would you like to contribute it?

I would like to. Or would you rather just describe it in the README?

xunxunmimi5577 commented on April 28, 2024

Moreover, is it possible to record an array of partitionLengths, as in Spark 3?

jerqi commented on April 28, 2024

> > > If I use Spark 2 + Firestorm + spark.sql.adaptive.enabled=true, partitionStartIndices from ExchangeCoordinator will be [0,200) instead of [0,1), [1,2), ..., so the shuffle reader will only read partition 0. This is the phenomenon described in my issue: there were supposed to be 200 tasks to execute, but only one was executed. I think users should at least be warned about this.

> > OK, we can check whether ADAPTIVE_EXECUTION_ENABLED is enabled in RssShuffleManager. If it is, we can throw an illegal argument exception. Would you like to contribute it?

> I would like to. Or would you rather just describe it in the README?

Actually, we want to do two things: add the parameter check in the code, and also expand the documentation.

jerqi commented on April 28, 2024

> Moreover, is it possible to record an array of partitionLengths, as in Spark 3?

It's not an available feature in Spark 2, so we won't do it.

xunxunmimi5577 commented on April 28, 2024

OK

jerqi commented on April 28, 2024

Could I close this issue? Is it solved?

xunxunmimi5577 commented on April 28, 2024

I think it's solved. Let me close this issue.
