Comments (31)
Can you share the spark UI for stage IO?
from firestorm.
I can't upload pictures in my company
Compared to native Spark, Shuffle Write has the same amount of data, but Firestorm reads very little data during Shuffle Read. The Tasks: Succeeded/Total label in the Spark UI shows only one task for Firestorm, while native Spark shows 5000 tasks executed successfully.
How about the result? Is it the same as the result with native Spark?
We passed the result comparison on 1TB of data, but haven't done this with 10TB of data.
I need to confirm this, because we modified the SQL and did not collect the results
Does Firestorm print partition lengths to MapStatus?
We record the lengths; AQE needs these metrics.
Could you give me more detailed information?
Does Firestorm for Spark 2 not support AQE? I saw that the implementation of the stop() method in RssShuffleWriter (Spark 2) seems to fill partitionLengths with dummy values.
However, Spark 2 does support the configuration spark.sql.adaptive.enabled. If it is as mentioned above, does that mean spark.sql.adaptive.enabled can't be set to true?
Spark 2 doesn't support AQE.
Open-source Spark 2 doesn't support AQE either.
But if I set spark.sql.adaptive.enabled=true, I get the wrong result.
https://spark.apache.org/releases/spark-release-3-0-0.html
AQE is a Spark 3.0 feature.
As far as I know, Spark 2 can also use the configuration spark.sql.adaptive.enabled.
Then the ExchangeCoordinator.doEstimationIfNecessary() method needs mapOutputStatistics to determine the number of post-shuffle partitions.
@xunxunmimi5577 For RSS + Spark 2, AQE is not supported with the current implementation. This feature was introduced in Spark 3, so there is no plan to support AQE with Spark 2.
It's not an available feature in Spark 2. Maybe some configurations were added first, but the implementation isn't complete.
If I use Spark 2 + Firestorm + spark.sql.adaptive.enabled=true, partitionStartIndices from ExchangeCoordinator will be [0,200) instead of [0,1), [1,2), ..., and the shuffle reader will then read only partition 0. This is the phenomenon described in my issue: 200 tasks were supposed to execute, but only one did.
I think users should at least be warned about this.
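To illustrate why dummy partition lengths can collapse everything into a single post-shuffle partition, here is a simplified sketch of size-based coalescing. It is not Spark's actual ExchangeCoordinator code: the class name CoalesceSketch, the method startIndices, the 64 MB target size, and the dummy length of 1 byte per partition are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class CoalesceSketch {

    /**
     * Returns the start index of each post-shuffle partition.
     * Adjacent pre-shuffle partitions are merged until adding the next one
     * would push the running size above targetSize.
     */
    static List<Integer> startIndices(long[] bytesByPartition, long targetSize) {
        List<Integer> starts = new ArrayList<>();
        starts.add(0); // the first post-shuffle partition always starts at 0
        long current = 0L;
        for (int i = 0; i < bytesByPartition.length; i++) {
            long next = bytesByPartition[i];
            if (i > 0 && current + next > targetSize) {
                starts.add(i);   // begin a new post-shuffle partition at i
                current = next;
            } else {
                current += next; // keep merging into the current one
            }
        }
        return starts;
    }

    public static void main(String[] args) {
        long target = 64L * 1024 * 1024; // assumed target size, not from the thread

        long[] dummy = new long[200];
        Arrays.fill(dummy, 1L);          // assumed dummy length per partition
        System.out.println("dummy lengths: " + startIndices(dummy, target).size() + " partition(s)");

        long[] real = new long[200];
        Arrays.fill(real, target);       // each partition already at the target size
        System.out.println("real lengths:  " + startIndices(real, target).size() + " partition(s)");
    }
}
```

With per-partition sizes at the target, the sketch keeps 200 separate partitions. With tiny dummy sizes it returns a single start index, which corresponds to one range covering [0,200).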
OK, we can check whether ADAPTIVE_EXECUTION_ENABLED is enabled in RssShuffleManager. If it is, we can throw an illegal argument exception. Would you like to contribute it?
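A minimal sketch of such a check, assuming the configuration is available as a plain string map rather than a SparkConf. The class AqeGuardSketch and the method checkAdaptiveDisabled are hypothetical names, not part of Firestorm; only the key spark.sql.adaptive.enabled comes from this thread.

```java
import java.util.HashMap;
import java.util.Map;

final class AqeGuardSketch {

    // Configuration key discussed in this thread.
    static final String ADAPTIVE_KEY = "spark.sql.adaptive.enabled";

    /**
     * Hypothetical guard: fails fast when adaptive execution is switched on,
     * instead of silently producing wrong results later.
     */
    static void checkAdaptiveDisabled(Map<String, String> conf) {
        String value = conf.get(ADAPTIVE_KEY);
        if (value != null && Boolean.parseBoolean(value.trim())) {
            throw new IllegalArgumentException(
                ADAPTIVE_KEY + "=true is not supported with Spark 2 and this shuffle manager");
        }
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put(ADAPTIVE_KEY, "true");
        try {
            checkAdaptiveDisabled(conf);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

In a real shuffle manager the same test would presumably run once at construction time, so the job fails before any shuffle data is written.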
@xunxunmimi5577 Thanks for reporting this. I think this unsupported case should be described in the README.
I would like to, or would you rather just describe it in the README?
Moreover, is it possible to record an array of partitionLengths like Spark3?
Actually, we want to do two things: add the parameter check in the code and extend the documentation.
It's not an available feature in Spark 2, so we won't do it.
OK
Could I close this issue? Is it solved?
I think it's solved. Let me close this issue.