Comments (8)
I think this is caused by too many connections being opened when the number of partitions is too large. I also found many connection-creation entries in the log.
from incubator-celeborn.
If shuffle.partitions can be reduced, the problem goes away, but a complex job contains both large and small tasks, and a value that is too small does not suit the large ones.
Does increasing celeborn.network.timeout help? The default value is 240s.
https://celeborn.apache.org/docs/0.2.1-incubating/configuration/#network
Note: you should set it in the Spark configuration with the additional prefix spark., so the key becomes spark.celeborn.network.timeout
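The prefix rule in the note above can be sketched as a small helper. This is only an illustration: the function names to_spark_conf and to_submit_args are hypothetical, and the 600s value is an arbitrary example, not a recommendation from this thread.

```python
def to_spark_conf(celeborn_conf):
    """Prefix Celeborn client keys with 'spark.' for use in Spark configuration."""
    out = {}
    for key, value in celeborn_conf.items():
        # Keys that already carry the prefix are left unchanged.
        spark_key = key if key.startswith("spark.") else "spark." + key
        out[spark_key] = value
    return out


def to_submit_args(spark_conf):
    """Render settings as a list of spark-submit '--conf key=value' arguments."""
    args = []
    for key in sorted(spark_conf):
        args += ["--conf", f"{key}={spark_conf[key]}"]
    return args


# 600s is an illustrative value; the thread only states that the default is 240s.
conf = to_spark_conf({"celeborn.network.timeout": "600s"})
print(" ".join(to_submit_args(conf)))
```

The printed arguments can be appended to a spark-submit command line; the same prefixed key could equally be placed in whatever Spark configuration file the job already uses.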
Is it caused by not disabling spark.sql.adaptive.localShuffleReader.enabled?
@wForget Right. @thomasg19930417, could you turn off spark.sql.adaptive.localShuffleReader.enabled and test again?
After testing, it is indeed caused by this parameter. Thank you for your replies, @pan3793 @wForget @waitinfuture
We did add a comment about this config; maybe we should highlight it more :)
I commented this parameter out by mistake; it may have been commented out automatically when I copied the shell script.
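A quick check for this kind of mistake might look like the sketch below. It assumes the settings live in properties-style lines ("key value" or "key=value", with "#" starting a comment), which is an assumption and not something the thread confirms; the helper names are hypothetical. To my understanding spark.sql.adaptive.localShuffleReader.enabled is on by default in Spark 3.x, so a commented-out line leaves it enabled.

```python
KEY = "spark.sql.adaptive.localShuffleReader.enabled"


def effective_settings(lines):
    """Parse 'key value' or 'key=value' lines, skipping blanks and '#' comments."""
    settings = {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # a commented-out setting has no effect
        parts = line.replace("=", " ", 1).split(None, 1)
        settings[parts[0]] = parts[1].strip() if len(parts) > 1 else ""
    return settings


def local_shuffle_reader_disabled(lines):
    """Return True only when the key is explicitly set to 'false'."""
    # A missing key falls back to Spark's default, so it does not count as disabled.
    return effective_settings(lines).get(KEY, "").lower() == "false"


commented_out = ["# " + KEY + "=false"]
explicit = [KEY + "=false"]
print(local_shuffle_reader_disabled(commented_out))
print(local_shuffle_reader_disabled(explicit))
```

Running such a check against the configuration actually submitted with the job would flag the commented-out case before the shuffle fails.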