Comments (3)
Hi, can you share your disk metrics, such as disk bandwidth or "%iowait" data?
I think this means you need more disks. Can you try adding more disks to "rss.worker.base.dirs"?
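Adding directories helps because concurrent writes are spread across more devices. The minimal sketch below assumes a worker that assigns each new partition file to one of its configured base directories in round-robin order. The class name, method names, and directory paths are hypothetical; this is an illustration, not Celeborn's actual disk-selection logic.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical round-robin selector over configured base directories.
// Illustration only; not Celeborn's real implementation.
class BaseDirSelector {
    private final List<Path> baseDirs;
    private final AtomicLong counter = new AtomicLong();

    BaseDirSelector(List<Path> baseDirs) {
        if (baseDirs.isEmpty()) {
            throw new IllegalArgumentException("at least one base dir is required");
        }
        this.baseDirs = List.copyOf(baseDirs);
    }

    // Returns the directory for the next partition file. Successive calls
    // cycle through all directories, so each disk gets an equal share of files.
    // floorMod keeps the index non-negative even if the counter ever wraps.
    Path next() {
        int index = (int) Math.floorMod(counter.getAndIncrement(), (long) baseDirs.size());
        return baseDirs.get(index);
    }

    public static void main(String[] args) {
        // Hypothetical mount points, one per physical disk.
        BaseDirSelector selector = new BaseDirSelector(List.of(
                Paths.get("/mnt/disk1"), Paths.get("/mnt/disk2"), Paths.get("/mnt/disk3")));
        for (int i = 0; i < 6; i++) {
            System.out.println(selector.next());
        }
    }
}
```

With three directories on three separate disks, each disk receives roughly a third of the files. With a single directory, every write lands on the same device.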
@packageman
from incubator-celeborn.
I collected the IO metrics with iostat -x 2.
IO metrics when writing files using FileChannel (ESSD 100G PL1 disk):
IO metrics when writing files using DataOutputStream (ESSD 100G PL1 disk):
When using FileChannel, %iowait is almost 100%. When using DataOutputStream, it peaks at only 32%, and the DataOutputStream write also finishes in less time.
I upgraded the disk to ESSD 500G PL2 and re-tested FileChannel. The IO metrics are as follows:
%iowait has dropped to about 20%, w/s and wKB/s have improved, and the write time is much shorter.
These tests show that write speed can be improved either by using a faster disk or by replacing FileChannel with DataOutputStream (which may add CPU and memory overhead for extra data copying).
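The two write paths being compared can be sketched in Java as follows. This is a minimal sketch that writes the same data through each API and times it. It is not the test used above: the 256 KB chunk size, the chunk count, the BufferedOutputStream wrapper, and the temp-file names are all assumptions. Both paths force the data to the device before the timer stops, so disk time is included.

```java
import java.io.BufferedOutputStream;
import java.io.DataOutputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

class WritePathComparison {

    // Writes `chunks` copies of `chunk` through a FileChannel; returns elapsed nanoseconds.
    static long writeWithFileChannel(Path file, byte[] chunk, int chunks) throws IOException {
        long start = System.nanoTime();
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.CREATE,
                StandardOpenOption.WRITE, StandardOpenOption.TRUNCATE_EXISTING)) {
            for (int i = 0; i < chunks; i++) {
                ByteBuffer buffer = ByteBuffer.wrap(chunk); // heap buffer over the array
                // write() may write fewer bytes than remain, so loop until drained.
                while (buffer.hasRemaining()) {
                    channel.write(buffer);
                }
            }
            channel.force(false); // push data to the device so timing includes disk I/O
        }
        return System.nanoTime() - start;
    }

    // Writes the same data through a buffered DataOutputStream; returns elapsed nanoseconds.
    static long writeWithDataOutputStream(Path file, byte[] chunk, int chunks) throws IOException {
        long start = System.nanoTime();
        try (FileOutputStream fos = new FileOutputStream(file.toFile());
             DataOutputStream out = new DataOutputStream(new BufferedOutputStream(fos, 64 * 1024))) {
            for (int i = 0; i < chunks; i++) {
                out.write(chunk);
            }
            out.flush();         // drain the BufferedOutputStream into fos
            fos.getFD().sync();  // push data to the device, matching force() above
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) throws IOException {
        byte[] chunk = new byte[256 * 1024]; // assumed flush size
        int chunks = 400;                     // about 100 MB per file
        Path a = Files.createTempFile("filechannel", ".data");
        Path b = Files.createTempFile("dataoutputstream", ".data");
        try {
            long fc = writeWithFileChannel(a, chunk, chunks);
            long dos = writeWithDataOutputStream(b, chunk, chunks);
            System.out.printf("FileChannel: %d ms, DataOutputStream: %d ms%n",
                    fc / 1_000_000, dos / 1_000_000);
        } finally {
            Files.deleteIfExists(a);
            Files.deleteIfExists(b);
        }
    }
}
```

Run it against a file on the disk under test (the default temp directory may be on a different device or on tmpfs), and watch iostat -x 2 in parallel.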
Thanks for your interest. You are always welcome to discuss your ideas here.
I think you may have misunderstood the meaning of %iowait.
IOWait (usually labeled %wa in top) is a sub-category of idle time (%idle usually covers all idle time except the defined sub-categories), meaning the CPU is not doing anything. Therefore, as long as there is another process the CPU could run, it will run that process instead. Additionally, idle, user, system, iowait, etc. are all measurements of the CPU, not of the disk. In other words, you can think of iowait as idle time caused by waiting for IO.
So a high iowait value means the CPU is not the bottleneck (it is mostly idle, waiting on the disk), and higher disk performance means lower iowait.
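Because %iowait is a CPU-time category, tools like iostat derive it from the kernel's CPU time counters rather than from the disk itself. The minimal sketch below assumes Linux and the standard field order of the aggregate "cpu" line in /proc/stat (user, nice, system, idle, iowait, irq, softirq, steal, guest, guest_nice). It takes two samples and divides the iowait delta by the total delta. The class and method names are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

class IowaitSampler {

    // Parses the aggregate "cpu" line of /proc/stat into its first 8 counters
    // (user, nice, system, idle, iowait, irq, softirq, steal), in clock ticks.
    static long[] parseCpuLine(String line) {
        String[] parts = line.trim().split("\\s+");
        if (!parts[0].equals("cpu")) {
            throw new IllegalArgumentException("not an aggregate cpu line: " + line);
        }
        long[] ticks = new long[8];
        for (int i = 0; i < ticks.length && i + 1 < parts.length; i++) {
            ticks[i] = Long.parseLong(parts[i + 1]);
        }
        return ticks;
    }

    // Percentage of CPU time spent idle while waiting for IO between two samples.
    // guest/guest_nice are left out because the kernel already counts them in user/nice.
    static double iowaitPercent(long[] before, long[] after) {
        long total = 0;
        for (int i = 0; i < before.length; i++) {
            total += after[i] - before[i];
        }
        long iowait = after[4] - before[4];
        return total == 0 ? 0.0 : 100.0 * iowait / total;
    }

    // The first line of /proc/stat is the aggregate over all CPUs.
    static long[] sample() throws IOException {
        return parseCpuLine(Files.readAllLines(Paths.get("/proc/stat")).get(0));
    }

    public static void main(String[] args) throws Exception {
        long[] before = sample();
        Thread.sleep(2000); // same 2-second interval as "iostat -x 2"
        long[] after = sample();
        System.out.printf("%%iowait over the interval: %.2f%n", iowaitPercent(before, after));
    }
}
```

If another runnable process is busy during the interval, those ticks are counted as user or system instead of iowait. This is why the same disk load can show very different %iowait values.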
For FileChannel vs. DataOutputStream, there is an existing reference:
http://web.archive.org/web/20120815094827/http://geekomatic.ch/2008/09.html