Comments (3)
The timing and TinyProfiler output for the performance tests on GPUs and CPUs, with and without a level-1 mesh refinement patch, using a 3D boosted-frame uniform plasma input file, are shown in the attached file.
For the run with 2 particles per cell (ppc) and a mesh refinement patch on multiple GPUs, the functions RedistributeCPU(), RedistributeMPI() and PPC::Evolve::partition() become costly.
Double GPU WithMesh 2ppc
------------------------------------------------------------------------------------------------
Walltime = 2.179316111 s; This step = 2.1793155 s; Avg. per step = 2.179316111 s
Total Time : 129.2838885
Total GPU global memory (MB) spread across MPI: [16128 ... 16128]
Free GPU global memory (MB) spread across MPI: [15034 ... 15238]
PPC::Evolve::partition 100 62.64 62.64 62.64 29.05%
WarpX::EvolveB() 200 13.47 13.47 13.47 6.25%
WarpX::EvolveEM() 1 5.52 5.617 5.713 4.42%
WarpX::EvolveEM() 1 120 120 120 92.81%
WarpX::EvolveB() 200 7.639 7.784 7.928 6.13%
WarpX::EvolveB() 200 7.639 7.784 7.928 6.13%
ParticleContainer::RedistributeCPU() 103 1.805 34.64 67.48 52.20%
ParticleContainer::RedistributeCPU() 103 67.58 68.04 68.5 52.98%
ParticleContainer::RedistributeMPI() 103 0.0177 33.13 66.24 51.24%
RedistributeMPI_copy 93 0.06094 0.1893 0.3178 0.25%
PPC::Evolve::partition 100 34.37 34.52 34.67 26.82%
PPC::Evolve::partition 100 34.37 34.52 34.67 26.82%
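As a rough aid for reading the rows above, here is a minimal Python sketch that parses one profiler row and recomputes its last column. It assumes the columns are: function name, number of calls, minimum, average and maximum time across MPI ranks (in seconds), and the maximum as a percentage of the total time. That layout is inferred from the numbers rather than stated in the post, and the helper name `parse_row` is hypothetical.

```python
def parse_row(line):
    """Split one profiler row into its fields.

    Assumed layout (inferred, not confirmed by the source):
    name, ncalls, min, avg, max, max-as-percent-of-total.
    """
    # Split from the right so the function name is kept in one piece.
    name, ncalls, t_min, t_avg, t_max, pct = line.rsplit(None, 5)
    return {
        "name": name,
        "ncalls": int(ncalls),
        "min": float(t_min),
        "avg": float(t_avg),
        "max": float(t_max),
        "pct": float(pct.rstrip("%")),
    }


total_time = 129.2838885  # "Total Time" reported for the 2-GPU run
row = parse_row("ParticleContainer::RedistributeCPU() 103 67.58 68.04 68.5 52.98%")

# Under the assumed layout, the last column is max time / total time.
recomputed_pct = 100.0 * row["max"] / total_time
print(f"{row['name']}: reported {row['pct']:.2f}%, recomputed {recomputed_pct:.2f}%")
```

Applying the same check to the first two rows of the table implies a total of roughly 215.6 s rather than 129.28 s, so those rows may come from a different table or run; the post does not say.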
Below is the summary of the results for the same run on 2 CPUs (note that this is not a fair one-to-one performance comparison, since the GPUs and CPUs have different numbers of cores).
Double CPU WithMesh 2ppc
------------------------------------------------------------------------------------------------
Walltime = 4.918736936 s; This step = 4.918736889 s; Avg. per step = 4.918736936 s
Total Time : 186.5106439
PPC::Evolve::Accumulate 19612 62.94 62.94 62.94 26.42%
WarpX::EvolveB() 200 19.23 19.23 19.23 8.07%
WarpX::EvolveEM() 1 4.171 4.229 4.288 2.30%
WarpX::EvolveEM() 1 185.1 185.1 185.1 99.24%
WarpX::EvolveB() 200 9.79 9.892 9.994 5.36%
WarpX::EvolveB() 200 9.79 9.892 9.994 5.36%
ParticleContainer::RedistributeCPU() 103 2.515 13.12 23.74 12.73%
ParticleContainer::RedistributeCPU() 103 23.77 24.13 24.49 13.13%
ParticleContainer::RedistributeMPI() 103 0.01332 10.89 21.76 11.67%
RedistributeMPI_locate 93 0.01608 0.07935 0.1426 0.08%
PPC::Evolve::partition 1592 0.05789 0.0598 0.0617 0.03%
PPC::Evolve::partition 1592 0.05789 0.0598 0.0617 0.03%
Because RedistributeCPU(), RedistributeMPI() and PPC::Evolve::partition() are so costly on multiple GPUs, we will try to optimize them before running the large milestone conversion scans on Summit.
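To make the motivation concrete, the sketch below compares the share of total run time spent in these three functions in the 2-GPU and 2-CPU runs, using the maximum times and total times copied from the two profiler summaries above. The dictionary and function names are hypothetical, and as noted above this is not a fair one-to-one comparison between the two platforms.

```python
# Maximum time (s) per function and total run time, copied from the
# "Double GPU WithMesh 2ppc" and "Double CPU WithMesh 2ppc" summaries.
gpu = {
    "total": 129.2838885,
    "ParticleContainer::RedistributeCPU()": 68.5,
    "ParticleContainer::RedistributeMPI()": 66.24,
    "PPC::Evolve::partition": 34.67,
}
cpu = {
    "total": 186.5106439,
    "ParticleContainer::RedistributeCPU()": 24.49,
    "ParticleContainer::RedistributeMPI()": 21.76,
    "PPC::Evolve::partition": 0.0617,
}


def share(run, name):
    """Return the percentage of the run's total time spent in `name`."""
    return 100.0 * run[name] / run["total"]


for name in gpu:
    if name == "total":
        continue
    print(f"{name}: GPU {share(gpu, name):.2f}% vs CPU {share(cpu, name):.2f}%")
```

With these numbers, each function takes a much larger fraction of the run on 2 GPUs than on 2 CPUs, which is consistent with targeting them for optimization first.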
I am currently trying to understand the profiler output for that simulation (with a mesh patch, on 2 GPUs, with 2 ppc); the input, job submission, and output files are attached.
test-nprofile.18198.txt
test-nvprof.txt
submitjob.txt
inputs-uplasma-boost.3d.txt
from warpx.
Hi @MaxThevenet,
I think this issue is no longer relevant (and can probably be closed), because Andrew fixed the redistribute functions.
What do you think?
Thanks,
Diana
Agreed, thanks!