Comments (3)

LDAmorim commented on August 16, 2024

The attached file shows the timing and TinyProfiler output for the performance tests run on GPUs and CPUs, with and without a level-1 mesh-refinement patch, using a 3D boosted-frame uniform-plasma input file.

PerformanceUPlasma.txt

For the run with 2 particles per cell (ppc) with a mesh patch, running on multiple GPUs, the functions RedistributeCPU(), RedistributeMPI() and PPC::Evolve::partition() become costly.

Double GPU WithMesh 2ppc
------------------------------------------------------------------------------------------------
Walltime = 2.179316111 s; This step = 2.1793155 s; Avg. per step = 2.179316111 s
Total Time                     : 129.2838885
Total GPU global memory (MB) spread across MPI: [16128 ... 16128]
Free  GPU global memory (MB) spread across MPI: [15034 ... 15238]
PPC::Evolve::partition                         100      62.64      62.64      62.64  29.05%
WarpX::EvolveB()                               200      13.47      13.47      13.47   6.25%
WarpX::EvolveEM()                                1       5.52      5.617      5.713   4.42%
WarpX::EvolveEM()                                1        120        120        120  92.81%
WarpX::EvolveB()                               200      7.639      7.784      7.928   6.13%
WarpX::EvolveB()                               200      7.639      7.784      7.928   6.13%
ParticleContainer::RedistributeCPU()           103      1.805      34.64      67.48  52.20%
ParticleContainer::RedistributeCPU()           103      67.58      68.04       68.5  52.98%
ParticleContainer::RedistributeMPI()           103     0.0177      33.13      66.24  51.24%
RedistributeMPI_copy                            93    0.06094     0.1893     0.3178   0.25%
PPC::Evolve::partition                         100      34.37      34.52      34.67  26.82%
PPC::Evolve::partition                         100      34.37      34.52      34.67  26.82%

Below is a summary of the results for the same run on 2 CPUs (note that this is not a fair one-to-one performance comparison, as the GPUs and CPUs have different numbers of cores).

Double CPU WithMesh 2ppc
------------------------------------------------------------------------------------------------
Walltime = 4.918736936 s; This step = 4.918736889 s; Avg. per step = 4.918736936 s
Total Time                     : 186.5106439
PPC::Evolve::Accumulate                      19612      62.94      62.94      62.94  26.42%
WarpX::EvolveB()                               200      19.23      19.23      19.23   8.07%
WarpX::EvolveEM()                                1      4.171      4.229      4.288   2.30%
WarpX::EvolveEM()                                1      185.1      185.1      185.1  99.24%
WarpX::EvolveB()                               200       9.79      9.892      9.994   5.36%
WarpX::EvolveB()                               200       9.79      9.892      9.994   5.36%
ParticleContainer::RedistributeCPU()           103      2.515      13.12      23.74  12.73%
ParticleContainer::RedistributeCPU()           103      23.77      24.13      24.49  13.13%
ParticleContainer::RedistributeMPI()           103    0.01332      10.89      21.76  11.67%
RedistributeMPI_locate                          93    0.01608    0.07935     0.1426   0.08%
PPC::Evolve::partition                        1592    0.05789     0.0598     0.0617   0.03%
PPC::Evolve::partition                        1592    0.05789     0.0598     0.0617   0.03%


For that reason we will try to optimize those functions before running the large milestone conversion scans on Summit.
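
As a rough way to compare the two excerpts above, the sketch below parses a few of the profiler rows and prints the GPU-to-CPU ratio of the largest time reported for each function. The column layout (name, call count, min, avg, max, percentage) is an assumption read off the pasted rows, and the helper names `parse_rows` and `slowest_by_name` are hypothetical, not part of WarpX or AMReX.

```python
import re

# Assumed row layout, inferred from the pasted excerpts:
# name, number of calls, min time, avg time, max time, percentage.
ROW = re.compile(r"^(\S+)\s+(\d+)\s+([\d.]+)\s+([\d.]+)\s+([\d.]+)\s+([\d.]+)%$")

def parse_rows(text):
    """Return (name, ncalls, tmin, tavg, tmax, pct) tuples for lines matching ROW."""
    rows = []
    for line in text.splitlines():
        m = ROW.match(line.strip())
        if m:
            name, ncalls, tmin, tavg, tmax, pct = m.groups()
            rows.append((name, int(ncalls), float(tmin), float(tavg),
                         float(tmax), float(pct)))
    return rows

def slowest_by_name(rows):
    """For each function name, keep the largest max time seen across its rows."""
    worst = {}
    for name, _, _, _, tmax, _ in rows:
        worst[name] = max(worst.get(name, 0.0), tmax)
    return worst

# Rows copied from the "Double GPU WithMesh 2ppc" excerpt.
gpu = """\
ParticleContainer::RedistributeCPU()           103      67.58      68.04       68.5  52.98%
ParticleContainer::RedistributeMPI()           103     0.0177      33.13      66.24  51.24%
PPC::Evolve::partition                         100      34.37      34.52      34.67  26.82%
"""

# Rows copied from the "Double CPU WithMesh 2ppc" excerpt.
cpu = """\
ParticleContainer::RedistributeCPU()           103      23.77      24.13      24.49  13.13%
ParticleContainer::RedistributeMPI()           103    0.01332      10.89      21.76  11.67%
PPC::Evolve::partition                        1592    0.05789     0.0598     0.0617   0.03%
"""

gpu_worst = slowest_by_name(parse_rows(gpu))
cpu_worst = slowest_by_name(parse_rows(cpu))

# Call counts differ between the two runs for some functions, so the
# ratio is only a coarse indicator of where the GPU run loses time.
for name in gpu_worst:
    ratio = gpu_worst[name] / cpu_worst[name]
    print(f"{name}: GPU {gpu_worst[name]:.2f}, CPU {cpu_worst[name]:.2f}, ratio {ratio:.1f}x")
```

Lines that do not fit the assumed layout, such as the `Walltime` and `Total Time` lines, are skipped, so the full output files could be passed to `parse_rows` in the same way.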

I am currently trying to understand the profiler output for that simulation (with mesh patch, 2 ppc, running on 2 GPUs), for which I attach the input, job submission, and output files.

test-nprofile.18198.txt
test-nvprof.txt
submitjob.txt
inputs-uplasma-boost.3d.txt

from warpx.

LDAmorim commented on August 16, 2024

Hi @MaxThevenet ,
I think this issue is no longer relevant and could probably be closed, because Andrew fixed the redistribute functions.
What do you think?
Thanks,
Diana

MaxThevenet commented on August 16, 2024

Agreed, thanks!
