
Comments (8)

Yin-YinjianZhao commented on August 17, 2024

Currently, I'm trying to use this piece of code to do the job, but I'm a little bit confused about the indexing of phi and Ex. The simulation has nx=500, and phi has a length of 503 along x, so I would say it contains guard grid points: nx=500 means 500 cells, i.e. grid points 0,1,2,...,500, which is 501 grid points, plus two guard grid points, giving 503. But Ex seems to be longer than 503, so I'm confused.

# This runs inside a WarpX picmi script: nz, dx, xin, and i_xin are
# assumed to be defined earlier in the input deck.
import numpy as np
from pywarpx import fields, callbacks

def phi_fix():
    # Full-domain copies of phi and Ex, including guard cells
    phi = fields.PhiFPWrapper(0, True)[...]
    Ex = fields.ExFPWrapper(0, True)[...]
    # Average of phi over z at the x-index i_xin
    phi_ave = np.mean(phi[i_xin][:])
    # Subtract a linear-in-x ramp proportional to phi_ave
    for i in range(len(phi)-2):
        x = i*dx
        phi[i+1][:] = phi[i+1][:] - x/xin*phi_ave
    # Recompute Ex from the corrected phi with a centered difference
    for i in range(len(phi)-2):
        Ex[i+1][2:nz+2] = -(phi[i+2][1:nz+1]-phi[i][1:nz+1])/(dx*2)
    # Write the modified arrays back into the simulation
    fields.PhiFPWrapper(0, True)[...] = phi
    fields.ExFPWrapper(0, True)[...] = Ex

callbacks.installafterEsolve(phi_fix)


Yin-YinjianZhao commented on August 17, 2024

Well, since I don't quite know whether the above is doing the right thing (note that I was using a staggered grid, but have switched to a collocated grid now), and the simulation speed drops a lot, I hard-coded the following for now in ElectrostaticSolver.cpp.

#elif defined(WARPX_DIM_XZ) || defined(WARPX_DIM_RZ)

                // Average phi along z at the fixed x-index i = 480: each rank sums
                // its share of the 256 points in z, then MPI_Allreduce combines the
                // partial sums and point counts across ranks.
                int rank,nprocs,n,nc,ns;
                MPI_Comm_size(MPI_COMM_WORLD,&nprocs);
                MPI_Comm_rank(MPI_COMM_WORLD,&rank);
                n = 256/nprocs;
                nc = 0;
                Real phi0 = 0._rt, phis = 0._rt;
                for (int i=rank*n; i<=(rank+1)*n; i++) {
                    phi0 = phi0 + phi_arr(480,i,0);
                    nc = nc + 1;
                };
                MPI_Allreduce(&phi0, &phis, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);
                MPI_Allreduce(&nc, &ns, 1, MPI_INT, MPI_SUM, MPI_COMM_WORLD);
                phis = phis / double(ns);

                amrex::ParallelFor( tbx, tby, tbz,
                    [=] AMREX_GPU_DEVICE (int i, int j, int k) {
                        Real f = i*dx[0]/2.4e-2*phis*2.0;
                        Ex_arr(i,j,k) +=
                            +(beta_x*beta_x-1._rt)*0.5_rt*inv_dx*(phi_arr(i+1,j  ,k)-phi_arr(i-1,j  ,k)-f)
                            + beta_x*beta_z       *0.5_rt*inv_dz*(phi_arr(i  ,j+1,k)-phi_arr(i  ,j-1,k)-f);
                    },
                    [=] AMREX_GPU_DEVICE (int i, int j, int k) {
                        Real f = i*dx[0]/2.4e-2*phis*2.0;
                        Ey_arr(i,j,k) +=
                            +beta_x*beta_y*0.5_rt*inv_dx*(phi_arr(i+1,j  ,k)-phi_arr(i-1,j  ,k)-f)
                            +beta_y*beta_z*0.5_rt*inv_dz*(phi_arr(i  ,j+1,k)-phi_arr(i  ,j-1,k)-f);
                    },
                    [=] AMREX_GPU_DEVICE (int i, int j, int k) {
                        Real f = i*dx[0]/2.4e-2*phis*2.0;
                        Ez_arr(i,j,k) +=
                            + beta_z*beta_x       *0.5_rt*inv_dx*(phi_arr(i+1,j  ,k)-phi_arr(i-1,j  ,k)-f)
                            +(beta_z*beta_z-1._rt)*0.5_rt*inv_dz*(phi_arr(i  ,j+1,k)-phi_arr(i  ,j-1,k)-f);
                    }
                );


RemiLehe commented on August 17, 2024

Sorry for the late reply! Here are a few answers to your questions:

  • Regarding the update of E: as you indicated in your second message, you can simply update E directly in afterEsolve.

  • Regarding the number of points of phi and E: the difference is because E typically has more guard cells than phi. This is because phi requires only one guard cell (for the parallel Poisson solver), while E needs enough guard cells for particles with a wide shape factor to gather values from the guard cells (if they are close to a boundary between sub-domains). You can try to pass include_ghosts=False in ExWrapper and PhiWrapper to avoid this problem; a sketch combining this with the vectorized update is included at the end of this reply.

  • As a side-note, using explicit for loops in Python code like this:

    for i in range(len(phi)-2):
        x = i*dx
        phi[i+1][:] = phi[i+1][:] - x/xin*phi_ave
    for i in range(len(phi)-2):
        Ex[i+1][2:nz+2] = -(phi[i+2][1:nz+1]-phi[i][1:nz+1])/(dx*2)

is very inefficient. It is much more efficient to use numpy array-wise operations, e.g.

x = dx*np.arange(len(phi)-2)
phi[1:-1, :] -= (x/xin*phi_ave)[:, np.newaxis]
Ex[1:-1, 2:-2] = -( phi[2:, 1:-1] - phi[:-2, 1:-1] )/(2*dx)
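
Putting these two points together, a minimal sketch of phi_fix without guard cells could look like the following. This is only a sketch, not a tested implementation: it assumes that dx, xin, and i_xin are defined in the input script (as in your original snippet), and that with include_ghosts=False the phi and Ex wrappers return arrays with matching extents on the collocated grid.

import numpy as np
from pywarpx import fields, callbacks

def phi_fix():
    # No guard cells: phi and Ex are assumed here to have matching shapes
    phi = fields.PhiFPWrapper(0, False)[...]
    Ex = fields.ExFPWrapper(0, False)[...]
    # Average of phi over z at the x-index i_xin
    phi_ave = np.mean(phi[i_xin, :])
    # Subtract the linear-in-x ramp, array-wise instead of a Python loop
    x = dx*np.arange(phi.shape[0])
    phi -= (x/xin*phi_ave)[:, np.newaxis]
    # Centered difference for the interior points of Ex; the two boundary
    # rows are left untouched, as in the original loop
    Ex[1:-1, :] = -(phi[2:, :] - phi[:-2, :])/(2*dx)
    fields.PhiFPWrapper(0, False)[...] = phi
    fields.ExFPWrapper(0, False)[...] = Ex

callbacks.installafterEsolve(phi_fix)

It is worth printing phi.shape and Ex.shape once with include_ghosts=False to confirm that they really match before relying on the slicing above.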


Yin-YinjianZhao commented on August 17, 2024

@RemiLehe Hi Remi, thanks for your detailed reply!

I have a few more questions.

  1. Within the function phi_fix() defined above, when I use domain decomposition, each MPI rank will only handle part of the whole phi array, right? But when I print(len(phi)) and print(len(phi[0][:])), I get the same full length on all the ranks; why is that? In other words, does your suggested code still work correctly if I turn on domain decomposition by setting, say, numprocs=[1,2]?

  2. My comment above yours, which hard-coded that piece of code in ElectrostaticSolver.cpp: will that do the job I want correctly? And will it work correctly for any parallelization?

  3. Do I need to give correct values to the guard cells?


Yin-YinjianZhao commented on August 17, 2024

Well, I guess I made a mistake; this may be better.

#elif defined(WARPX_DIM_XZ) || defined(WARPX_DIM_RZ)

                int rank,nprocs,n,nc,ns;
                MPI_Comm_size(MPI_COMM_WORLD,&nprocs);
                MPI_Comm_rank(MPI_COMM_WORLD,&rank);
                n = 256/nprocs;
                nc = 0; 
                Real phi0 = 0._rt, phis = 0._rt;
                for (int i=rank*n; i<=(rank+1)*n; i++) {
                    phi0 = phi0 + phi_arr(480,i,0);
                    nc = nc + 1; 
                };
                MPI_Allreduce(&phi0, &phis, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);
                MPI_Allreduce(&nc, &ns, 1, MPI_INT, MPI_SUM, MPI_COMM_WORLD);
                phis = phis / double(ns);

                const Box& tb  = mfi.tilebox( phi[lev]->ixType().toIntVect() );
                amrex::ParallelFor( tb,
                    [=] AMREX_GPU_DEVICE (int i, int j, int k) { 
                        phi_arr(i,j,k) = phi_arr(i,j,k) - i*dx[0]/2.4e-2*phis;
                    } // loop ijk
                );

                amrex::ParallelFor( tbx, tby, tbz, 
                    [=] AMREX_GPU_DEVICE (int i, int j, int k) { 
                        Ex_arr(i,j,k) +=
                            +(beta_x*beta_x-1._rt)*0.5_rt*inv_dx*(phi_arr(i+1,j  ,k)-phi_arr(i-1,j  ,k)) 
                            + beta_x*beta_z       *0.5_rt*inv_dz*(phi_arr(i  ,j+1,k)-phi_arr(i  ,j-1,k));
                    },
                    [=] AMREX_GPU_DEVICE (int i, int j, int k) { 
                        Ey_arr(i,j,k) +=
                            +beta_x*beta_y*0.5_rt*inv_dx*(phi_arr(i+1,j  ,k)-phi_arr(i-1,j  ,k)) 
                            +beta_y*beta_z*0.5_rt*inv_dz*(phi_arr(i  ,j+1,k)-phi_arr(i  ,j-1,k));
                    },
                    [=] AMREX_GPU_DEVICE (int i, int j, int k) { 
                        Ez_arr(i,j,k) +=
                            + beta_z*beta_x       *0.5_rt*inv_dx*(phi_arr(i+1,j  ,k)-phi_arr(i-1,j  ,k)) 
                            +(beta_z*beta_z-1._rt)*0.5_rt*inv_dz*(phi_arr(i  ,j+1,k)-phi_arr(i  ,j-1,k));
                    }
                );


dpgrote commented on August 17, 2024

Hi Yin - For your question 1, when you do phi = fields.PhiFPWrapper(0, True)[...], the phi that is returned is a full-sized array covering the full domain and is broadcast to all processors. This will work fine, but may not be efficient since every processor will be doing the same operation on the whole array.

If you want to do the operation locally on each processor, you would need to use the Python equivalent of a loop over MFIters. For example:

phi = fields.PhiFPWrapper()
Ex = fields.ExFPWrapper()
for mfi in phi:
    phiarr = phi.mf.array(mfi)
    Exarr = Ex.mf.array(mfi)
    phiarr += ...
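
To make the contrast concrete, here is a rough sketch (again, only an illustration): it prints the full-domain shape from every rank, which is why the lengths printed in question 1 were identical, and then shows where a purely local, per-box update would go. It assumes mpi4py is available in the same Python process; the exact object returned by phi.mf.array(mfi), and how to index it, may differ between pywarpx versions.

# (inside an afterEsolve callback, as before)
from mpi4py import MPI  # assumed to be available alongside pywarpx
from pywarpx import fields

rank = MPI.COMM_WORLD.Get_rank()

# Full-domain copy: every rank gets the same full-size array, which is
# why print(len(phi)) returned the same length on all ranks.
phi_full = fields.PhiFPWrapper(0, True)[...]
print(rank, phi_full.shape)

# Local alternative: loop over the boxes owned by this rank and modify
# the data in place, without any cross-rank broadcast.
phi = fields.PhiFPWrapper()
Ex = fields.ExFPWrapper()
for mfi in phi:
    phiarr = phi.mf.array(mfi)
    Exarr = Ex.mf.array(mfi)
    # ... apply the correction here, using only the cells of this box ...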


Yin-YinjianZhao commented on August 17, 2024


I see, thank you so much Dave, I will have a try!


Yin-YinjianZhao commented on August 17, 2024

I will close this issue, because I already have a working script. Thanks for all the help!

