Git Product home page Git Product logo

afid_gpu_opensource's Introduction

AFiD_GPU

GPU version of AFID code, see also https://github.com/PhysicsofFluids/AFiD

When using this code please cite

X. Zhu, E. Phillips, V. Spandan, J. Donners, G. Ruetsch, J. Romero, R. Ostilla-Mónico, Y. Yang, D. Lohse, R. Verzicco, M. Fatica, R.J.A.M. Stevens, AFiD-GPU: a versatile Navier-Stokes Solver for Wall-Bounded Turbulent Flows on GPU Clusters, Submitted to Computer Physics Communications, see ArXiv paper https://arxiv.org/abs/1705.01423

for a description of the GPU version of the code and

E.P. van der Poel, R. Ostilla Mónico, J. Donners, R. Verzicco, A pencil distributed finite difference code for strongly turbulent wall-bounded flows, Computers and Fluids, 116, 10-16 (2015), see http://www.sciencedirect.com/science/article/pii/S0045793015001164

for a description about the CPU version of the development of the used numerical method.

Building

  • The code requires the PGI compiler version 15.7 or higher.

  • To compile the code, use the provided Makefile (Cartesius) as reference and Makefile_PizDaint (Piz Daint). Edit the variables at the top of the file for proper linking to external dependencies. Upon successful compilation, three executables will be generated, afid_gpu, afid_cpu, and afid_hyb. These correspond to the GPU version of the code, CPU version of the code, and the hybrid version respectively.

Running

  • A couple of simple test input files have been provided within the Run_test directory. The procedure to run is identical to the standard CPU AFID code (https://github.com/PhysicsofFluids/AFiD). Simply rename or create a link to one of the test input files with the name 'bou.in' and run the desired executable from the directory containing this input file/link. For the GPU and CPU executables, the input files and variables within are identical to those used in the standard CPU AFID code.

  • To minimize repeating code for the various versions, many of the main solver subroutine files have been split into two files, a file that contains overloaded interfaces (one for CPU data, one for GPU data) to the subroutine and another included "common" file which contains the actual source. For example, CalcLocalDivergence.F90 contains only the subroutine interfaces, while CalcLocalDivergence_common.F90 contains the main routine source code, with necessary preprocessor directives to control the compilation. The generic interfaces themselves, for all the overloaded subroutines following this pattern can be found within SolverInterfaces.F90

Hybrid version (under development)

  • For the hybrid code, an additional input variable, GC_SPLIT_RATIO is required. This variable must be set so that 0.0 < GC_SPLIT_RATIO < 1.0. This variable controls the distribution of the domain between the CPU and GPU. For example, setting GC_SPLIT_RATIO to 0.8 will associate 80% of the domain to the GPU and the remaining 20% to the CPU. In all cases, the solution of the Poisson system remains fully on the GPU. See the provided input files to see where to define this new input variable.

  • When starting a simulation from scratch using the built-in initial condition, the hybrid implementation can show some stability issues, and may require a smaller timestep than the GPU or CPU code, at least initially. This is potentially due to the hybrid decomposition splitting across the extruded axis where the initial flow field is constant. Rotating the initial flow field seems to help the issue in this case.

Cartesius (SURFsara) compilation and run instructions (last check April 17, 2017)

To compile the code on Cartesius at SARA, first login on node gcn1 (the code requires some CUDA libraries that seems to be available only there), then:

  1. Load the required modules:

    module load mpi/mvapich2-gdr/2.1-cuda70 hdf5/mpich/pgi/1.8.16 mkl cuda/7.0.28 fftw3/intel/3.3.3-avx pgi/15.7

  2. Compile with:

MPICH_FC=pgf90 make -f Makefile

The Makefile will build three executables, afid_gpu, afid__cpu, and afid_hyb (under development)

To submit a job on Cartesius, you can use a script similar to this one. The code is expecting a GPU for each MPI task and since Cartesius has two GPUs per node, if you want to run with 48 GPUs (-n 48) you will request half the nodes (-N 24). The nvidia-smi line will boost the clocks of the GPU. The example is using the gpu_short queue, for production runs you may want to use the gpu queue.

################################

#!/bin/bash

#SBATCH -N 16

#SBATCH --tasks-per-node 2

#SBATCH -t 05:00

#SBATCH -p gpu_short

#SBATCH -o job_32GPU-%j.outerr

#SBATCH -e job_32GPU-%j.outerr

module load pgi/16.7 cuda

nvidia-smi -ac 3004,875

srun --mpi=pmi2 -n 32 ./afid_gpu

################################

Piz Daint (CSCS) compilation and run instructions (last check April 17, 2017)

  1. Load the required modules:

    module load pgi cudatoolkit fftw cray-hdf5-parallel/1.8.16 intel module load pgi cudatoolkit fftw cray-hdf5-parallel/1.8.16 intel

  2. Compile with:

MPICH_FC=pgf90 make -f Makefile

To submit a job on Piz Daint, you can use the sample script below.

################################

#!/bin/bash -l

#SBATCH --nodes=12

#SBATCH --constraint=gpu

#SBATCH --ntasks-per-node=1

#SBATCH --gres=gpu:1

#SBATCH --cpus-per-task=12

#SBATCH -t 1-00:00:00

#SBATCH -p normal

#SBATCH -J test

echo "On which nodes it executes"

echo $SLURM_JOB_NODELIST

export OMP_NUM_THREADS=1

export PMI_NO_FORK=1

export OMP_SCHEDULE=static

echo "Now run the MPI tasks..."

srun -n 12 ./afid_gpu

################################

afid_gpu_opensource's People

Contributors

maxcuda avatar romerojosh avatar stevensrjam avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

afid_gpu_opensource's Issues

issue in "decomp_info_init" subroutine of "decomp_2d.F90" file

Hi,
It seems that the decomp_info_init subroutine in decomp_2d.F90 file has some problems, particularly at the block #ifdef USE_CUDA starting at line 716.
The calls to c_f_pointer leads to a compilation error in pgi-18.10 with the error:

PGF90-S-0550-Allocatable device array is not in c_f_pointer with -Mallocatable=03 - work1_r_d. (decomp_2d.F90: 733)
PGF90-S-0550-Allocatable device array is not in c_f_pointer with -Mallocatable=03 - work2_r_d. (decomp_2d.F90: 735)

By using pgi-16.5, however, the code is compiled successfully, but it fails to run when it reaches line 733 - the first call to c_f_pointer.

Moreover, the definitions of cptr1 and cptr2 are not consistent.

Would you please take a look at these?

Many thanks.
Ali

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.