amrex-astro / mini-castro
a mini-app version of castro
License: MIT License
When using inputs.128 with amr.max_level = 1 on summitdev, bad data is generated and the timesteps get really small. This does not occur with inputs.64, and does not occur in debug mode.
When attempting to compile eos_type.F90 with PGI 17.9 using OpenACC, StarLord encounters this error:
PGF90-W-0435-Array declared with zero size (..//Source/eos_type.F90: 132)
0 inform, 1 warnings, 0 severes, 0 fatal for eos_type_module
PGF90-S-0155-Module variables used in acc routine need to be in !$acc declare create() - zion$p (..//Source/eos_type.F90: 190)
PGF90-S-0155-Module variables used in acc routine need to be in !$acc declare create() - zion$sd (..//Source/eos_type.F90: 190)
composition:
173, Generating acc routine seq
Generating Tesla code
0 inform, 0 warnings, 2 severes, 0 fatal for composition
PGF90-S-0155-Module variables used in acc routine need to be in !$acc declare create() - small_x$p (..//Source/eos_type.F90: 214)
PGF90-S-0155-Accelerator region ignored; see -Minfo messages (..//Source/eos_type.F90)
normalize_abundances:
0, Accelerator region ignored
214, Accelerator restriction: invalid loop
0 inform, 0 warnings, 2 severes, 0 fatal for normalize_abundances
clean_state:
224, Generating acc routine seq
Generating Tesla code
eos_get_small_temp:
255, Generating acc routine seq
Generating Tesla code
eos_get_small_dens:
268, Generating acc routine seq
Generating Tesla code
eos_get_max_temp:
282, Generating acc routine seq
Generating Tesla code
eos_get_max_dens:
296, Generating acc routine seq
Generating Tesla code
make: *** [tmp_build_dir/o/3d.pgi.MPI.EXE/eos_type.o] Error 2
This occurs on commit eaa153c ("Add a .gitignore for Util/").
This issue describes a recommended approach for launching Fortran functions as CUDA kernels. It uses as an example the Castro function
ca_compute_temp(const int* lo, const int* hi, const Real* state, const int* state_lo, const int* state_hi)
In order to launch this function on the device, a CUDA kernel needs to be launched first, and then this function needs to be called inside the kernel as a device function. This should be done by wrapping this function in DEVICE_LAUNCHABLE(), as:
DEVICE_LAUNCHABLE(ca_compute_temp(const int* lo, const int* hi, const Real* state, const int* state_lo, const int* state_hi));
(When we're not compiling for the device, this will be a simple C++ preprocessor function macro that does nothing.)
This should be expanded to:
__device__ void ca_compute_temp
(const int* lo, const int* hi, const Real* state, const int* state_lo, const int* state_hi);
__global__ void cuda_ca_compute_temp
(const int* lo, const int* hi, const Real* state, const int* state_lo, const int* state_hi);
That is, it should prepend __device__ to the target Fortran function (which must have attributes(device) manually prepended to it). It should also create another function declaration prepended with cuda_, that has the same arguments.
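For builds without device support, one reading of "a macro that does nothing" is a plain pass-through. The following is a minimal sketch under that assumption, not the AMReX implementation. The variadic form is chosen here so that the commas in the argument list do not split the macro argument, the `void` return type mirrors the expansion shown above, and `count_zones` and `zones_in_box` are hypothetical stand-ins for a Fortran routine and its caller.

```cpp
// Assumed host-only definitions: the declaration and the call pass through
// unchanged. Variadic so that commas inside the argument are preserved.
#define DEVICE_LAUNCHABLE(...) void __VA_ARGS__
#define DEVICE_LAUNCH(...) __VA_ARGS__

// Hypothetical stand-in for a Fortran routine such as ca_compute_temp.
DEVICE_LAUNCHABLE(count_zones(const int* lo, const int* hi, int* n));

// Host definition of the stand-in: counts the zones in the box lo..hi.
void count_zones(const int* lo, const int* hi, int* n)
{
    *n = 1;
    for (int d = 0; d < 3; ++d) {
        *n *= hi[d] - lo[d] + 1;
    }
}

// Hypothetical caller: on the host the launch is an ordinary function call.
int zones_in_box(const int* lo, const int* hi)
{
    int n = 0;
    DEVICE_LAUNCH(count_zones(lo, hi, &n));
    return n;
}
```

With these definitions the same source line compiles to a normal declaration and a normal call, so host builds need no separate code path.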
The new cuda_ function should look like:
__global__ void cuda_ca_compute_temp
(const int* lo, const int* hi, const amrex::Real* state, const int* state_lo, const int* state_hi)
{
int blo[3];
int bhi[3];
get_loop_bounds(blo, bhi, lo, hi);
ca_compute_temp(blo, bhi, state, state_lo, state_hi);
}
and should be defined in a separate compilation unit, not in the header file (a reasonable choice would be a single .cpp file that contains all of the newly created CUDA functions).
Note that get_loop_bounds is a function that is found in AMReX_Device.H.
The corresponding call to ca_compute_temp should be written as:
DEVICE_LAUNCH(ca_compute_temp(lo, hi, state, state.loVect(), state.hiVect()));
This should be replaced by:
dim3 numThreads, numBlocks;
amrex::Device::c_threads_and_blocks(lo, hi, numBlocks, numThreads);
cuda_ca_compute_temp<<<numBlocks, numThreads, 0, amrex::Device::cudaStream()>>>(lo, hi, state, state.loVect(), state.hiVect());
Note that this makes it a requirement that lo and hi are the first two arguments to the function. This way they can be replaced by the zone index corresponding to each CUDA thread.
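To make the thread-to-zone mapping concrete, the following host-side C++ sketch emulates the launch arithmetic. It is not the AMReX implementation. The ceiling-division grid sizing is only a guess at what `c_threads_and_blocks` might compute. `thread_loop_bounds` is a hypothetical analogue of `get_loop_bounds` with the thread index passed explicitly. Reporting out-of-range threads as having no work is also an assumption.

```cpp
#include <array>

using Int3 = std::array<int, 3>;

// Assumed grid sizing: enough blocks per dimension to cover lo..hi
// (ceiling division of the zone count by the threads per block).
Int3 blocks_to_cover(const Int3& lo, const Int3& hi, const Int3& threads)
{
    Int3 blocks{};
    for (int d = 0; d < 3; ++d) {
        const int n = hi[d] - lo[d] + 1;
        blocks[d] = (n + threads[d] - 1) / threads[d];
    }
    return blocks;
}

// Hypothetical analogue of get_loop_bounds. `tid` plays the role of the
// global thread index (threadIdx + blockIdx * blockDim on a real device).
// Returns false when the thread lies outside the box and has no zone.
bool thread_loop_bounds(Int3& blo, Int3& bhi,
                        const Int3& lo, const Int3& hi, const Int3& tid)
{
    bool inside = true;
    for (int d = 0; d < 3; ++d) {
        blo[d] = lo[d] + tid[d];
        bhi[d] = blo[d];  // each thread works on exactly one zone
        inside = inside && (blo[d] <= hi[d]);
    }
    return inside;
}

// Emulate every thread of the launch and count the zones that receive work.
int zones_covered(const Int3& lo, const Int3& hi, const Int3& threads)
{
    const Int3 blocks = blocks_to_cover(lo, hi, threads);
    int count = 0;
    for (int k = 0; k < blocks[2] * threads[2]; ++k) {
        for (int j = 0; j < blocks[1] * threads[1]; ++j) {
            for (int i = 0; i < blocks[0] * threads[0]; ++i) {
                Int3 blo{}, bhi{};
                if (thread_loop_bounds(blo, bhi, lo, hi, Int3{i, j, k})) {
                    ++count;
                }
            }
        }
    }
    return count;
}
```

Because each emulated thread overwrites lo and hi with its own single-zone bounds, the Fortran loops inside the routine run over exactly one zone per thread, which is why lo and hi must be the leading arguments.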
When using certain combinations of boxes, e.g. inputs.128 with amr.max_grid_size=32, the diagnostics report NaNs. This occurs because the calls to derive that fill in the fields for the diagnostics generate data containing NaNs, which suggests a problem in FillPatch.
We should compile StarLord in single precision and compare results to double precision (looking at both performance and relative accuracy). Main difficulties are:
Currently the default CUDA version that StarLord anticipates is version 8. However, MPI + CUDA codes on summitdev crash at run time if using nvprof with CUDA 8. This crash is avoided if one compiles with version 8 but runs with version 9 (the latest available on summitdev). Is there a reason not to change the default compilation version to 9 to fix this problem without requiring the above workaround?
The "thread-safe" version of filcc that was written to accommodate CUDA support in StarLord may not fill corner zones with valid values. Need to check this and fix it if so.
Hi,
On commit eaa153c ("Add a .gitignore for Util/") I encounter the following error when trying to compile eos_type.F90 with GCC 6.3.1 using OpenACC:
..//Source/eos_type.F90:177:38:
..//Source/eos_type.F90:175:21:
!$acc routine seq
2
..//Source/eos_type.F90:177:38:
use bl_constants_module, only: ONE
1
Error: USE statement at (1) cannot follow !$ACC ROUTINE statement at (2)
..//Source/eos_type.F90:178:43:
..//Source/eos_type.F90:175:21:
!$acc routine seq
2
..//Source/eos_type.F90:178:43:
use network, only: aion, aion_inv, zion
1
Error: USE statement at (1) cannot follow !$ACC ROUTINE statement at (2)
..//Source/eos_type.F90:180:17:
..//Source/eos_type.F90:175:21:
!$acc routine seq
2
..//Source/eos_type.F90:180:17:
implicit none
1
Error: IMPLICIT NONE statement at (1) cannot follow !$ACC ROUTINE statement at (2)
..//Source/eos_type.F90:207:38:
..//Source/eos_type.F90:205:21:
!$acc routine seq
2
..//Source/eos_type.F90:207:38:
use bl_constants_module, only: ONE
1
Error: USE statement at (1) cannot follow !$ACC ROUTINE statement at (2)
..//Source/eos_type.F90:208:43:
..//Source/eos_type.F90:205:21:
!$acc routine seq
2
..//Source/eos_type.F90:208:43:
use extern_probin_module, only: small_x
1
Error: USE statement at (1) cannot follow !$ACC ROUTINE statement at (2)
..//Source/eos_type.F90:210:17:
..//Source/eos_type.F90:205:21:
!$acc routine seq
2
..//Source/eos_type.F90:210:17:
implicit none
1
Error: IMPLICIT NONE statement at (1) cannot follow !$ACC ROUTINE statement at (2)
..//Source/eos_type.F90:228:17:
..//Source/eos_type.F90:226:21:
!$acc routine seq
2
..//Source/eos_type.F90:228:17:
implicit none
1
Error: IMPLICIT NONE statement at (1) cannot follow !$ACC ROUTINE statement at (2)
..//Source/eos_type.F90:259:17:
..//Source/eos_type.F90:257:21:
!$acc routine seq
2
..//Source/eos_type.F90:259:17:
implicit none
1
Error: IMPLICIT NONE statement at (1) cannot follow !$ACC ROUTINE statement at (2)
..//Source/eos_type.F90:272:17:
..//Source/eos_type.F90:270:21:
!$acc routine seq
2
..//Source/eos_type.F90:272:17:
implicit none
1
Error: IMPLICIT NONE statement at (1) cannot follow !$ACC ROUTINE statement at (2)
..//Source/eos_type.F90:286:17:
..//Source/eos_type.F90:284:21:
!$acc routine seq
2
..//Source/eos_type.F90:286:17:
implicit none
1
Error: IMPLICIT NONE statement at (1) cannot follow !$ACC ROUTINE statement at (2)
..//Source/eos_type.F90:300:17:
..//Source/eos_type.F90:298:21:
!$acc routine seq
2
..//Source/eos_type.F90:300:17:
implicit none
1
Error: IMPLICIT NONE statement at (1) cannot follow !$ACC ROUTINE statement at (2)
make: *** [tmp_build_dir/o/3d.gnu.MPI.EXE/eos_type.o] Error 1
At present sxm, sxp, etc. have size NQ, since we compute all of the edge-state reconstructions first, and then in a separate step "trace" under them (which for MOL is a no-op). In Castro this was simplified (4f84d83b283a35551a1ea523f582294ce1e4f02a) to use only a single copy of the edge state array, and then loop from 1 to NQ, in each case doing the PPM reconstruction and then immediately doing the "tracing." If we can do this in StarLord, we can cut down on the memory footprint by quite a bit, since there are six of these arrays and NQ is ~20 when we use aprox13.
This cannot be done trivially because we are currently launching separate kernels for the reconstruction and tracing steps. But perhaps there is some related approach we can do here.
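As a rough illustration of the potential saving, the sketch below estimates the storage for the edge-state arrays. The 32^3 box, the double-precision storage, and the neglect of ghost zones are all assumptions for illustration, as is the reading that all six arrays are kept but shrink to a single component. `edge_state_bytes` is a hypothetical helper, not a StarLord routine.

```cpp
#include <cstddef>

// Bytes held by `narrays` edge-state arrays of `ncomp` components each,
// over an nx * ny * nz box of double-precision values (ghost zones ignored).
std::size_t edge_state_bytes(std::size_t nx, std::size_t ny, std::size_t nz,
                             std::size_t ncomp, std::size_t narrays)
{
    return nx * ny * nz * ncomp * narrays * sizeof(double);
}
```

For a 32^3 box with NQ = 20 and 8-byte doubles, `edge_state_bytes(32, 32, 32, 20, 6)` gives 31,457,280 bytes (30 MiB), while the single-component layout `edge_state_bytes(32, 32, 32, 1, 6)` gives 1,572,864 bytes (1.5 MiB), a reduction by a factor of NQ.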
We need nightly regression testing so that we can be confident the CUDA port is working.
It might be best to rename this to reflect its heritage from Castro.
ptxas application ptx input, line 9; fatal : Unsupported .version 6.0; current version is '5.0'
ptxas fatal : Ptx assembly aborted due to errors
nvlink fatal : elfLink fatbinary error
pgacclnk: child process exit status 2: /autofs/nccs-svm1_sw/summitdev/.swci/0-core/opt/spack/20171006/linux-rhel7-ppc64le/gcc-4.8.5/pgi-17.10-y76zzohzqn6lgwv6ktlihmfwj5ebdp2x/linuxpower/17.10/bin/pgnvd
make: *** [Castro3d.pgi.MPI.CUDA.ex] Error 2
$ module list
Currently Loaded Modules:
1) hsi/5.0.2.p5 2) xalt/0.7.5 3) lsf-tools/1.0 4) DefApps 5) pgi/17.10 6) spectrum-mpi/10.2.0.0-20180110 7) git/2.13.0 8) cuda/9.0.69 9) emacs/25.1 10) python/3.5.2
Seems to work with CUDA 8.
It compiles, but completely fails at runtime. Need to investigate why.