
op2-common's People

Contributors

abetts155, aowenson, basegpu, bgd54, bozbez, dylanrubini, endrelaszlo, francisrussell, gihanmudalige, gmarkall, jdjfisher, kynan, m-8k, mgsium, mike-giles, onidaito, pbartholomew08, reguly, robertsawko, tobyflynn, wence-


op2-common's Issues

OP_MAX in C

Dear OP2 developers,
I am testing the OP_MAX option in an op_par_loop (C framework), e.g.
op_arg_gbl(&dzMax, 1, "double", OP_MAX)

The Python translator seems to have a typo in op2_gen_openmp_simple.py (line 483):
elif accs(m)==OP_MAX:
instead of
elif accs[m]==OP_MAX:

With this change, translation and compilation work, but the data array dzMax is not updated by OP_MAX. What is the status of the OP_MAX feature?
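For reference, a minimal sketch of how such a global MAX reduction is intended to be written in the C API (the kernel maxdz, the set cells and the dat p_dz are hypothetical, not taken from the issue):

// user kernel: updates the running maximum (hypothetical names throughout)
void maxdz(const double *dz, double *dzMax) {
  if (*dz > *dzMax) *dzMax = *dz;
}

// in the main program, after op_init and the set/dat declarations:
double dzMax = -1.0e30;
op_par_loop(maxdz, "maxdz", cells,
            op_arg_dat(p_dz, -1, OP_ID, 1, "double", OP_READ),
            op_arg_gbl(&dzMax, 1, "double", OP_MAX));
// after the loop, dzMax should hold the maximum over all elements (and, under MPI, over all ranks)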

[Feature request] op_map support for long int indexes

I recently experienced an issue reading meshes in CGNS format with the latest version of the CGNS library, which changed the indexing variables for the connectivity from int to cgsize_t (aka long int).

At the moment this is incompatible with OP2, which complains if I call op_decl_map passing a long int* instead of an int*. I had to work around this manually by allocating another array of ints and copying the connectivity data into it.

I guess this change was made in CGNS because big meshes could potentially overflow int, so support for long int in OP2 would be a desirable feature.
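The workaround mentioned above might look roughly like the following sketch (all names apart from the OP2 calls are hypothetical):

#include <limits.h>
#include <stdlib.h>
#include "op_seq.h"

/* Sketch of the manual workaround: copy cgsize_t (long int) connectivity read
   from CGNS into a temporary int array before handing it to op_decl_map. */
op_map decl_map_from_long(op_set from, op_set to, int dim,
                          const long *conn, const char *name) {
  long n = (long)from->size * dim;
  int *conn_int = (int *)malloc(sizeof(int) * n);
  for (long i = 0; i < n; i++) {
    if (conn[i] > INT_MAX) exit(1);   /* mesh too large for int indices */
    conn_int[i] = (int)conn[i];
  }
  return op_decl_map(from, to, dim, conn_int, name);
}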

Infinite loop in op_plan_core() triggered with a particular multigrid mesh

I have a Rotor37 mesh of approximately 8 million nodes, multigrid with 4 levels. If I feed this into the hybrid MPI+OpenMP variant of MG-CFD-OP2 with at least 2 MPI processes, then OP2 appears to become stuck in an infinite loop during a call to op_plan_core() for the highest multigrid mesh.

Interestingly, this issue does not occur with just 1 MPI process, or for the pure OpenMP variant of MG-CFD-OP2. The choice of partitioner has no influence.

To help reproduce the issue I can provide a .json file for feeding into the MG-CFD job generator, which generates a script that compiles and executes MG-CFD. I will have to transfer the mesh offline.

The OP2 branch 'fix/op-plan-core-infinite-loop' contains a check for this infinite loop.

Build Issues

I've encountered a few issues when building OP2-Common under an x86_64 Ubuntu 14.04 installation:

Running cmake.local in op2/c, the build fails (presumably with gcc, which is the system default):

[ 75%] Building C object src/sequential/CMakeFiles/op2_seq.dir/__/core/op_dummy_singlenode.c.o
/tmp/OP2-Common/op2/c/src/core/op_dummy_singlenode.c:169:9: error: struct has no members [-Werror=pedantic]
typedef struct {
         ^
/tmp/OP2-Common/op2/c/src/core/op_dummy_singlenode.c:174:9: error: struct has no members [-Werror=pedantic]
 typedef struct {
         ^

apparently because pedantic warnings are being treated as errors.
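The warning is easy to reproduce in isolation; a minimal example (unrelated to the OP2 sources, just to illustrate the -Werror=pedantic behaviour):

/* ISO C forbids empty structs, so -Wpedantic warns and -Werror=pedantic turns
   the warning into an error; one dummy member (or dropping -Werror, as the
   Release build type effectively does) silences it. */
typedef struct {
  /* empty: triggers "struct has no members" under -Wpedantic */
} op_dummy_empty;

typedef struct {
  char unused;   /* dummy member, warning gone */
} op_dummy_fixed;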

If I edit cmake.local and modify Developer to Release, I can do an in-source build of the OP2 libraries.

Within apps/c, according to the README, I should then be able to perform a build which assumes that the libraries are located in the OP2 library directory, but running cmake.local appears unable to detect any of the OP2 libraries and therefore skips all applications.

Going into one of the application directories, I've been able to do a manual build of some of the apps, though even then the Makefile passes options like "-arch x86_64" to g++, which appears to be supported only by g++ under Darwin.

Missing kernel invocations in MPI setting

Currently, the code that the Python translator produces only calls the kernel if the MPI-local size of the set is greater than zero.

However, in the MPI setting, the kernel needs to be called even if only the import execute halo is nonempty.

As far as I can see, the if statement should therefore check whether set_size > 0 instead. If this is correct, it might even be possible to drop the if statement entirely.
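A minimal sketch of the suggested change in the generated MPI wrapper (this is not the actual patch; variable names follow the usual generated code):

/* test the size returned by the halo exchange, which includes the import
   execute halo, rather than the owned size alone */
int set_size = op_mpi_halo_exchanges(set, nargs, args);
if (set_size > 0) {
  /* ... run the user kernel over owned elements plus the exec halo ... */
}
op_mpi_set_dirtybit(nargs, args);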

[feature request] subdirectory for kernel files

In a project using OP2, all source files to be translated are located in a directory src.

In the same directory I put all header files with inline void kernelXXX(){...}, and accordingly the generated implementation files of the kernels for the different backends are there as well.

Would it be possible to point the translator at a subdirectory, e.g. src/kernels, where all the kernel header files are located? The generated kernels should be placed there as well, and the includes in the translated sources updated accordingly.

For large projects with many source files and even more kernel files, this functionality would add a lot of organisational flexibility.

kernel with indirect RW and vectorised argument

Python translator, C side, CUDA simple backend.

Passing an indirect op_arg_dat with OP_RW access to a kernel (ind_rw = 1) sets the two-level colouring flag op_color2 to 1. If the same kernel is also passed a vectorised argument, compilation fails because of:

#
# lengthy code for general case with indirection
#
    if ninds>0 and not op_color2:
       # obviously not the case
       # ... but here the vectorised arguments are handled
#
# simple version for global coloring
#
    elif ninds>0:
      # falling back to this case
      # ...no vectorised arguments handled here

kernels with either:

  • indirect RW and no vectorised args, or
  • direct RW and vectorised args

work fine.

Is this a bug or an intended limitation of the framework?

OP2 specification for inline "device" kernels

The OP2 specification should formally define how inlined device kernels are handled:

  • supposedly they work out-of-the-box in the reference implementation
  • the source-to-source translation needs to translate device kernels into the correct backend-specific representation
    • for CUDA they could be used for both host and device by prefixing them with both the __host__ and __device__ keywords
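For example, a minimal sketch of that prefixing (the macro guard and kernel name are illustrative, not taken from the specification):

// the same inlined user kernel compiles for the reference (host-only) build
// and, under nvcc, for both host and device; the kernel name is hypothetical
#ifdef __CUDACC__
#define OP_FUN_PREFIX __host__ __device__
#else
#define OP_FUN_PREFIX
#endif

OP_FUN_PREFIX inline void res_calc(const double *x, double *res) {
  res[0] += 0.5 * x[0];
}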

Runtime error after partial halo exchanges merge for cmake built libs

There appears to be a runtime error in airfoil_mpi (both hdf5 and plain) when linked against libraries built with the cmake build system. However, the code runs without this error when built with the regular makefiles.

I have so far only tested this on Zen and am looking into fixing it under the branch fix/cmake_error. If anyone has ideas as to why this is happening, or on the best way to track the error down, do let me know.

Compiling libop2_mpi

I have an issue compiling the libop2_mpi library. I have the following output for make mpi_seq:

/home/abhi/openmpi/bin/mpiCC -DCOMM_PERF -DDEBUG -Wall -O3 -g -Iinclude -c src/mpi/op_mpi_core.c \
        -o obj/mpi/op_mpi_core.o
In file included from src/mpi/op_mpi_core.c:48:
include/op_mpi_core.h:42:1: warning: "MPI_ROOT" redefined
In file included from src/mpi/op_mpi_core.c:46:
/home/abhi/openmpi/include/mpi.h:289:1: warning: this is the location of the previous definition
src/mpi/op_mpi_core.c: In function ‘…’:
src/mpi/op_mpi_core.c:545: warning: unused variable ‘…’
src/mpi/op_mpi_core.c: In function ‘…’:
src/mpi/op_mpi_core.c:2021: warning: unused variable ‘…’
src/mpi/op_mpi_core.c:2021: warning: unused variable ‘…’
src/mpi/op_mpi_core.c: In function ‘…’:
src/mpi/op_mpi_core.c:2473: warning: comparison between signed and unsigned integer expressions
src/mpi/op_mpi_core.c:2536: warning: comparison between signed and unsigned integer expressions
src/mpi/op_mpi_core.c:2598: warning: comparison between signed and unsigned integer expressions
src/mpi/op_mpi_core.c:2432: warning: ‘…’ may be used uninitialized in this function
src/mpi/op_mpi_core.c:2495: warning: ‘…’ may be used uninitialized in this function
src/mpi/op_mpi_core.c:2558: warning: ‘…’ may be used uninitialized in this function
src/mpi/op_mpi_core.c: In function ‘…’:
src/mpi/op_mpi_core.c:2220: warning: ‘…’ may be used uninitialized in this function
src/mpi/op_mpi_core.c:2282: warning: ‘…’ may be used uninitialized in this function
src/mpi/op_mpi_core.c:2345: warning: ‘…’ may be used uninitialized in this function
src/mpi/op_mpi_core.c: In function ‘…’:
src/mpi/op_mpi_core.c:726: warning: ‘…’ may be used uninitialized in this function
g++ -DCOMM_PERF -DDEBUG -Wall -O3 -g -Iinclude -c src/core/op_rt_support.c \
        -o obj/op_rt_support.o
/home/abhi/openmpi/bin/mpiCC -DCOMM_PERF -DDEBUG -Wall -O3 -g -Iinclude -c src/mpi/op_mpi_decl.c \
        -o obj/mpi/op_mpi_decl.o
In file included from src/mpi/op_mpi_decl.c:11:
include/op_mpi_core.h:42:1: warning: "MPI_ROOT" redefined
In file included from src/mpi/op_mpi_decl.c:7:
/home/abhi/openmpi/include/mpi.h:289:1: warning: this is the location of the previous definition
/home/abhi/openmpi/bin/mpiCC -DCOMM_PERF -DDEBUG -Wall -O3 -g -Iinclude -I/home/abhi/parmetisv311/include -DPARMETIS -I/home/abhi/ptscotch/include -DPTSCOTCH \
        -L/home/abhi/ptscotch/lib -lptscotchparmetis -L/home/abhi/parmetisv311 -lparmetis -L/home/abhi/parmetisv311 -lmetis -L/home/abhi/ptscotch/lib/ -lptscotch -L/home/abhi/ptscotch/lib/ -lptscotcherr \
        -c src/mpi/op_mpi_part_core.c \
        -o obj/mpi/op_mpi_part_core.o obj/core/op_util.o 
In file included from src/mpi/op_mpi_part_core.c:64:
include/op_mpi_core.h:42:1: warning: "MPI_ROOT" redefined
In file included from src/mpi/op_mpi_part_core.c:50:
/home/abhi/openmpi/include/mpi.h:289:1: warning: this is the location of the previous definition
src/mpi/op_mpi_part_core.c:563:65: warning: backslash and newline separated by space
/home/abhi/ptscotch/include/ptscotch.h:179: error: expected ‘…’ or ‘…’ before ‘…’
/home/abhi/ptscotch/include/ptscotch.h:234: error: expected ‘…’ or ‘…’ before ‘…’
make: *** [mpi_seq] Error 1

[Feature request] type declaration in dat declaration from h5

When calling

op_dat op_decl_dat_hdf5(op_set set, int dim, char const *type, char const *file, char const *name);

the passed argument type must match the datatype of the corresponding dataset defined in the h5 container (see op2/c/src/externlib/op_hdf5.c, line 274 ff):

const char *typ = dset_props.type_str;
if (!op_type_equivalence(typ, type)) {
  op_printf("dat.type %s in file %s and type %s do not match\n", typ, file, type);
  exit(2);
}

The declaration call therefore fails if the two types do not match.

The situation I face is this:
Given a fully defined model in an h5 container, I'd like to run the simulation in either single or double precision (the corresponding simulation binaries compile successfully). Calling the double-precision binary works perfectly, since the datasets in the h5 container are stored as doubles. Calling the single-precision binary fails, of course, because of the type-equivalence restriction mentioned above.

Would it be possible to relax this restriction (only between single and double precision) and provide automatic type conversion in the op_decl_dat_hdf5 call?

op_cuda_reduction.h::op_reduction() only supports one type per application

Suppose an application has two OP2-generated CUDA kernels. One performs a reduction of a double, the other performs a reduction of an integer:

__global__ void op_cuda_kernel1(const double *arg0, double *arg1, int set_size) {
  double arg1_l[1];
  ...
  for (int d = 0; d < 1; d++) {
    op_reduction<OP_INC>(..., arg1_l[d]);
  }
}

__global__ void op_cuda_kernel2(const double *arg0, int *arg1, int set_size) {
  int arg1_l[1];
  ...
  for (int d = 0; d < 1; d++) {
    op_reduction<OP_INC>(..., arg1_l[d]);
  }
}

After compiling one of these, attempting to compile the second kernel fails with a type error:

.../OP2-Common/op2/c/include/op_cuda_reduction.h(51): error: declaration is incompatible with previous "temp"
(51): here
          detected during instantiation of "void op_reduction<reduction,T>(volatile T *, T) [with reduction=3, T=int]"

To reproduce, pull the MG-CFD-app-OP2 repository, uncomment the op_reduction calls in cuda/count_bad_vals_kernel.cu and cuda/calc_rms_kernel_kernel.cu, then compile 'mgcfd_cuda'.
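One common workaround for this nvcc limitation (a sketch, not OP2's actual op_cuda_reduction.h) is to declare the dynamic shared buffer once with an untyped element and cast it per instantiation:

// a single untyped extern __shared__ declaration, reinterpreted per template
// instantiation, lets double and int reductions coexist in one translation unit
template <class T>
__device__ T *reduction_shared() {
  extern __shared__ char op_reduction_bytes[];
  return reinterpret_cast<T *>(op_reduction_bytes);
}

template <class T>
__device__ void op_reduction_sketch(volatile T *dat_g, T dat_l) {
  T *temp = reduction_shared<T>();   // typed view of the shared bytes
  int tid = threadIdx.x;
  temp[tid] = dat_l;
  for (int d = blockDim.x >> 1; d > 0; d >>= 1) {
    __syncthreads();
    if (tid < d) temp[tid] = temp[tid] + temp[tid + d];   // OP_INC combine
  }
  __syncthreads();
  if (tid == 0) *dat_g = *dat_g + temp[0];
}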

removing #includes in cuda files

After removing the includes of the kernel header files (*.h) from the generated CUDA code (which actually works perfectly), why are the includes added back in when there is more than ONE include?

op2_gen_cuda_simple.py, line 321 ff:

if CPP:
  includes = op2_gen_common.extract_includes(kernel_text)
  if len(includes) > 1:
    for include in includes:
      code(include)
    code("")

This leads to compile errors in the CUDA kernels when there is more than one include.

"op_decl_dat_char" reports error for sets of MPI-local size zero

The function xmalloc currently returns a NULL pointer if zero bytes are requested. Therefore, op_decl_dat_char reports an error in the special case where the MPI-local size of a set is zero:

printf(" op_decl_dat_char error -- error allocating memory to dat\n");

As far as I can see this error message is only desirable if xmalloc returned NULL because it failed to allocate memory of nonzero size. Let me know if you would like me to submit a pull request.
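A sketch of the kind of guard I have in mind (not the actual patch; the allocation size expression is illustrative):

size_t bytes = (size_t)dat->size * set->size;   /* illustrative size expression */
dat->data = (char *)xmalloc(bytes);
if (dat->data == NULL && bytes > 0) {           /* NULL is only an error for nonzero requests */
  printf(" op_decl_dat_char error -- error allocating memory to dat\n");
  exit(-1);
}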

Getting HDF5 dataset type and dimensions in C API

Currently in op_mpi_hdf5.c, the datatype and dimensions of a dataset are determined by reading custom attributes on the dataset. Is there a reason why this approach was taken, which assumes that the dataset has the required attributes, instead of using the specific HDF5 methods that perform these two tasks (H5Sget_simple_extent_dims() and H5Dget_type())?

I am preparing a commit that replaces the parsing of attributes with use of these HDF5 methods, and I am just checking that this will not create backwards compatibility issues.
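For reference, a minimal sketch of the direct approach (assuming the caller already holds an open file handle; not the code from the commit):

#include <hdf5.h>

static void query_dat_props(hid_t file_id, const char *name) {
  hid_t dset  = H5Dopen2(file_id, name, H5P_DEFAULT);

  hid_t space = H5Dget_space(dset);
  hsize_t dims[2] = {0, 0};
  H5Sget_simple_extent_dims(space, dims, NULL);   /* e.g. set size x dim */

  hid_t dtype = H5Dget_type(dset);
  H5T_class_t cls = H5Tget_class(dtype);          /* H5T_FLOAT, H5T_INTEGER, ... */
  size_t bytes    = H5Tget_size(dtype);           /* 8 -> double, 4 -> float/int */
  (void)cls; (void)bytes;

  H5Tclose(dtype);
  H5Sclose(space);
  H5Dclose(dset);
}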

Fortran data types in HDF5 routines

Fortran data types may not be correctly handled within the HDF5 read/write routines. The HDF5 routines are written in C, but the Fortran API can use Fortran data types such as real(8). These type strings need to be recognised as valid when writing to and reading from HDF5 files.

[Feature Request] enhanced op_fetch_data_hdf5_file

The current API of the op2_hdf5 library c/src/externlib/op_hdf5.c contains the function

void op_fetch_data_hdf5_file(op_dat dat, char const *file_name)

The name of the dataset in the h5-file is taken from dat->name. The following enhancements are desired:

  • op_fetch_data_hdf5_file cannot deal with dat->name representing a path inside the h5-file (a string containing '/'). Since path functionality is available when declaring an op_dat with op_decl_dat_hdf5, it would be nice to have the same behaviour when fetching the data into an h5 container.
  • An additional function void op_fetch_data_hdf5_file(op_dat dat, char const *file_name, char const *dataset_name) would increase the flexibility of dumping op_dat to the h5-container, where dataset_name is a freely chosen path inside the h5-file of the dataset.
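A short usage sketch of the requested overload (p_q is a hypothetical op_dat; the three-argument form does not exist in the current API):

op_fetch_data_hdf5_file(p_q, "results.h5");                   /* existing: dataset name taken from dat->name */
op_fetch_data_hdf5_file(p_q, "results.h5", "/flow/cells/q");  /* requested: explicit dataset path in the file */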

Base code works, but generated code segfaults

Hi everyone, I recently started learning OP2 for my internship at RR, and I am having some trouble getting it to work properly.

I have a simple code that I wrote and wrapped (2D heat equation, finite differences) that works fine in sequential mode with the OP2 wrappers, but the OpenMP version generated by op2.py segfaults when calling op_plan_core in one of the kernels (not the first one executed).
As far as I understood from the documentation, if the base code (sequential + wrappers) works fine, the generated code is supposed to run out-of-the-box (please correct me if I'm wrong), so I thought I would file this issue report.

I can provide more details, just tell me what you need.

cuda function attributes

Why is the __host__ attribute applied to the user-defined functions? Isn't it assumed that they all run on the device?

#define OP_FUN_PREFIX __host__ __device__
#include "../user_types.h"

We have problems with CUDA-specific math functions that are not available on the host.
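A minimal illustration of the problem (hypothetical user function; __expf is a device-only intrinsic):

#include <math.h>

__host__ __device__ inline void damp(const float *x, float *y) {
#ifdef __CUDA_ARCH__
  y[0] = __expf(-x[0]);   // fast device intrinsic, no host counterpart
#else
  y[0] = expf(-x[0]);     // portable host math needed on the __host__ path
#endif
}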

Fortran airfoil reference fails to build

../fortran/lib/libop2_for_reference.a(op2_for_reference_declarations.o): In function `__op2_fortran_declarations_MOD_op_timers':
op2_for_reference_declarations.F90:(.text+0x9): undefined reference to `op_timers'

op_timers is defined in op2/c/src/cuda/op_cuda_rt_support.c, which is linked into

  • libop2_for_rt_support.a
  • libop2_for_openmp_rt_support.a

The Fortran library doesn't build on cpp-matrices

Since some of the header files include petsc headers on the cpp-matrices branch, the Fortran OP2 lib build fails because it doesn't set compiler options to include the correct locations of these include files.

The build system for the Fortran library on cpp-matrices should be fixed so that PETSc includes and libraries are located and included.

Performance degradation on Aero and Airfoil if I pass -DCMAKE_BUILD_TYPE=Release

Hi folks,

If I pass -DCMAKE_BUILD_TYPE=Release to the cmake.local script corresponding to the apps directory in the repo, for some reason I am getting performance degradation (and that shows up on two different machines I've run these benchmarks on).

More specifically, if I run airfoil_dp_seq and aero_dp_seq (with the input files new_grid.dat and FE_grid.dat that are generated during building, respectively), I am getting the following numbers:

Benchmark                            Machine 1 CPU time (s)   Machine 2 CPU time (s)
Aero, no explicit build type                         67.979                   43.322
Aero, Release build type                            108.219                   84.534
Airfoil, no explicit build type                     865.889                 1170.140
Airfoil, Release build type                        1007.317                 1317.661

Any idea on what could be causing this?

Machine 1 has:

  • An Intel Core i5-3230M CPU (3.20 GHz, L3 size: 3 MB)
  • 8 GB of DDR3 RAM at 1333 MHz

Machine 2 has:

  • An Intel Core i7-6700 CPU (3.40 GHz, L3 size: 8 MB)
  • 16 GB of DDR4 RAM at 2133 MHz

EDIT: I'm using gcc 5.4.0 on Ubuntu 16.04

C Apps don't build on master branch

To replicate:

cd op2/c
./cmake.local
cd ../../apps/c
./cmake.local

The error produced is:

Scanning dependencies of target AIRFOIL_h5_grid
make[2]: *** No rule to make target `new_grid.dat', needed by `new_grid_out.h5'. Stop.
make[1]: *** [airfoil_hdf5/CMakeFiles/AIRFOIL_h5_grid.dir/all] Error 2
make[1]: *** Waiting for unfinished jobs....

OpenMP not running

Hi,

OpenMP (the airfoil code) runs only in sequential mode, even after I exported the OMP_NUM_THREADS environment variable. Can anybody check this?

Regards,
Abhishek

op_opt_arg_dats only supported with the Fortran API

The actual need for op_opt_arg_dats came from Rolls-Royce Hydra. We have not yet had any C/C++ API based application using op_opt_arg_dats, so they are yet to be implemented in the C-side code generators and back-end libraries.

./cmake.local broken

When I run ./cmake.local, I get the following:

./cmake.local 
-- Setting GNU C compiler options
-- Setting GNU CXX compiler options
-- Configure with the following features enabled:
 *    Sequential library , For reference testing
 *    OpenMP library , Requires OpenMP support by the compiler
 *    HDF5 library , Requires the HDF5 library
 *    MPI library , Requires an MPI library
 *    Parallel graph partitioning with PT-Scotch , Requires the PT-Scotch library
 *    Parallel graph partitioning with ParMETIS , Requires the ParMETIS library
 *    CUDA library , Requires the NVIDIA CUDA toolkit

-- Configure with the following features disabled:
 *    Fortran interface , Enables the OP2 Fortran interface to the OP2 C libraries

-- Checking for optional packages...
-- OpenMP found
-- CUDA found
-- MPI found
-- Checking for package 'SCOTCH-PT'
-- Could NOT find SCOTCH (missing:  PTSCOTCH_LIBRARY PTSCOTCHERR_LIBRARY SCOTCH_TEST_RUNS) (found version "6.0.0")
-- Optional package PTScotch not found - disabling
-- Could NOT find ParMETIS (missing:  PARMETIS_TEST_RUNS) (found version "4.0")
-- Optional package ParMETIS not found - disabling
-- HDF5 found
-- Configuring targets...
-- Configuring OP2 sequential library
-- Configuring OP2 OpenMP library
-- Configuring OP2 HDF5 library
-- Configuring OP2 MPI library
-- Configuring OP2 CUDA library
-- Configuring done
-- Generating done
-- Build files have been written to: /home/ehynes/Desktop/Dropbox/UCD/frederic/OP2-Common/op2/c/build
[ 11%] [ 23%] Built target op2_openmp
Built target op2_seq
[ 32%] Built target op2_hdf5
[ 35%] Building C object src/mpi/CMakeFiles/op2_mpi.dir/op_mpi_hdf5.c.o
[ 50%] Built target op2_cuda
[ 52%] Building C object src/mpi/CMakeFiles/op2_mpi_cuda.dir/op_mpi_hdf5.c.o
/home/ehynes/Desktop/Dropbox/UCD/frederic/OP2-Common/op2/c/src/mpi/op_mpi_hdf5.c: In function ‘op_decl_set_hdf5’:
/home/ehynes/Desktop/Dropbox/UCD/frederic/OP2-Common/op2/c/src/mpi/op_mpi_hdf5.c:88:3: error: implicit declaration of function ‘H5Pset_fapl_mpio’ [-Werror=implicit-function-declaration]
/home/ehynes/Desktop/Dropbox/UCD/frederic/OP2-Common/op2/c/src/mpi/op_mpi_hdf5.c:98:3: error: implicit declaration of function ‘H5Pset_dxpl_mpio’ [-Werror=implicit-function-declaration]
/home/ehynes/Desktop/Dropbox/UCD/frederic/OP2-Common/op2/c/src/mpi/op_mpi_hdf5.c: In function ‘op_decl_set_hdf5’:
/home/ehynes/Desktop/Dropbox/UCD/frederic/OP2-Common/op2/c/src/mpi/op_mpi_hdf5.c:88:3: error: implicit declaration of function ‘H5Pset_fapl_mpio’ [-Werror=implicit-function-declaration]
/home/ehynes/Desktop/Dropbox/UCD/frederic/OP2-Common/op2/c/src/mpi/op_mpi_hdf5.c:98:3: error: implicit declaration of function ‘H5Pset_dxpl_mpio’ [-Werror=implicit-function-declaration]
cc1: all warnings being treated as errors
make[2]: *** [src/mpi/CMakeFiles/op2_mpi.dir/op_mpi_hdf5.c.o] Error 1
make[1]: *** [src/mpi/CMakeFiles/op2_mpi.dir/all] Error 2
make[1]: *** Waiting for unfinished jobs....
cc1: all warnings being treated as errors
make[2]: *** [src/mpi/CMakeFiles/op2_mpi_cuda.dir/op_mpi_hdf5.c.o] Error 1
make[1]: *** [src/mpi/CMakeFiles/op2_mpi_cuda.dir/all] Error 2
make: *** [all] Error 2
[ 11%] Built target op2_seq
[ 23%] Built target op2_openmp
[ 32%] Built target op2_hdf5
[ 35%] Building C object src/mpi/CMakeFiles/op2_mpi.dir/op_mpi_hdf5.c.o
/home/ehynes/Desktop/Dropbox/UCD/frederic/OP2-Common/op2/c/src/mpi/op_mpi_hdf5.c: In function ‘op_decl_set_hdf5’:
/home/ehynes/Desktop/Dropbox/UCD/frederic/OP2-Common/op2/c/src/mpi/op_mpi_hdf5.c:88:3: error: implicit declaration of function ‘H5Pset_fapl_mpio’ [-Werror=implicit-function-declaration]
/home/ehynes/Desktop/Dropbox/UCD/frederic/OP2-Common/op2/c/src/mpi/op_mpi_hdf5.c:98:3: error: implicit declaration of function ‘H5Pset_dxpl_mpio’ [-Werror=implicit-function-declaration]
cc1: all warnings being treated as errors
make[2]: *** [src/mpi/CMakeFiles/op2_mpi.dir/op_mpi_hdf5.c.o] Error 1
make[1]: *** [src/mpi/CMakeFiles/op2_mpi.dir/all] Error 2
make: *** [all] Error 2

I'm totally stuck now ;(

Please help!

Installation description

There should be comments on the supported versions of different libraries (e.g. PT-Scotch 6 isn't supported and HDF5 needs a parallel build).

Multi-GPU on Zen (and possibly other systems)

Some issues identified in the multi-GPU execution of Airfoil (C++ version):

  1. On Zen, the Airfoil MPI+CUDA execution does not pick up the correct GPU cards even though they are specified explicitly using CUDA_VISIBLE_DEVICES.
  2. When there are more MPI processes than GPUs on the system, the full hybrid CPU+GPU execution starts up but later fails in save_soln_kernel.cu:
save_soln_kernel.cu(120) : cutilSafeCall() Runtime API error : unspecified launch failure.

branch fix/vectorised

The branch fix/vectorised is quite a bit ahead of the master and resolves a bug in passing arguments in vectorised form.

This branch has not been active since October 2017, and I was wondering whether it could be merged into master?

Accesses of unset values during global reductions

We discovered that global reductions do not seem to work properly if the problem size does not fit nicely into the block dimension. The function op_reduction in op_cuda_reduction.h assumes that it is executed by all threads in the block, which is not the case when it is called from kernels that do not need all of them because the problem is too small. In these cases it accesses unset values from the shared temp array.

Aero HDF5 fails on cx1

aero_hdf5_dp_cuda:

dat.type double:soa:½ in file FE_grid.h5 and type double:soa do not match

aero_hdf5_dp_mpi:

dat.type double:soa<87><86> in file FE_grid.h5 and type double:soa do not match
dat.type double:soaGj^B in file FE_grid.h5 and type double:soa do not match
dat.type double:soa^Z<9c>^A in file FE_grid.h5 and type double:soa do not match
dat.type double:soaÂ+^C in file FE_grid.h5 and type double:soa do not match

aero_hdf5_dp_mpi_cuda:

dat.type double:soaZQ^B in file FE_grid.h5 and type double:soa do not match
dat.type double:soa¾Ê in file FE_grid.h5 and type double:soa do not match
dat.type double:soa/!^C in file FE_grid.h5 and type double:soa do not match
dat.type double:soa^T^A in file FE_grid.h5 and type double:soa do not match

aero_hdf5_dp_openmp:

dat.type double:soa^\^P^P in file FE_grid.h5 and type double:soa do not match

aero_hdf5_dp_seq:

dat.type double:soa$^A^P in file FE_grid.h5 and type double:soa do not match

Issue compiling OP2 Fortran OpenMP

Dear OP2 team,

I am unable to get the OP2 Fortran library version to compile. This is the output.
Thank you for your help. I tried uncommenting the commented lines at 1312 & 1368.
Kind regards,
Dr Abhishek C

/home/abhi/OP2/op2/fortran/src/op2_for_declarations.F90:1367:54:

 op_arg_gbl_python_i4_2dim = op_arg_gbl_c ( c_loc (int_ptr(dat)), dim, C_CHAR_'int'//C_NULL_CHAR, 4, access-1 )
                                                  1

Error: Argument X at (1) to C_LOC shall have either the POINTER or the TARGET attribute
/home/abhi/OP2/op2/fortran/src/op2_for_declarations.F90:1311:54:

 op_arg_gbl_python_r8_2dim = op_arg_gbl_c ( c_loc (real_ptr(dat)), dim, C_CHAR_'double'//C_NULL_CHAR, 8, access-1 )
                                                  1

Error: Argument X at (1) to C_LOC shall have either the POINTER or the TARGET attribute
Makefile:321: recipe for target 'f_openmp' failed
make: *** [f_openmp] Error 1

QuickSort for arrays of length zero

The quickSort implementation currently assumes that the array to sort has nonzero length, because it unconditionally accesses an element:

int pivot = arr[(left + right) / 2];

This function is used in is_onto_map:

quickSort(to_elem_copy, 0, map->from->size * map->dim - 1);

If map->from->size == 0, this call does not satisfy quickSort's precondition.

I would suggest avoiding the unconditional access in quickSort to support this special case. Let me know if you would like me to submit a PR.
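The guard could be as simple as the following sketch (not the actual patch):

void quickSort(int arr[], int left, int right) {
  if (left >= right) return;               /* empty or single-element range: nothing to sort */
  int pivot = arr[(left + right) / 2];
  /* ... partition and recurse exactly as before ... */
}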

Compilation fails for repeated kernel with different arguments

Hi,

Compilation fails for a repeated kernel with different arguments. The sequential version works, but not genseq or the other translated versions. The error complains about repeated par_loop kernels.

op_par_loop(test, "test1", faces,
op_arg_dat( a, -1, OP_ID, 1, OP_READ),
op_arg_dat( c, 0, some_mapping, 1, OP_READ),
op_arg_dat( c, 1, some_mapping, 1, OP_READ));

op_par_loop(test, "test1", faces,
op_arg_dat( d, -1, OP_ID, 1, OP_READ),
op_arg_dat( e, 0, some_mapping, 1, OP_READ),
op_arg_dat( e, 1, some_mapping, 1, OP_READ));

Is it easy to handle this in the translator?

airfoil_dp_mpi_cuda doesn't build

Hi,

I am trying to build the OP2 apps to run some tests too, but I can't seem to build all airfoil app variants due to what I think is some esoteric cmake error. If you can, please advise whether there's anything obvious I am doing wrong or can fix.

The confusing bit is that despite the warning, some variants get built and some don't. For instance, airfoil_tempdats_mpi_cuda gets built but airfoil_dp_mpi_cuda does not.

The cmake line is based pretty much on what you left in the apps directory:

  eval cmake -DCMAKE_INSTALL_PREFIX=$INSTALLDIR \
             -DCMAKE_BUILD_TYPE=Developer \
             -DCMAKE_C_FLAGS_DEVELOPER="-O2 -g -Wall -Werror" \
             -DCMAKE_CXX_FLAGS_DEVELOPER="-O2 -g -Wall -Werror -Wno-long-long" \
             -DCUDA_USE_STATIC_CUDA_RUNTIME=OFF \
             -DAIRFOIL_MESH_FILE=../new_grid.dat \
             $CMAKE_EXTRA_ARGS \
             .. &&
             (make -j$CORES all; make install)

I get plenty of errors of this type:

  add_dependencies.  Run "cmake --help-policy CMP0046" for policy details.
  Use the cmake_policy command to set the policy and suppress this warning.

  The dependency target "AIRFOIL_grid" of target "airfoil_tempdats_mpi" does
  not exist.
Call Stack (most recent call first):
  airfoil/airfoil_tempdats/CMakeLists.txt:56 (op2_application)

hdf5 target fails to build with HDF5 1.8.4

/usr/bin/mpiCC -g -fPIC -DUNIX -Wall -Iinclude -I/include  -c src/externlib/op_hdf5.c -L/lib -lhdf5 -lz  -o obj/externlib/op_hdf5.o 
src/externlib/op_hdf5.c: In function ‘op_set_core* op_decl_set_hdf5(const char*, const char*)’:
src/externlib/op_hdf5.c:64:49: error: too many arguments to function ‘hid_t H5Dopen1(hid_t, const char*)’
/usr/include/H5Dpublic.h:145:14: note: declared here
src/externlib/op_hdf5.c:59:17: warning: unused variable ‘dataspace’ [-Wunused-variable]
src/externlib/op_hdf5.c: In function ‘op_map_core* op_decl_map_hdf5(op_set, op_set, int, const char*, const char*)’:
src/externlib/op_hdf5.c:93:49: error: too many arguments to function ‘hid_t H5Dopen1(hid_t, const char*)’
/usr/include/H5Dpublic.h:145:14: note: declared here
src/externlib/op_hdf5.c:113:49: error: too many arguments to function ‘hid_t H5Dopen1(hid_t, const char*)’
/usr/include/H5Dpublic.h:145:14: note: declared here
src/externlib/op_hdf5.c:131:49: error: too many arguments to function ‘hid_t H5Dopen1(hid_t, const char*)’
/usr/include/H5Dpublic.h:145:14: note: declared here
src/externlib/op_hdf5.c:142:49: error: too many arguments to function ‘hid_t H5Dopen1(hid_t, const char*)’
/usr/include/H5Dpublic.h:145:14: note: declared here
src/externlib/op_hdf5.c: In function ‘op_dat_core* op_decl_dat_hdf5(op_set, int, const char*, const char*, const char*)’:
src/externlib/op_hdf5.c:192:49: error: too many arguments to function ‘hid_t H5Dopen1(hid_t, const char*)’
/usr/include/H5Dpublic.h:145:14: note: declared here
src/externlib/op_hdf5.c:203:49: error: too many arguments to function ‘hid_t H5Dopen1(hid_t, const char*)’
/usr/include/H5Dpublic.h:145:14: note: declared here
src/externlib/op_hdf5.c:221:49: error: too many arguments to function ‘hid_t H5Dopen1(hid_t, const char*)’
/usr/include/H5Dpublic.h:145:14: note: declared here
src/externlib/op_hdf5.c:237:49: error: too many arguments to function ‘hid_t H5Dopen1(hid_t, const char*)’
/usr/include/H5Dpublic.h:145:14: note: declared here
src/externlib/op_hdf5.c:247:38: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
src/externlib/op_hdf5.c:260:37: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
src/externlib/op_hdf5.c:273:35: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]

OpenMP performance issue

We recently encountered a strange performance pattern in the OpenMP backend, which we have never seen before: running our performance benchmark (solving the shallow water equations, circular dam break), we observed very long times spent in ONE kernel. The bigger the mesh gets, and the more threads we use, the more pronounced the problem becomes. What puzzles me is that we see it only on a few clusters, but not on local machines!

Have you ever seen or experienced anything similar, or does anybody have an idea what might be causing this behaviour? Any comments/ideas/feedback are much appreciated.

I listed some information about the problem below (examples from running on 4 threads).

the common picture:

  count   plan time     MPI time(std)        time(std)           GB/s      GB/s   kernel name 
 -------------------------------------------------------------------------------------------
      1;    0.0000;    0.0000(  0.0000);    0.0020(  0.0000);   20.3814;         ;   InitCellHydFeatures 
   1629;    0.0000;    0.0000(  0.0000);   16.6781(  0.0000);  15.6346;  16.2859;   TimeStepCalculation 
   1629;    0.0000;    0.0000(  0.0000);    0.1602(  0.0000);   3.0701;  13.0268;   HydWallBFluxes 
   1629;    0.0000;    0.0000(  0.0000);   51.4494(  0.0000);   6.7690;   7.0689;   HydInternalFluxes 
   1629;    0.0000;    0.0000(  0.0000);   25.4469(  0.0000);  16.1506;  17.1526;   Update 
Total plan time:   1.2157

the anomalous picture:

  count   plan time     MPI time(std)        time(std)           GB/s      GB/s   kernel name 
 -------------------------------------------------------------------------------------------
      1;    0.0000;    0.0000(  0.0000);    0.0046(  0.0000);    8.6442;         ;   InitCellHydFeatures 
   1629;    0.0000;    0.0000(  0.0000);   15.6782(  0.0000);  16.6317;  17.3246;   TimeStepCalculation 
   1629;    0.0000;    0.0000(  0.0000);    0.1812(  0.0000);   2.7140;  11.5160;   HydWallBFluxes 
   1629;    0.0000;    0.0000(  0.0000);   49.9202(  0.0000);   6.9764;   7.2854;   HydInternalFluxes 
   1629;    0.0000;    0.0000(  0.0000);  107.2164(  0.0000);   3.8332;   4.0710;   Update 
Total plan time:   1.3787

The misbehaving kernel Update:

op_par_loop(
        kernel_GLOBAL_Update,
        this->model->getKernel(KernelID::UPDATE)->getNameC(),
        op_sets[SetID::CELLSALL],
        op_arg_gbl    (       &dt,                                                                               1, real_str, OP_READ ),
        op_arg_gbl    (       &mFlag,                                                                            1, "int",    OP_READ ),
        op_arg_dat    (       op_dats[DataID::CELLSALL_GEOMAREA],     -1, OP_ID,                                 1, real_str, OP_READ ),
        op_arg_dat    (       op_dats[DataID::EDGESALL_GEOMLENG], -N_E_C, op_maps[MapID::CELLSALL_EDGESALL],     1, real_str, OP_READ ),
        op_arg_dat    (       op_dats[DataID::EDGESALL_FLUXESHY], -N_E_C, op_maps[MapID::CELLSALL_EDGESALL], N_F_H, real_str, OP_READ ),
        op_opt_arg_dat(mFlag, op_dats[DataID::EDGESALL_FLUXESMO], -N_E_C, op_maps[MapID::CELLSALL_EDGESALL], N_F_M, real_str, OP_READ ),
        op_arg_dat    (       op_dats[DataID::CELLSALL_EDGESIGN],     -1, OP_ID,                             N_E_C, "int",    OP_READ ),
        op_arg_dat    (       op_dats[DataID::CELLSALL_HYDSTATE],     -1, OP_ID,                             N_V_H, real_str, OP_RW   ),
        op_arg_dat    (       op_dats[DataID::CELLSALL_BOTTOMEL],     -1, OP_ID,                             N_V_M, real_str, OP_RW   ),
        op_arg_dat    (       op_dats[DataID::CELLSALL_ISWETTED],     -1, OP_ID,                                 1, "int",    OP_WRITE),
        op_arg_gbl    (       &Running,                                                                          1, "int",    OP_WRITE)
    );
kernel routine with indirection: kernel_GLOBAL_Update
 new execution plan #3 for kernel Update
 number of blocks       = 3884 
 number of block colors = 1 
 maximum block size     = 256 
 average thread colors  = 1.00 
 shared memory required =  33.94 KB
 average data reuse     = 1.09 
 data transfer (used)   = 240.61 MB 
 data transfer (total)  = 255.53 MB 
 SoA/AoS transfer ratio = 1.02 

op_decl_dat_hdf5 and friends can fail to read entire datatype name

op_hdf5.c does this (line 540 onwards):

dataspace= H5Screate(H5S_SCALAR);
hid_t atype = H5Tcopy(H5T_C_S1);
H5Tset_size(atype, 10);
attribute = H5Acreate(dset_id, "type", atype, dataspace,
    H5P_DEFAULT, H5P_DEFAULT);
H5Awrite(attribute, atype, dat->type);
H5Aclose(attribute);

But this only works if strlen(dat->type) < 10. Otherwise the string describing the attribute is truncated.

Equally, op_decl_dat_hdf5 (op_hdf5.c line 228 onwards) does this:

//read attribute
char typ[10];
H5Aread(attr,atype,typ);
H5Aclose(attr);
H5Sclose(dataspace);
H5Dclose(dset_id);
if(strcmp(typ,type) != 0)
{
  printf("dat.type %s in file %s and type %s do not match\n",typ,file,type);
  exit(2);
}

If the type that was written had strlen >= 10, there is a buffer overflow here, since typ will not contain a trailing NULL and so strcmp(typ, type) will read off the end of the array.

In addition, this is a bad bug because it does not allow us to write and read op_dats whose datatype descriptor has more than 9 characters.

In particular, consider the case of long long data stored in soa format:

op_decl_dat(..., "long long:soa", ...)

strlen("long long:soa") == 13, and so, the attribute written to the HDF5 file will be:

"long long:" (with no trailing null).

When we come to read from the file, this will not match against the long long:soa datatype we want, and we will exit.

I think the correct fix is to write the attribute with strlen(dat->type) and first read the length of the attribute from the hdf5 file before allocating a buffer for it.
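A sketch of that fix (write side and read side; not the actual patch, and it reuses the variable names from the excerpts above):

/* write side: size the string type from the data instead of a fixed 10 bytes */
hid_t atype = H5Tcopy(H5T_C_S1);
H5Tset_size(atype, strlen(dat->type) + 1);   /* room for the trailing NUL */
attribute = H5Acreate(dset_id, "type", atype, dataspace, H5P_DEFAULT, H5P_DEFAULT);
H5Awrite(attribute, atype, dat->type);

/* read side: query the stored length before allocating the buffer */
hid_t rtype = H5Aget_type(attr);
size_t len  = H5Tget_size(rtype);
char *typ   = (char *)malloc(len + 1);
H5Aread(attr, rtype, typ);
typ[len] = '\0';                             /* safe even for attributes written without a NUL */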

Feature Request - C Wrapper for Fortran?

Hello there. I was wondering if it's possible to call C-based OP2 functions from Fortran? Although it should be possible to call CUDA and C functions from Fortran, I wonder if the magic of OP2's automatic code generation might upset my plan to write a wrapper. Basically, I am writing in Fortran but I don't have the Portland CUDA compiler. Cheers!

op2.m generates code that does not compile without warnings

This is an incomplete list of warnings issued when compiling the generated *_kernel.{cu,cpp} files:

  • warning: comparison between signed and unsigned integer expressions
  • warning: ignoring #pragma omp parallel
  • warning: unused variable ‘foo’
