
ga's People

Contributors

abagusetty, abhinavvishnu, ajaypanyala, ajmay81, bernhold, bjpalmer, cabe1980, calccrypto, callum-pe, djbaxter, dmejiar, dsolovyev, e-kwsm, edoapra, gsthomas, hjjvandam, ipdrm16, jeffhammond, jonnysq, keipertk, landwehrj, lizutah, marcinz, mshiryaev, nitinehpc, pjknowles, sriramkmoorthy, tjstavenger-pnnl, twindus, wadejong

ga's Issues

verify if asserts contain code fragments

There are some cases where the assert() macro is used incorrectly, containing code that must always execute. For example, 0122f97 fixed such an issue. We need to grep through the entire code base to make sure we don't do this elsewhere.
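As a minimal sketch of the pattern to grep for (the helper `do_required_work` and both wrapper functions are hypothetical, not GA code): when a build defines `NDEBUG`, the whole `assert()` expression is removed, so any call placed inside it silently stops executing.

```c
#include <assert.h>

/* Hypothetical stand-in for a call that must always run, such as an
 * allocation or a communication step. It counts how often it is invoked. */
static int calls = 0;

static int do_required_work(void)
{
    calls++;
    return 0; /* 0 means success in this sketch */
}

/* WRONG: with -DNDEBUG the assert expands to nothing, so
 * do_required_work() is never called. */
static void wrong_usage(void)
{
    assert(do_required_work() == 0);
}

/* RIGHT: the call always executes; only the check is compiled out. */
static void right_usage(void)
{
    int rc = do_required_work();
    assert(rc == 0);
    (void)rc; /* silences the unused-variable warning under NDEBUG */
}
```

Calling `right_usage()` increments `calls` in every build, whereas `wrong_usage()` only does so when assertions are enabled.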

KNL: icc -xmic-avx512 breaks src-mpi-pr/comex.c

I have observed this issue on Cray KNLs when the module craype-mic-knl is loaded. craype-mic-knl triggers the -xmic-avx512 icc option, which seems to generate erroneous code for src-mpi-pr/comex.c.
The problem can be reproduced with any NWChem run. I have not tried the comex/ga tests; which one should I try?

#ifdef ENABLE_EISPACK should be #if ENABLE_EISPACK

rsg.F and ga_diag_seq.F use #ifdef ENABLE_EISPACK, and since config.h unconditionally defines ENABLE_EISPACK to either 0 or 1, we're stuck using the old EISPACK code. Update these two source files to use #if ENABLE_EISPACK instead.
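The difference can be sketched in a few lines of preprocessor logic. The example below is C rather than preprocessed Fortran, and the `SELECTED_BY_*` macros and the two functions are illustrative names only; the `#define` at the top stands in for what config.h is described as doing.

```c
/* Stand-in for config.h: the symbol is always defined, and its value
 * (0 or 1) carries the configure result. */
#define ENABLE_EISPACK 0

/* #ifdef only asks whether the symbol is defined, so it is true even
 * when the value is 0. */
#ifdef ENABLE_EISPACK
#define SELECTED_BY_IFDEF 1
#else
#define SELECTED_BY_IFDEF 0
#endif

/* #if evaluates the value, so 0 correctly disables the EISPACK path. */
#if ENABLE_EISPACK
#define SELECTED_BY_IF 1
#else
#define SELECTED_BY_IF 0
#endif

int eispack_selected_by_ifdef(void) { return SELECTED_BY_IFDEF; }
int eispack_selected_by_if(void)    { return SELECTED_BY_IF; }
```

With `ENABLE_EISPACK` set to 0, the `#ifdef` variant still selects the EISPACK branch while the `#if` variant does not.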

autogen.sh fails when only automake needs to be built

autogen.sh fails to build the right set of autotools when only automake needs to be built and m4, libtool and autoconf are already at the required version. I think the script fails because the automake build has a hardwired reference to ../autotools/share/aclocal, which should come from libtool, but libtool has not been built. A simple solution would be to build libtool even though it already has the required version.
Here are some snippets from the attached log file:
+ M4_VERSION=1.4.17
+ LIBTOOL_VERSION=2.4.6
+ AUTOCONF_VERSION=2.69

../ga-5-6-1/autotools/share/aclocal': No such file or directory

autogen.log.txt

abstract_ops.h thread safety

abstract_ops.h has two static variables shared between operators.

static double __elem_op_var;
static double __elem_op_var2;

This is in addition to the fact that abstract_ops.h is difficult to use. It's trying to solve the problem of copy-paste coding for the various GA data types in math operations, e.g., scan_add, elem_multiply.

K&R syntax is obsolete, use ANSI

There are a few places, e.g. global/src/DP.c, where K&R style function declarations are still in use.

For example:

// K&R syntax
int foo(a, p) 
    int a; 
    char *p; 
{ 
    return 0; 
}

// ANSI syntax
int foo(int a, char *p) 
{ 
    return 0; 
}

Clean these up. Consider using gcc warnings to locate them, though that might not be bulletproof:
-Wmissing-prototypes -Wmissing-declarations -Wstrict-prototypes or -ansi -pedantic.

CMake not building MA fortran wrappers

I could not find any way to build the MA Fortran wrappers in the current CMake infrastructure.
For example, libga.a contains the symbol MA_push_stack, but not MA_push_stack_.

Disable Fortran not working

Hi,

With the new (5.6) GA there seems to be a small glitch when switching off Fortran, e.g.:

./configure --prefix=${HOME}/ga-install --disable-f77 MPICC=mpicc MPICXX=mpicxx

worked with 5.5 but with 5.6 this gives:

configure: error: conditional "F77_INTEL_NO_INLINE" was never defined.
Usually this means the macro was only invoked conditionally.

The F77_INTEL_NO_INLINE test seems to be new to 5.6, perhaps it just needs a default value for --disable-f77 case?

Best wishes,

Andy

gatscat old/new performance and correctness, gatscat alloc thread-safety

The gather/scatter/scatteracc code has undergone a few revisions over the project lifetime. The latest addition of an alloc function is not thread safe. These routines need additional review and testing in light of certain optimizations made to ARMCI contiguous/strided checks that caused gatscat code to fail when it shouldn't.

do not use prod reductions to achieve logical consensus

Most of the uses of the reduction operator * are doing "logical and" on 0 and 1. We should instead use something that maps down to MPI_LAND.

I'm going to add some new collective ops and map at least the GA internal collectives to them where appropriate.

cca/ga_cca_classic/overload.cxx:  GA_Lgop(&isEqual, 1, (char *)"*");
ga++/src/overload.cc:  GA_Lgop(&isEqual, 1, (char *)"*");
global/src/base.c:        pnga_pgroup_gop(p_handle,pnga_type_f2c(MT_F_INT), &status, 1, "*");
global/src/base.c:        pnga_gop(pnga_type_f2c(MT_F_INT), &status, 1, "*");
global/src/base.c:         pnga_gop(pnga_type_f2c(MT_F_INT), &status, 1, "*");
global/src/base.c:      pnga_pgroup_gop(grp_id, pnga_type_f2c(MT_F_INT), &status, 1, "*");
global/src/base.c:      pnga_gop(pnga_type_f2c(MT_F_INT), &status, 1, "*");
global/src/elem_alg.c:    pnga_gop(pnga_type_f2c(MT_F_INT), &compatible, 1, "*");
global/src/elem_alg.c:    pnga_gop(pnga_type_f2c(MT_F_INT), &compatible, 1, "*");
global/src/elem_alg.c:     pnga_gop(pnga_type_f2c(MT_F_INT), &compatible, 1, "*");
global/src/elem_alg.c:  pnga_gop(pnga_type_f2c(MT_F_INT), &compatible, 1, "*");
global/src/global.npatch.c:    pnga_gop(pnga_type_f2c(MT_F_INT), &compatible, 1, "*");
global/src/global.npatch.c:    pnga_gop(pnga_type_f2c(MT_F_INT), &compatible_a, 1, "*");
global/src/global.npatch.c:    pnga_gop(pnga_type_f2c(MT_F_INT), &compatible_b, 1, "*");
global/src/matrix.c:    pnga_gop(pnga_type_f2c(MT_F_INT), &compatible, 1, "*");
global/src/matrix.c:    pnga_gop(pnga_type_f2c(MT_F_INT), &compatible, 1, "*");
global/src/matrix.c:    pnga_gop(pnga_type_f2c(MT_F_INT), &compatible, 1, "*");
global/testing/unit-tests/ga_dgop.c:  GA_Dgop(x, n,"*");
global/testing/unit-tests/ga_lgop.c:  GA_Lgop(x, n, "*");
gparrays/testing/testc.c:  GA_Igop(&idx,1,"*");
gparrays/testing/testc.c:  GA_Igop(&idx,1,"*");
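A minimal local sketch of why the two reductions differ. The functions `reduce_prod` and `reduce_land` are hypothetical and only simulate, over a plain array, what a product reduction and an `MPI_LAND`-style reduction would compute across processes.

```c
#include <stddef.h>

/* Combine per-process flags by multiplication, as the "*" reduction does. */
int reduce_prod(const int *flags, size_t n)
{
    int result = 1;
    for (size_t i = 0; i < n; i++) {
        result *= flags[i];
    }
    return result;
}

/* Combine per-process flags with logical AND: the result is always 0 or 1,
 * regardless of which nonzero values the inputs use for "true". */
int reduce_land(const int *flags, size_t n)
{
    int result = 1;
    for (size_t i = 0; i < n; i++) {
        result = result && flags[i];
    }
    return result;
}
```

For strictly 0/1 inputs both functions agree. The logical form states the intent directly and stays a clean boolean even if a caller passes some other nonzero value as "true".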

Comex issue with NGA_Put64

When using NGA_Put64 there are problems if the values are over the 32-bit limit, i.e. they cannot be represented by int; e.g. see the seg fault below. Even though _my_memcpy takes a size_t n, it is passed an int bytes, so by that point at the latest we have lost the 64-bit information. Everything works fine for a non-Comex port, e.g. --with-sockets.

Program received signal SIGSEGV, Segmentation fault.
0x00007ffff54fb254 in __memcpy_sse2_unaligned () from /lib64/libc.so.6
(gdb) where
#0  0x00007ffff54fb254 in __memcpy_sse2_unaligned () from /lib64/libc.so.6
#1  0x000000000395f222 in _my_memcpy (dest=0x7ffdb6c6f010, src=0x7ffed4e23010, n=18446744071562067968) at src-mpi/comex.c:139
#2  0x000000000395ff72 in _put_nbi (src=0x7ffed4e23010, dst=0x7ffdb6c6f010, bytes=-2147483648, proc=0) at src-mpi/comex.c:644
#3  0x000000000395fe48 in comex_put (src=0x7ffed4e23010, dst=0x7ffdb6c6f010, bytes=-2147483648, proc=0, group=0) at src-mpi/comex.c:613
#4  0x000000000395b94e in PARMCI_PutS (src_ptr=0x7ffed4e23010, src_stride_arr=0x7fffffffc824, dst_ptr=0x7ffdb6c6f010, dst_stride_arr=0x7fffffffc844, count=0x7fffffffc804, stride_levels=0, proc=0) at src-armci/armci.c:484
#5  0x000000000395e889 in ARMCI_PutS (src_ptr=0x7ffed4e23010, src_stride_arr=0x7fffffffc824, dst_ptr=0x7ffdb6c6f010, dst_stride_arr=0x7fffffffc844, count=0x7fffffffc804, stride_levels=0, proc=0) at src-armci/capi.c:377
#6  0x00000000038d8a76 in ngai_puts (loc_base_ptr=0x7ffed4e23010 "", pbuf=0x7ffed4e23010 "", stride_loc=0x7fffffffc824, prem=0x7ffdb6c6f010 "", stride_rem=0x7fffffffc844, count=0x7fffffffc804, nstrides=0, proc=0, field_off=0, field_size=-1, type_size=8)
at global/src/onesided.c:411
#7  0x00000000038db979 in ngai_put_common (g_a=-1000, lo=0x7fffffffcd90, hi=0x7fffffffcd50, buf=0x7ffed4e23010, ld=0x7fffffffcd10, field_off=0, field_size=-1, nbhandle=0x0) at global/src/onesided.c:708
#8  0x00000000038dea0e in pnga_put (g_a=-1000, lo=0x7fffffffcd90, hi=0x7fffffffcd50, buf=0x7ffed4e23010, ld=0x7fffffffcd10) at global/src/onesided.c:1265
#9  0x00000000038674df in NGA_Put64 (g_a=-1000, lo=0x30901270, hi=0x309017b0, buf=0x7ffed4e23010, ld=0x7fffffffce30) at global/src/capi.c:1657
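The values in the backtrace are consistent with a narrowing conversion followed by a widening one. The sketch below reproduces that arithmetic in isolation; the function name is hypothetical, and it assumes the original request was 2^31 bytes on an LP64 platform, which matches bytes=-2147483648 and n=18446744071562067968 in frames #2 and #1.

```c
#include <stddef.h>
#include <stdint.h>

/* Mimics the path in the trace: a 64-bit byte count is squeezed through an
 * int parameter and then widened to size_t for the copy. */
size_t size_seen_by_memcpy(int64_t requested_bytes)
{
    /* Out-of-range conversion to int is implementation-defined; on common
     * two's-complement platforms 2^31 becomes -2147483648. */
    int bytes = (int)requested_bytes;

    /* Converting the negative int to a 64-bit size_t wraps modulo 2^64. */
    return (size_t)bytes;
}
```

Passing 2147483648 yields 18446744071562067968 (2^64 - 2^31) on such platforms, while any count that fits in an int passes through unchanged.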

GA replaces random() unconditionally

In Makefile.am, we see

##############################################################################
# compat
#
# Although the compat directory houses replacements for missing or erroneous
# standard C functions and such sources are conditionally compiled based on
# results from configure tests, without the "random" implementation the
# m4-generated tests always fail for scatter and copy_patch.
libga_la_SOURCES += compat/random.c

First of all, I have no idea if we even still need to use the autotool functionality of LIBOBJS. See https://www.gnu.org/software/automake/manual/html_node/LIBOBJS.html for details.

The main issue here is that random.c is unconditionally compiled. Worse yet, it appears that if we don't override the system random() and srandom() we somehow break some tests. Our compat/random.c appears to be a permissively-licensed copy of BSD's random() from 1983...?

@edoapra, would it be possible to evaluate how replacing the system-provided random() affects NWChem? I would love to remove such strange hacks from GA. A related issue is that GA unconditionally provides a "drand" implementation for Fortran, which could replace any drand() function provided by ifort, for example. Our drand() is a wrapper around the C random() -- the same random() we already replace...

comex_fence_proc() is no-op in MT, PT, PR

The TravisCI build showed the MPI-MT port failing, but only occasionally. Upon further review,

int comex_fence_proc(int proc, comex_group_t group)
{
#if DEBUG
    printf("[%d] comex_fence_proc(proc=%d, group=%d)\n",
            g_state.rank, proc, group);
#endif

    comex_wait_all(COMEX_GROUP_WORLD);

    return COMEX_SUCCESS;
}

The call to comex_wait_all() is effectively a no-op in this case. No fencing message is initiated, so there is nothing to wait on. This is repeated in the PT and PR implementations. I wonder if we meant to call comex_fence_all() instead? It's potentially a heavy hammer, but it would work.

process groups sometimes fail for MPI-PT port

In pgtest.x we see

> Checking accumulate ... 

  disjoint ga_acc is OK


  overlapping ga_acc is OK


> Checking add ...
[3] ../../comex/src-mpi-pt/comex.c:2387: _put_packed_handler: Assertion `reg_entry' failed[3] Received an Error in Communication: (-1) comex_assert_fail
application called MPI_Abort(comm=0x84000002, -1) - process 3

In ghosts.x we see

 using                     2  process(es)
 Value of pdims(                    1 ) is                     2
 Value of pdims(                    2 ) is                     1
map( 1) =     1
map( 2) =  1001
map( 3) =     1
 *
 * Global array creation was successful
 *
[3] ../../comex/src-mpi-pt/comex.c:2542: _get_packed_handler: Assertion `reg_entry' failed[3] Received an Error in Communication: (-1) comex_assert_fail
application called MPI_Abort(comm=0x84000002, -1) - process 3

ProcListPerm not thread safe

global/src/onesided.c uses a global variable ProcListPerm to locally permute MPI ranks at the initiator of put/get/acc calls in order to avoid contention by always servicing targets in a monotonically increasing order.

Make this thread safe with appropriate malloc()/free() within the scope of the functions.
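One possible shape for this, as a sketch only: the helper `alloc_proc_perm` is hypothetical, and the rotation it builds is an illustrative permutation, not the one GA actually uses. The point is that each call owns its buffer instead of sharing a global.

```c
#include <stdlib.h>

/* Hypothetical helper: returns a caller-owned permutation of 0..nproc-1,
 * or NULL on failure. The rotation starting just after `me` is an assumption
 * made for illustration; `me` is expected to lie in [0, nproc). */
int *alloc_proc_perm(int nproc, int me)
{
    if (nproc <= 0 || me < 0 || me >= nproc) {
        return NULL;
    }
    int *perm = malloc((size_t)nproc * sizeof *perm);
    if (perm == NULL) {
        return NULL;
    }
    for (int i = 0; i < nproc; i++) {
        perm[i] = (me + 1 + i) % nproc;
    }
    return perm;
}
```

A put/get/acc routine would call this on entry and `free()` the result before returning, so concurrent threads never touch the same array.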

Review implementation of GA_Sync

The current implementation of GA_Sync relies on pnga_pgroup_sync. If the group is not the world group, then the call loops over all processes in the group and calls ARMCI_Fence(group_id, iproc). The current implementation of this function in the MPI3 port is to call MPI_Win_flush(proc, win) on all windows associated with the world group. This is wrong, but I don't think there is any way to implement the ARMCI_Fence operation using MPI RMA that doesn't require an order P data structure. At any rate, I think this implementation is not the way to go, since what we want to do is flush all processes in the group. Unfortunately, we don't seem to have something like ARMCI_FenceGroup. This could be easily implemented in MPI RMA but may cause problems with the other ports. Since there is a comex_fence_all function, it should be easy to implement this operation for any of the MPI based ports, but we may run into problems with the existing IB port.

OFI port doesn't build on OSX 10.12 with clang, other warnings

@mshiryaev

I added the OFI port to the Travis CI testing. On my MacBook running OSX 10.12, I was unable to build the ComEx/OFI port using clang because it can't compile the nested function. This is important to fix; could you take a look?

../../comex/src-ofi/comex.c:2594:5: error: function definition is not allowed
      here
    {
    ^

There are also a significant number of warnings due to the deprecated syscall() on OSX 10.12. Here's an example:

../../comex/src-ofi/comex.c:2480:13: warning: 'syscall' is deprecated: first
      deprecated in macOS 10.12 - syscall(2) is unsupported; please switch to a
      supported interface. For SYS_kdebug_trace use kdebug_signpost().
      [-Wdeprecated-declarations]
            COMEX_OFI_LOG(DEBUG, " %d: count = %d, stride: %d\n", i, cou...
            ^

I needed to add -Wno-deprecated-declarations to my CFLAGS to silence those warnings.

I committed cc4d707 to reduce some of the warnings having to do with printf() format mismatches.

In any case, the nested function will need to be addressed. Thanks.

_ga_map not thread safe

Integer *_ga_map defined in base.h is not thread safe. It is used in the following files:

  • global/src/base.c
  • global/src/base.h
  • global/src/ghosts.c
  • global/src/onesided.c

It is a convenience variable for calls to pnga_locate_region so that heap memory is allocated only once during GA_Init(). Recommend locally calling malloc() as needed instead of sharing the allocation.

Must also update any macros that assume _ga_map is globally available.

global/src/nbutil.c not thread safe

We need to evaluate the thread safety of the nbutil.c file and associated routines. Non-blocking functionality is important to GA users. Currently static linked lists are used for managing non-blocking handles and shared among various static and non-static functions.

rename MPI_Check

MPI_Check violates the MPI standard because it uses the reserved MPI_ namespace.

This does not break anything in practice and thus is low priority, but it is also a trivial fix.

libcomex missing optional BLAS dependency

$(BLAS_LIBS) is not added conditionally or otherwise to the LIBADD automake variable for libcomex. This should be done conditionally based on whether an external BLAS is being used. Otherwise, the MKL libraries likely won't properly load when using shared libraries.

GA Style Guide

We should create a style guide for GA programming that defines consistent naming conventions for macros and for internal functions that are not used outside of the GA library, and then get these naming conventions implemented in the code. We should also clean up the different memory allocators inside GA and put them on a consistent footing.

Previous releases

Would it be possible to add tags or branches for the previous releases available on the home page: 4.3, 5.0, 5.1, 5.2, 5.3, 5.4, 5.5?

matmul.c thread safety evaluation

There are a few static function variables in matmul.c.

static short int CYCLIC_DISTR_OPT_FLAG  = SET;
static short int CONTIG_CHUNKS_OPT_FLAG = SET;
static short int DIRECT_ACCESS_OPT_FLAG = SET;

GA handle specific GA_Fence

NWChemEx has requested a feature in which it would be possible to execute GA_Fence, which would take a GA handle as an argument.

Feature requested by Robert Harrison

remove use of GA_PUSH_NAME, GA_POP_NAME

Not only are GA_PUSH_NAME, GA_POP_NAME, etc. not thread safe, they are not needed for modern debugging, where we have access to stack unwinding et al. Remove all uses and the associated variables.

NWChem failure on KNL with MPI-PR

Commit 63b5c76 for comex/src-armci/armci.c causes NWChem to generate erroneous results.
The error shows up on KNL using Intel MPI and mpirun (oddly enough, jobs started with SLURM srun don't have this issue). Two nodes with a total of four processes are enough to trigger it.
It can be reproduced on cascade KNL nodes.
One more data point on the Intel MPI behavior: if I switch from the default DAPL network fabric to the OFA network fabric, the error vanishes (this is consistent with what I discovered last week with MPI3 ...).
export I_MPI_FABRICS=shm:ofa
export I_MPI_OFA_PACKET_SIZE=2048
export I_MPI_OFA_NUM_RDMA_CONNECTIONS=-1
export I_MPI_OFA_SWITCHING_TO_RDMA=16

I wonder if we should discourage the usage of DAPL with Intel MPI ... any comment?

comex with BLAS dependency fails to link if fortran disabled in GA

If --disable-f77 is given to configure or if the test for a fortran compiler fails, and if comex detects a BLAS library, then the BLAS dependency isn't passed on to the GA linker. You get many linker errors such as

/home/username/ga-git/bld_nofort/comex/../../comex/src-common/acc.h:110: undefined reference to `daxpy_'

The make flags target doesn't show a dependency on BLAS either.

# =========================================================================== 
F77="mpif90"
CC="mpicc"
# Suggested compiler/linker options are as follows.
# GA libraries are installed in /home/username/ga-git/bld_nofort/lib
# GA headers are installed in /home/username/ga-git/bld_nofort/include
#
CPPFLAGS="-I/home/username/ga-git/bld_nofort/include"
#
LDFLAGS="-L/home/username/ga-git/bld_nofort/lib"
#
# For Fortran Programs: 
FFLAGS=""
LIBS="-lga"
#
# For C Programs: 
CFLAGS=""
LIBS="-lga"
# =========================================================================== 

review GA_Fence() et al for thread safety

The premise of calling GA_Fence_init() and later ending the fence with GA_Fence() is by design not thread safe since state is stored globally between these functions.

Do we consider an API change where we return a handle?

review and remove dead preprocessor symbols and associated code

Here is the alphabetical candidate list of unused preprocessor symbols. Remove dead symbols and their associated code blocks where possible.

  • ARMCI_COLLECTIVES
  • AVOID_MA_STORAGE
  • BYTE_ADDRESSABLE_MEMORY
  • CHECK_MA
  • CHECK_MA_ALGN
  • COMPACT_SCALAPACK
  • CRAY_T3D
  • ENABLE_CHECKPOINT
  • ENABLE_PROFILE
  • ENABLE_TRACE
  • GA_CREATE_INDEF
  • GA_ELEM_PADDING
  • NEC
  • NO_GA_STATS
  • PROFILE_OLD
  • PVM
  • STATBUF
  • SUN
  • UPDATE_SAMENODE_GHOSTS_FIRST
  • USE_GATSCAT_NEW
  • USE_MP_NORTHSOUTH
  • __CRAYX1_PRAGMA

deprecate TCGMSG4/5 and PVM, keep TCGMSG-MPI

We should assume at minimum an MPI runtime is linked in and in use.

TCGMSG4/5 are obsolete. They should be removed from the code base including any preprocessor symbols. TCGMSG4 uses 'pfiles' for running in parallel. This complicates the Makefile test suite and should also be removed.

TCGMSG-MPI is the MPI compatibility layer and is always compiled. This has been the transition path for many years now. For example, NWChem can continue to use PBEGIN() as part of TCGMSG-MPI as needed.

There are a few lingering references to PVM that should be removed.

These changes would also let us remove dead code where the preprocessor symbol MSG_COMMS_MPI is used throughout the code.

Comex OpenIB missing library symbol

If I build GA with --with-ofa and then try to link against it I get:

lib/libarmci.a(armci.o): In function `PARMCI_NbAccS':
armci.c:(.text+0x7b4): undefined reference to `comex_nbacc'

It seems the symbol just doesn't get put into the library:

> nm ga-mvapich2/lib/libarmci.a | grep -i comex_nbacc
             U comex_nbacc
             U comex_nbaccs
             U comex_nbaccv
00000000000004c0 T comex_nbaccs
0000000000000270 T comex_nbaccv
> nm ga-mvapich2/lib/libcomex.a | grep -i comex_nbacc
00000000000004c0 T comex_nbaccs
0000000000000270 T comex_nbaccv

Everything works fine if I use --with-openib instead. I just wanted to compare whether there was any difference in performance between the two. Is --with-openib still the recommended option?

global.nalg.c uses static work arrays, dead code, and a bug

/* work arrays used in all routines */
static Integer dims[MAXDIM], ld[MAXDIM-1];
static Integer lo[MAXDIM],hi[MAXDIM];
static Integer one_arr[MAXDIM]={1,1,1,1,1,1,1};

Move these work arrays into the functions.

On a side note, snga_copy_old() might be dead code.
Bug found in snga_local_transpose() where it uses an int instead of an Integer type.
