
gpi-2's People

Contributors

acastanedam, mrahn, rumach, sloede


gpi-2's Issues

general timeout

In most cases the return value of lock_gaspi_tout (..., timeout_ms) is not respected.

Resolution: check the return value and propagate the timeout, i.e. write

if (lock_gaspi_tout (..., timeout_ms))
  return GASPI_TIMEOUT;

undefined behaviour in gaspi_read|write_list (ib implementation)

The functions gaspi_read_list and gaspi_write_list, with and without notification, take an unsigned int as the number of elements (const gaspi_number_t num), but within the functions an array with 256 elements is used to set up ibv_post_send.

struct ibv_send_wr swr[256];

If num > 256, memory beyond the array bounds is accessed, which results in undefined behaviour.

 for (i = 0; i < num; i++)
    {
      slist[i].addr = (uintptr_t) (gctx->rrmd[segment_id_local[i]][gctx->rank].data.addr + offset_local[i]);
      slist[i].length = size[i];
      slist[i].lkey = ((struct ibv_mr *)gctx->rrmd[segment_id_local[i]][gctx->rank].mr[0])->lkey;

      swr[i].wr.rdma.remote_addr = (gctx->rrmd[segment_id_remote[i]][rank].data.addr + offset_remote[i]);
      swr[i].wr.rdma.rkey = gctx->rrmd[segment_id_remote[i]][rank].rkey[0];
      swr[i].sg_list = &slist[i];
      swr[i].num_sge = 1;
      swr[i].wr_id = rank;
      swr[i].opcode = IBV_WR_RDMA_WRITE;
      swr[i].send_flags = IBV_SEND_SIGNALED;
      if (i == (num - 1))
	swr[i].next = NULL;
      else
	swr[i].next = &swr[i + 1];
    }
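
A minimal defensive sketch that extends the snippet above (GASPI_ERROR is used here as a generic placeholder for whatever error code the maintainers prefer):

  /* guard against overflowing the fixed-size work-request arrays */
  if (num > 256)
    {
      return GASPI_ERROR;
    }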

Failed to modify QP on RHEL 7.5 with OpenFabrics stack

Executing a simple GPI2 Hello-World program fails with

[Rank 0]: Error 22 (Invalid argument) at (.../GPI-2-1.3.0/src/devices/ib/GPI2_IB.c:813):Failed to modify QP (libibverbs)
[Rank 0]: Error at (GPI2_SN.c:648):Failed to read from 0

gaspi_write and gaspi_notify

Hi,
the situation is the following:
I have many gaspi_write calls (all addressing the same target rank) in the same queue (e.g. queue = 1). The rank executes a gaspi_wait operation on this queue (queue = 1) and afterwards posts a notify to this (now empty) queue (queue = 1).
Can it happen that the notify overtakes the writes?
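
For reference, a minimal sketch of the sequence in question (segment ids, offsets, sizes and the notification id are placeholders):

#include <GASPI.h>

void write_then_notify (const gaspi_rank_t target)
{
  const gaspi_queue_id_t q = 1;

  /* several writes to the same target rank, all on queue 1 */
  gaspi_write (0, 0, target, 0, 0, 4096, q, GASPI_BLOCK);
  gaspi_write (0, 4096, target, 0, 4096, 4096, q, GASPI_BLOCK);

  /* wait until all requests on queue 1 have completed locally */
  gaspi_wait (q, GASPI_BLOCK);

  /* notification posted to the now empty queue */
  gaspi_notify (0, target, 0, 1, q, GASPI_BLOCK);
}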

`gaspi_state_vector_t` has the wrong type

Per GASPI specification gaspi_state_vector_t is defined as:

typedef vector<gaspi_state_t> gaspi_state_vector_t

This does not match the current definition in GPI2, where it is a char*.

gaspi_proc_ping does not set health state -> barrier_recover_ping test broken

The test barrier_recover_ping does not work and runs in an endless loop until it times out (after 1 hour, and only when run through run_tests.sh).

The reason is that gaspi_proc_ping is used, which does not update the health state.

After the 1.3.0 release (in the "next" branch) this is improved by setting the health state if gaspi_sn_command returns GASPI_ERROR. However, this does NOT work, because the call fails during gaspi_sn_connect_to_rank with a TIMEOUT, not an error. Hence the health state is not set and the broken state is never detected.

SLURM

Is there a workaround for the gaspi_run script with Slurm?
The ssh connections in the script work poorly with Slurm and I need a solution to use e.g. srun.

Adhere to standard configure, make, make install workflow

The current custom install.sh makes it difficult to integrate the GPI2 installation into standard workflows.
Please provide proper configure and make scripts instead of encapsulating the build in a not fully configurable custom procedure.

Example: the build must be done on a different machine or by a different user than the install, e.g. make && sudo make install.

gaspi_print_error macro may shadow GASPI function with same name

In GPI2_Utility.h various macros are defined that a) look like real functions and b) shadow official GASPI functions.

The most notable one is gaspi_print_error.

This will lead to errors when GPI2_Utility.h is included before GASPI.h, and possibly also for GPI2.c, where #pragma weak gaspi_print_error = pgaspi_print_error is used.
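
A hypothetical illustration of the clash (the macro body below is made up and is not the one from GPI2_Utility.h):

#include <stdio.h>

/* utility-style macro that reuses the official name */
#define gaspi_print_error(msg) fprintf (stderr, "Error: %s\n", (msg))

/* If GASPI.h is included after this definition, its two-parameter prototype
   for gaspi_print_error is seen by the preprocessor as a macro invocation
   with the wrong number of arguments and the translation unit fails to
   compile. */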

Using private IB include breaks on RHEL 7.5

Compilation on RHEL 7.5 breaks due to the include of infiniband/driver.h

#include <infiniband/driver.h>

This was a private include and is now explicitly internal to rdma-core, hence no longer accessible.

See:

/usr/include/infiniband/driver.h was provided by libibverbs-devel. This
header file still exists in the upstream rdma-core repo. The header file was
removed during the rebase to rdma-core-13-1.el7.

Yes, driver.h does still exist in the rdma-core repo, but it's clearly and
intentionally marked as an internal header now, not to be used by anything
outside the rdma-core tree. (See: libibverbs/CMakeLists.txt in the rdma-core
source tree).

https://bugzilla.redhat.com/show_bug.cgi?id=1464830

Order of segment and group deletion

I am seeing a strange InfiniBand error on our cluster and I am not sure where the problem lies. I can't find anything in the GPI2 source code, but I guess there are some InfiniBand peculiarities biting me. I have the following code:

  gaspi_group_t groupId;
  gaspi_group_create(&groupId);
  //...
  gaspi_group_commit(groupId, GASPI_BLOCK);
  gaspi_segment_create(0, 268435456, groupId, GASPI_BLOCK, GASPI_ALLOC_DEFAULT);
  gaspi_group_delete(groupId);
  gaspi_segment_delete(0);
  gaspi_group_create(&groupId);
  //...
  gaspi_group_commit(groupId, GASPI_BLOCK);
  gaspi_segment_create(0, 268435456, groupId, GASPI_BLOCK, GASPI_ALLOC_DEFAULT);
  // aso.

This code works for up to 128 processes. With more processes (tried with 512) I get mlx5: [nodename]: got completion with error: and a small dump (but no error number). The program stops somewhere in the second gaspi_segment_create.
If I swap gaspi_group_delete and gaspi_segment_delete, things work fine so far. Does anyone have an idea what exactly goes wrong here? I could imagine that it is a good idea not to delete a group before all segments belonging to that group are deleted (it's not demanded by the GASPI standard, though). However, I can't find an actual reason for the issue in the GPI2 source code.
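
For clarity, the reordering that reportedly works (a sketch following the snippet above): delete the segment first, then the group it belongs to.

  gaspi_segment_delete(0);
  gaspi_group_delete(groupId);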

Gaspi_Barrier, Timeout and state vector issue

As pointed out in issue #30, a barrier that runs into a timeout still doesn't update the state vector at all.
Example application:
state_vec.cpp.txt

output for 8 processes:
time out
0 -> healthy
1 -> healthy
2 -> healthy
3 -> healthy
4 -> healthy
5 -> healthy
6 -> healthy
7 -> healthy

Normally rank 1 shouldn't be healthy.

Kill_Procs test fails as killed process exits with error code

The GASPI command gaspi_proc_kill kills a remote process. However, the return code of the killed process is unspecified. In GPI2 the remote process calls exit(-1), so an error code is set.

This results in a test failure when this error code is propagated.

Suggestion: as gaspi_proc_kill is a user request to terminate the process, it should not produce an error code, so call exit(0) instead.

gaspi_allreduce error: GASPI state vector report all processes healthy

gaspi_allreduce doesn't affect the state vector in case of an error.
If I call a barrier that runs into a timeout before the allreduce (see the attached example), gaspi_allreduce returns an error. If no barrier is called before gaspi_allreduce, the application hangs forever.
Either way, the state vector doesn't change.

Also, the standard says that in case of an error gaspi_allreduce returns GASPI_ERROR, not a GPI2-specific error code.

used branch: next

Example:
state_vec_3.cpp.txt

output (8 processes):
2 = 0
3 = 0
4 = 0
5 = 0
6 = 0
7 = 0
0 = 0
time out
0 -> healthy
1 -> healthy
2 -> healthy
3 -> healthy
4 -> healthy
5 -> healthy
6 -> healthy
7 -> healthy

Does gaspi_segment_bind do the right thing if the segment id is already in use?

The GASPI standard doesn't define (yet) what gaspi_segment_bind should do if the segment id is already in use. Based on GPI2_SEG.c:591 I suppose that the current implementation just ignores all arguments of the gaspi_segment_bind call and returns GASPI_SUCCESS. Wouldn't it be better to rebind the segment to the new arguments?

Second issue: GPI2_SEG.c:576 returns if the maximum number of segments is reached. Shouldn't this test come after the test whether the segment already exists?
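
A minimal sketch of the situation in question (buffers and sizes are placeholders):

#include <GASPI.h>

void rebind_example (void *bufA, void *bufB, gaspi_size_t size)
{
  gaspi_segment_bind (0, bufA, size, 0);   /* first bind of segment id 0 */

  /* Same id bound again: per the report this currently appears to ignore the
     new arguments and returns GASPI_SUCCESS instead of rebinding to bufB. */
  gaspi_segment_bind (0, bufB, size, 0);
}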

GPI2 signature of gaspi_reduce_operation differs from GASPI specification

Why does the GPI2 signature of gaspi_reduce_operation differ from the GASPI specification?

GASPI-Spec:

gaspi_return_t
 gaspi_reduce_operation (gaspi_const_pointer_t operand_one,
                        gaspi_const_pointer_t operand_two,
                        gaspi_pointer_t result,
                        gaspi_reduce_state_t state,
                        gaspi_timeout_t timeout )

GPI2 impl. :

typedef gaspi_return_t (*gaspi_reduce_operation_t) (gaspi_pointer_t const operand_one,
                                                     gaspi_pointer_t const operand_two,
                                                     gaspi_pointer_t const result,
                                                     gaspi_reduce_state_t const state,
                                                     const gaspi_number_t num,
                                                     const gaspi_size_t element_size,
                                                     const gaspi_timeout_t timeout_ms);

what is the limit of GASPI_MAX_MSEGS

The maximum number of segments that can be created is #define GASPI_MAX_MSEGS (32) in the file include/GASPI.h.
I want to create thousands of segments. Can I just change this to #define GASPI_MAX_MSEGS (1024)?
And up to what value can GASPI_MAX_MSEGS be defined?

Thank you very much

gaspi_segment_bind doesn't count in gaspi_segment_list

The following code fails:

gaspi_segment_id_t data[1];
gaspi_segment_bind(0, somePtr, someSize, 0);
assert(gaspi_segment_list(1, data) == GASPI_SUCCESS);

The problem is that gaspi_segment_bind doesn't set the "trans" member of the own rank to 1.
gaspi_segment_list then returns GASPI_ERROR, since there is a mismatch between mseg_cnt and the segments that have their "trans" member set.
A workaround is an additional gaspi_segment_register call on the own rank (see the sketch below).
However, the standard says that a bound segment "...can be accessed locally...".
Thus I'd say that this is actually a GPI2 bug, which can be fixed by setting the "trans" member in
gaspi_segment_bind and presumably in gaspi_segment_alloc.
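
A sketch of the mentioned workaround (assuming segment id 0 as in the snippet above):

#include <GASPI.h>

void register_own_rank (void)
{
  gaspi_rank_t me;
  gaspi_proc_rank (&me);

  /* register the bound segment with the own rank so that its "trans" member
     is set and gaspi_segment_list counts it */
  gaspi_segment_register (0, me, GASPI_BLOCK);
}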

GASPI state vector doesn't report anything

Dear GPI maintainers,

I'm playing around with fault tolerance and what GASPI's timeout feature can do to make a program survive a rank failure.

In a test program I can identify which rank died, but I need to do it manually. The GASPI state vector doesn't report anything. I attached a test code (test.zip) which shows this behavior. The test program is not watertight when detecting a failure and it does not yet produce the correct result, but it shows that gaspi_state_vec_get() never reports a failure.

I tested it with gpi2/1.3.0 at the Taurus HPC machine at ZIH, TU Dresden.

Regards, Andreas

groups

Are the rank numbers returned by gaspi_group_ranks sorted?
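
For context, a minimal usage sketch of the call in question (using the built-in GASPI_GROUP_ALL group):

#include <stdlib.h>
#include <GASPI.h>

void query_group_ranks (void)
{
  gaspi_number_t group_size;
  gaspi_group_size (GASPI_GROUP_ALL, &group_size);

  gaspi_rank_t *ranks = malloc (group_size * sizeof (gaspi_rank_t));
  gaspi_group_ranks (GASPI_GROUP_ALL, ranks);   /* is this list sorted? */

  free (ranks);
}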

timeout in passive_send

passive_send returns with GASPI_TIMEOUT if there is no matching receive in time. However, at that point the communication request has already been posted.

Proposed resolution: return GASPI_TIMEOUT only if the initial locking fails. Otherwise block until the communication request has been received by the other side.

Note: if we want to keep the current behavior, we have to reformulate the standard. Moreover, we have to deal with a possible out-of-sync problem: while passive_send runs into a timeout, the passive_receive may just have been called and will successfully receive the data, hence returning GASPI_SUCCESS. Afterwards the two processes have different views of this particular communication request.
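
A minimal sketch of the call in question (segment id, offset, target rank and message size are placeholders):

#include <GASPI.h>

void send_with_timeout (const gaspi_rank_t target, const gaspi_size_t size)
{
  const gaspi_return_t ret =
    gaspi_passive_send (0 /* segment */, 0 /* offset */, target, size,
                        1000 /* timeout in ms */);

  if (ret == GASPI_TIMEOUT)
    {
      /* According to the report, the request may already be posted at this
         point, so the receiver can still complete it even though the sender
         observed a timeout. */
    }
}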

Faulty behaviour of `gaspi_barrier` in case of rank errors

During the investigation of #30 I found that the error behaviour of gaspi_barrier does not conform to the specs.

The testcase is a barrier call after one process was killed.

Expected: gaspi_barrier returns GASPI_ERROR, and gaspi_state_vec_get returns GASPI_STATE_CORRUPT for that rank.
Current behavior: gaspi_barrier returns GASPI_TIMEOUT or hangs indefinitely, and gaspi_state_vec_get returns GASPI_STATE_HEALTHY for all ranks.

Testcode: gpi2_barrier.c.txt

Example output of a run with 5 tasks:

4: Started 5 ranks
4: Got timeout on barrier. Retrying...
4: Still timeout on barrier
4: reported rank 0 as healthy
4: reported rank 1 as healthy
4: reported rank 2 as healthy
4: reported rank 3 as healthy
4: reported rank 4 as healthy

Error on rank 4. Did NOT detect faulty rank.

3: Started 5 ranks
3: Got timeout on barrier. Retrying...
3: Still timeout on barrier
3: reported rank 0 as healthy
3: reported rank 1 as healthy
3: reported rank 2 as healthy
3: reported rank 3 as healthy
3: reported rank 4 as healthy

Error on rank 3. Did NOT detect faulty rank.

2: Started 5 ranks
2: Got timeout on barrier. Retrying...
2: Still timeout on barrier
2: reported rank 0 as healthy
2: reported rank 1 as healthy
2: reported rank 2 as healthy
2: reported rank 3 as healthy
2: reported rank 4 as healthy

Error on rank 2. Did NOT detect faulty rank.

0: Started 5 ranks
0: Got timeout on barrier. Retrying...
0: Still timeout on barrier
0: reported rank 0 as healthy
0: reported rank 1 as healthy
0: reported rank 2 as healthy
0: reported rank 3 as healthy
0: reported rank 4 as healthy

Error on rank 0. Did NOT detect faulty rank.

The problem is that pgaspi_dev_post_group_write at https://github.com/cc-hpc-itwm/GPI-2/blob/v1.3.0/src/GPI2_GRP.c#L551 does not return an error, and https://github.com/cc-hpc-itwm/GPI-2/blob/v1.3.0/src/GPI2_GRP.c#L575 then returns the timeout without further error checking.

Limit too low for GASPI_MAX_MSEGS

Why is the default limit for GASPI_MAX_MSEGS 32?

I don't agree with the explanation given in issue #18. The user should decide on their own how many segments they want to use, not be restricted by a default value.

GASPI (GPI2) only supports one-sided communication between segment ids, so at some point more than this number of segments is needed, especially when using an abstraction on top of GASPI (GPI2).

Small error in GASPI.h documentation

Nothing severe, but you may want to fix it. The compiler warning is reported by an up-to-date clang:

libgpi-prefix/include/GASPI.h:1057:13: warning: parameter 'ticks' not found in the function declaration [-Wdocumentation]
   * @param ticks Output parameter with the time in milliseconds.
            ^~~~~
libgpi-prefix/include/GASPI.h:1057:13: note: did you mean 'wtime'?
   * @param ticks Output parameter with the time in milliseconds.
            ^~~~~
            wtime
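
The likely fix, following clang's own suggestion, is to rename the documented parameter in GASPI.h (a sketch of the corrected doxygen line):

   * @param wtime Output parameter with the time in milliseconds.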

Atomic Operation Tests fail

We installed GPI2-1.3.0 on our cluster.
All tests succeed on the cluster partition with the Intel Sandy Bridge architecture.
But when I execute the tests on the partition with the Intel Haswell architecture,
the atomic operation tests fail with this output:

atomic operations are not supported yet
Assertion failed in fetch_add.c[25]: Return -1: general error

Installed on the cluster:

  • gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-16)
  • libibverbs-1.1.8-1.ofed3.18.Bull.1.0.el6.20150922.x86_64

Do you have any advice or experience with this error?

Support for shared libraries

The support for building shared libraries is explicitly disabled in the configure script, with a message that doing so is not supported at the moment. Are there any known issues with using the shared libraries or is this a historic artifact?

GPI-2/configure.ac.in, lines 63 to 66 in c52bf44:

LT_INIT([disable-shared static])
if test $enable_shared = yes; then
  AC_MSG_ERROR([GPI-shared libraries are not supported at the moment])
fi

tutorial make failed

When running make in tutorial/code/basic, the following errors occur:

/home/zork/loclib/GPI2/lib64/libGPI2.a(GPI2_IB.o): In function `gaspi_init_ib_core':
GPI2_IB.c:(.text+0x188): undefined reference to `ibv_get_device_list'
GPI2_IB.c:(.text+0x1de): undefined reference to `ibv_open_device'
GPI2_IB.c:(.text+0x1f6): undefined reference to `ibv_create_comp_channel'
GPI2_IB.c:(.text+0x217): undefined reference to `ibv_query_device'
GPI2_IB.c:(.text+0x26e): undefined reference to `ibv_query_port'
GPI2_IB.c:(.text+0x519): undefined reference to `ibv_alloc_pd'
GPI2_IB.c:(.text+0x595): undefined reference to `ibv_reg_mr'
GPI2_IB.c:(.text+0x5e2): undefined reference to `ibv_create_srq'
GPI2_IB.c:(.text+0x60b): undefined reference to `ibv_create_cq'
GPI2_IB.c:(.text+0x634): undefined reference to `ibv_create_cq'
GPI2_IB.c:(.text+0x65d): undefined reference to `ibv_create_cq'
GPI2_IB.c:(.text+0x68b): undefined reference to `ibv_create_cq'
GPI2_IB.c:(.text+0x6e5): undefined reference to `ibv_create_cq'

In addition, the pthread lib is missing.

This can be fixed by changing the order of GPI2 and ibverbs in the Makefile:

LIB += GPI2
LIB += ibverbs
LIB += pthread
LIB += m

Why are all processes except process 0 allowed to die?

All processes except 0 are allowed to die. But if the process with id 0 dies, all other processes are killed.
This is strange behaviour in terms of fault tolerance.

used branch: next

Example:
state_vec_2.cpp.txt

output 8 processes:
/bin/sh: line 1: 8544 Killed ...
/bin/sh: line 1: 8550 Killed ...
/bin/sh: line 1: 8549 Killed ...
/bin/sh: line 1: 8548 Killed ...
/bin/sh: line 1: 8545 Killed ...
/bin/sh: line 1: 8547 Killed ...
/bin/sh: line 1: 8546 Killed ...

`gaspi_state_t` has the wrong type

According to the GASPI spec the type gaspi_state_t is defined as:

Gaspi provides a predefined type to describe the state of a remote Gaspi process, which is the gaspi_state_t type. gaspi_state_t can have one of two values:
GASPI_STATE_HEALTHY implies that the remote Gaspi process is healthy, i. e. communication is possible.
GASPI_STATE_CORRUPT means that the remote Gaspi process is corrupted, i. e. there is no communication possible.

In GPI2 it is defined as void*. Instead, there is the type gaspi_qp_state_t, which is non-standard but whose definition matches gaspi_state_t.

OpenFabric Alliance

I found the OpenFabrics Alliance on GitHub; they provide a generic communication library called libfabric.

For more information:
@ofiwg
http://ofiwg.github.io/libfabric/

I think this library would be a good way to improve the portability of GPI2.
Would it be possible to implement another device based on libfabric?
By using this library, GPI2 would be able to run on upcoming systems with Omni-Path.

Cancel configuration if MPI is requested but not found.

If configure --with_mpi is called, then the configuration should exit with an error if MPI cannot be found. The old install script did this.
Also, please re-add include64 to the search for includes here (this was also in the old install script):

inc_path=include

My current mpi.m4 file looks like the attached txt file (I can't attach an m4 file type).

mpi.txt

Implicit declaration of function warnings in TCP build

src/devices/tcp/tcp_device.c:203:16: warning: implicit declaration of function 'gaspi_sn_connect2port' is invalid in C99 [-Wimplicit-function-declaration]
handle = gaspi_sn_connect2port("localhost", TCP_DEV_PORT + glb_gaspi_ctx.localSocket, CONN_TIMEOUT);
^
src/devices/tcp/tcp_device.c:344:3: warning: implicit declaration of function 'gaspi_sn_set_non_blocking' is invalid in C99 [-Wimplicit-function-declaration]
gaspi_sn_set_non_blocking(conn_sock);
^
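
A possible fix sketch (an assumption, not a verified patch) is to declare the two helpers before use in tcp_device.c, e.g. by including the header that is assumed to prototype them:

/* at the top of src/devices/tcp/tcp_device.c */
#include "GPI2_SN.h"   /* assumed to declare gaspi_sn_connect2port and
                          gaspi_sn_set_non_blocking */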

v1.4.0 Slurm Support | Intended usage

Dear GPI2 community,

Is there any documentation on the (intended) usage of GPI2 with Slurm? As far as I can see, gaspi_run executes "srun ..." but the machinefile needs to be provided by the user. This could be automated, right?

Best regards!

Library does not obey a clean namespace

The libGPI2 library has many exported symbols that don't have a distinct prefix like 'gaspi_' or 'pgaspi_'. Here is the list I get for an Ethernet build:

0000000000000130 T list_clear
0000000000000000 T list_insert
00000000000000a0 T list_remove
0000000000000000 B cq_lock
0000000000000000 T insert_ringbuffer
0000000000000070 T remove_ringbuffer
0000000000000004 B cq_ref_counter
0000000000000030 B delayedList
0000000000000000 B qs_ref_counter
0000000000000048 B rank_state
0000000000000010 B recvList
0000000000000480 T tcp_dev_connect_to
0000000000000260 T tcp_dev_create_cq
00000000000001a0 T tcp_dev_create_passive_channel
00000000000003b0 T tcp_dev_create_queue
0000000000000370 T tcp_dev_destroy_cq
0000000000000230 T tcp_dev_destroy_passive_channel
0000000000000450 T tcp_dev_destroy_queue
0000000000000050 B tcp_dev_init
0000000000000200 T tcp_dev_is_valid_state
0000000000000670 T tcp_dev_return_wc
00000000000006f0 T tcp_dev_stop_device
0000000000000790 T tcp_virt_dev
0000000000000000 D valid_state
0000000000000010 T _gaspi_sample_cpu_freq
0000000000000004 D __gaspi_thread_tid
0000000000000000 D __gaspi_thread_tnc
00000000000001b0 T opMaxDoubleGASPI
0000000000000120 T opMaxFloatGASPI
0000000000000030 T opMaxIntGASPI
0000000000000240 T opMaxLongGASPI
00000000000000c0 T opMaxUIntGASPI
00000000000002d0 T opMaxULongGASPI
0000000000000180 T opMinDoubleGASPI
00000000000000f0 T opMinFloatGASPI
0000000000000000 T opMinIntGASPI
0000000000000210 T opMinLongGASPI
0000000000000090 T opMinUIntGASPI
00000000000002a0 T opMinULongGASPI
00000000000001e0 T opSumDoubleGASPI
0000000000000150 T opSumFloatGASPI
0000000000000060 T opSumIntGASPI
0000000000000270 T opSumLongGASPI
0000000000000330 T opSumUIntGASPI
0000000000000300 T opSumULongGASPI
0000000000000000 D glb_gaspi_cfg

Wrong signature of gaspi_state_vec_get

GASPI requires gaspi_return_t gaspi_state_vec_get ( gaspi_state_vector_t *state_vector ), but GPI2 has it as gaspi_return_t gaspi_state_vec_get ( gaspi_state_vector_t state_vector ).

Hence, with GPI2 the user has to manage the memory that is filled by GPI2, whereas the GASPI spec says that the vector is returned instead (memory not managed by the user).
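
A usage sketch under the current GPI2 signature (caller-allocated memory; the one-byte-per-rank sizing is an assumption based on the char* definition):

#include <stdlib.h>
#include <GASPI.h>

void query_states (void)
{
  gaspi_rank_t num_ranks;
  gaspi_proc_num (&num_ranks);

  /* caller-managed buffer, one state entry per rank (assumed) */
  gaspi_state_vector_t vec = malloc (num_ranks);
  gaspi_state_vec_get (vec);

  free (vec);
}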
