spindle's People

Contributors

dongahn, gonsie, grondo, jjhursey, jolivier23, junghans, mcfadden8, mplegendre, tgamblin

spindle's Issues

Update repo topics

A new page on the LLNL software portal (https://software.llnl.gov/radiuss/) will soon dynamically pull in RADIUSS repos. To achieve this, we need RADIUSS-related repos outside the LLNL org to be tagged with relevant topics. See LLNL/llnl.github.io#17, LLNL/llnl.github.io#151, & https://github.com/LLNL/llnl.github.io/blob/new-home-page/radiuss/README.md for additional context.

For Spindle, please add performance and radiuss.

Also, you may want to update the computation.llnl.gov link in the README to https://computation.llnl.gov/projects/spindle/.

not-found: command not found

I'm trying to install spindle and the make is failing with:

/bin/sh: not-found: command not found
make[2]: *** [libfuncdict.so] Error 127
make[2]: Leaving directory `/spindle/testsuite'
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory `/spindle'
make: *** [all] Error 2

Is this something provided by an mpi library? I hadn't installed one in the container yet (is this required for spindle, or does it help with other kinds of loads outside of MPI?)

Investigate using mmap to remap global executables to local ones.

This feature is currently available by running with --debug=yes. At present it leaves the first page of the global file mapped while the remaining pages are mapped to the local file. During testing, determine whether the entire local file can be used instead.

Also, determine whether core dumping will work as expected with the text and data remapped in this way.
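
A minimal sketch of the layout described above, assuming a read-only, page-aligned mapping. `remap_tail_to_local` is a hypothetical helper written for illustration, not Spindle's code: it keeps the first page of an existing (global) file mapping in place and maps the remaining pages from a local copy at the same addresses using MAP_FIXED.

```c
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper (not part of Spindle): given an existing mapping of a
 * global file at `base` covering `len` bytes, keep the first page as-is and
 * replace every later page with the same file offsets from `local_fd`.
 * `base` must be page-aligned. Returns 0 on success, -1 on failure. */
int remap_tail_to_local(void *base, size_t len, int local_fd, int prot)
{
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0)
        return -1;
    if (len <= (size_t)page)
        return 0;                       /* nothing beyond the first page */

    void *tail = (char *)base + page;
    /* MAP_FIXED atomically replaces whatever was mapped at `tail`. */
    void *res = mmap(tail, len - (size_t)page, prot,
                     MAP_PRIVATE | MAP_FIXED, local_fd, (off_t)page);
    return (res == MAP_FAILED) ? -1 : 0;
}
```

Whether the first page can also come from the local file, and how a core dump records the remapped regions, are the open questions of this issue; the sketch does not answer either.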

Manual and spack installation testsuite problems looking for libfuncdict.so

CentOS 7.6 on Intel Westmere

After building and installing LaunchMON and then trying to build Spindle from the current sources via git clone from their current GitHub source locations, I encounter the following problem in the "make" step. It appears to be having trouble finding and using libfuncdict.so in the testsuite area:

Making all in testsuite
make[2]: Entering directory `/root/spindle-test/Spindle/testsuite'
  CCLD     libfuncdict.so
/bin/sh: not-found: command not found
make[2]: *** [libfuncdict.so] Error 127
make[2]: Leaving directory `/root/spindle-test/Spindle/testsuite'
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory `/root/spindle-test/Spindle'
make: *** [all] Error 2

Separately installing via spack, it seems to hang during the build and if I look for recently modified files, I see the following:

# find /tmp/root/spack-stage/spack-stage-spindle-0.8.1-udgnp63afjbx3uj5nz2k5qo7237kx442/ -mmin -10
/tmp/root/spack-stage/spack-stage-spindle-0.8.1-udgnp63afjbx3uj5nz2k5qo7237kx442/spack-src/testsuite
/tmp/root/spack-stage/spack-stage-spindle-0.8.1-udgnp63afjbx3uj5nz2k5qo7237kx442/spack-src/testsuite/libtest4000.so

Checking the shared library that is open, it also appears unable to resolve libfuncdict.so:

# ldd /tmp/root/spack-stage/spack-stage-spindle-0.8.1-udgnp63afjbx3uj5nz2k5qo7237kx442/spack-src/testsuite/libtest4000.so
	linux-vdso.so.1 =>  (0x00007ffd0472d000)
	libfuncdict.so => not found
	libc.so.6 => /lib64/libc.so.6 (0x00007f993a6e5000)
	/lib64/ld-linux-x86-64.so.2 (0x00007f993afb3000)

Compiling on x86 has missing pthread definitions

/scratch/pmpi/dsolt/WORKSPACE/spindle/Spindle/src/fe/hostbin/launch_hostbin.cc:297: undefined reference to `pthread_join'
../hostbin/.libs/libhostbin.a(libhostbin_la-launch_hostbin.o): In function `IOThread::~IOThread()':
/scratch/pmpi/dsolt/WORKSPACE/spindle/Spindle/src/fe/hostbin/launch_hostbin.cc:297: undefined reference to `pthread_join'
collect2: error: ld returned 1 exit status

Can I run SPINDLE with OpenMPI launcher without MPIR?

I have a question about using SPINDLE with the OpenMPI launcher.

Is there any way to use SPINDLE with the OpenMPI launcher without MPIR?
For example, can SPINDLE run with PMIx instead of MPIR?

If there is currently no way, is there any plan to support PMIx?

Spindle could not connect to session

I'm getting errors in testing and attempted usage that Spindle cannot connect to some session. I'm installing as follows:

./configure --with-munge-dir=/etc/munge --enable-sec-munge --with-slurm-dir=/etc/slurm --with-testrm=slurm
make
make install

I've tried that with both slurm and openmpi as the "testrm". Then I build and run the tests:

cd testsuite
make
./runTests

but no matter what I do (using the slurm or openmpi template, both of which I have) I see this error:

Running: ./run_driver --partial --session
ERROR: Spindle could not connect to session tn2VYQ

I saw this same error in trying to just use spindle so I've gone back to the tests to debug. Note that I do have a /tmp area:

 ls /tmp/
ccFjQGLR.s  ks-script-eC059Y  spin.kT6PPu  spin.tn2VYQ  spin.Un7RTL  yum.log

Update: I think it could possibly be that they need to see the same /tmp area - so I'm rebuilding the containers with a shared /tmp area and will report back.

Support for ppc64le

Support for ppc64le (Power8 and later) processors is missing. Only the ppc64 and x86_64 architectures are currently supported.

Are there any plans to add support for this architecture in the future?

$ORIGIN in $RPATH in nested dependency is not handled correctly

When I tried SPINDLE, I found that $ORIGIN in the $RPATH of a nested dependency is not handled correctly, so the process cannot load some libraries.

Example: when a Python script in my environment imports matplotlib,

  1. matplotlib requires /path/to/lib/python2.7/site-packages/numpy/core/multiarray.so
  2. Spindle creates a cached copy of multiarray.so as /tmp/spindle.PIDNUM/b0-_path_to_lib_python2.7_site-packages_numpy_core_multiarray.so
  3. This multiarray.so requires $ORIGIN/../.libs/tls/x86_64/libopenblasp.so

In this case, $ORIGIN/../.libs/tls/x86_64/libopenblasp.so should be expanded to /path/to/lib/python2.7/site-packages/numpy/core/../.libs/libopenblasp.so.
However, SPINDLE expands it to /tmp/spindle.PIDNUM/../.libs/tls/x86_64/libopenblasp.so.

That is, SPINDLE expands $ORIGIN to /tmp/spindle.PIDNUM/ instead of /path/to/lib/python2.7/site-packages/numpy/core/.

As a result, the process cannot load multiarray.so.
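
The expansion the report asks for can be sketched as follows. This is an illustration under stated assumptions, not Spindle's implementation: `expand_origin` is a hypothetical helper that substitutes a leading $ORIGIN with the directory of the library's original path rather than the directory of its cached copy. It performs a literal substitution only and does not model the dynamic linker's hardware-capability subdirectory search.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not Spindle's API): expand a leading "$ORIGIN" in an
 * RPATH entry using the directory of the library's ORIGINAL location.
 * Entries that do not start with "$ORIGIN" are copied unchanged.
 * Returns 0 on success, -1 if the result does not fit in `out` or the
 * original path has no directory component. */
int expand_origin(const char *entry, const char *orig_lib_path,
                  char *out, size_t outlen)
{
    static const char token[] = "$ORIGIN";
    const size_t token_len = sizeof(token) - 1;
    int n;

    if (strncmp(entry, token, token_len) != 0) {
        n = snprintf(out, outlen, "%s", entry);
    } else {
        const char *slash = strrchr(orig_lib_path, '/');
        if (slash == NULL)
            return -1;
        /* Directory part is everything before the last '/', or "/" itself. */
        int dir_len = (slash == orig_lib_path) ? 1 : (int)(slash - orig_lib_path);
        n = snprintf(out, outlen, "%.*s%s", dir_len, orig_lib_path,
                     entry + token_len);
    }
    return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}
```

Called with the entry `$ORIGIN/../.libs/tls/x86_64/libopenblasp.so` and the original path of multiarray.so from the example, the helper yields a path under /path/to/lib/python2.7/site-packages/numpy/core/ rather than under /tmp/spindle.PIDNUM/.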

This issue may be similar to #17, but the current SPINDLE runs with --debug=yes by default.

Missing mpi.h in testsuite and request for clarification on MPI variant use with Spindle

A plain spack installation or manual installation without an active MPI variant defined throws an error with a missing mpi.h in the testsuite, as below:

     352    make[3]: Entering directory '/tmp/asill/spack-stage/spack-stage-spindle-0.8.1-av65uymhbjk5xlot4r7o7zrdplcrathu/spack-src/testsuite'
     353      CC     test_driver-test_driver.o
     354      CC     test_driver_libs-test_driver.o
  >> 355    test_driver.c:17:10: fatal error: mpi.h: No such file or directory
     356     #include <mpi.h>
     357              ^~~~~~~
     358    compilation terminated.
  >> 359    test_driver.c:17:10: fatal error: mpi.h: No such file or directory
     360     #include <mpi.h>
     361              ^~~~~~~
     362    compilation terminated.
     363    make[3]: *** [Makefile:340: test_driver-test_driver.o] Error 1
     364    make[3]: *** Waiting for unfinished jobs....
     365    make[3]: *** [Makefile:356: test_driver_libs-test_driver.o] Error 1

Does Spindle require a specific MPI package to be set up to address the missing mpi.h and if so, is a separate Spindle instance required for each MPI variant to be used? We have many MPI variants in use, of course, so the latter would definitely be a hassle to use, but I suspect I am missing something obvious here.

Segmentation fault with BIND_NOW executable

Overview

Running Spindle with an application executable built with the BIND_NOW option causes a segmentation fault. I saw the fault on an x86 cluster and an aarch64 cluster.

Steps to reproduce

I confirmed the following steps on the x86 cluster.

The dynamic linker (glibc) version on the x86 cluster:

$ LC_ALL=C ldd --version
ldd (GNU libc) 2.17
Copyright (C) 2012 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
Written by Roland McGrath and Ulrich Drepper.

  1. I downloaded v0.12 from https://github.com/hpc/Spindle/releases/tag/v0.12 and built it.

  2. I prepared a simple application built with BIND_NOW and ran it with Spindle as follows.

$ cat hello.c
#include <stdio.h>

int main (int argc, char* argv[])
{
  printf ("Hello world!\n");
  return 0;
}
$ gcc -Wl,-z,now -o hello_bind_now hello.c
$ SPINDLE_DEBUG=3 TMPDIR='/tmp' spindle --location='/tmp' mpiexec -np 1 spindlemarker $(pwd)/hello_bind_now
<Aug 31 16:19:45> <Launchmon> (INFO): The RM process has just been forked and exec'ed.
<Aug 31 16:19:45> <Launchmon> (INFO): Just continued the RM process out of the first trap

===================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   RANK 0 PID 247311 RUNNING AT 10.xx.yy.zz
=   KILLED BY SIGNAL: 11 (Segmentation fault)
===================================================================================

Expected results

Without the BIND_NOW option, the application runs correctly with Spindle.

$ gcc  -o hello hello.c
$ SPINDLE_DEBUG=3 TMPDIR='/tmp' spindle --location='/tmp' mpiexec -np 1 spindlemarker $(pwd)/hello
<Aug 31 16:20:26> <Launchmon> (INFO): The RM process has just been forked and exec'ed.
<Aug 31 16:20:26> <Launchmon> (INFO): Just continued the RM process out of the first trap
Hello world!

Detail

In the debug output, the SPINDLE client appears to stop after the following log lines.

[Client.0.252100@auditclient_common.c:92] la_objopen - la_objopen(): loading /lib64/libc.so.6, link_map = 0x2b60c23859c8, lmid = LM_ID_BASE, cookie = 0x2b60c2385e30
[Client.0.252100@auditclient_common.c:116] la_activity - la_activity(): cookie = 0x2b60c25685c0; flag = LA_ACT_CONSISTENT
[[email protected]:30] remove_lib_rogot - Checking whether /lib64/libc.so.6 has R GOT
[[email protected]:41] remove_lib_rogot - Changing /lib64/libc.so.6 R GOT to RW GOT from 2b60c2b40000 to 2b60c2b44000
[[email protected]:30] remove_lib_rogot - Checking whether /lib64/ld-linux-x86-64.so.2 has R GOT
[[email protected]:41] remove_lib_rogot - Changing /lib64/ld-linux-x86-64.so.2 R GOT to RW GOT from 2b60c2566000 to 2b60c2567000
[[email protected]:39] spindle_la_activity - la_activity(): cookie = 0x2b60c25685c0; flag = LA_ACT_CONSISTENT
[Server.252113@ldcs_api_listen.c:174] ldcs_listen - Select returned data.  Calling callback for fd 14 id=0
[Server.252113@ldcs_audit_server_client_cb.c:61] _ldcs_client_CB - Receiving message from client 0 on fd 14
[Server.252113@ldcs_api_pipe.c:387] _ldcs_read_pipe - before read from fifo 14, bytes_to_read = 8
[Server.252113@ldcs_api_pipe.c:398] _ldcs_read_pipe - read from fifo: 0 bytes ...
[Server.252113@ldcs_api_pipe.c:338] ldcs_recv_msg_static_pipe - Client disconnected.  Returning END message

Appendix

The output of readelf -d for each application binary:

$ LC_ALL=C readelf -d hello_bind_now

Dynamic section at offset 0xdd8 contains 26 entries:
  Tag        Type                         Name/Value
 0x0000000000000001 (NEEDED)             Shared library: [libc.so.6]
 0x000000000000000c (INIT)               0x4003e0
 0x000000000000000d (FINI)               0x4005c4
 0x0000000000000019 (INIT_ARRAY)         0x600dc0
 0x000000000000001b (INIT_ARRAYSZ)       8 (bytes)
 0x000000000000001a (FINI_ARRAY)         0x600dc8
 0x000000000000001c (FINI_ARRAYSZ)       8 (bytes)
 0x000000006ffffef5 (GNU_HASH)           0x400298
 0x0000000000000005 (STRTAB)             0x400318
 0x0000000000000006 (SYMTAB)             0x4002b8
 0x000000000000000a (STRSZ)              61 (bytes)
 0x000000000000000b (SYMENT)             24 (bytes)
 0x0000000000000015 (DEBUG)              0x0
 0x0000000000000003 (PLTGOT)             0x600fc8
 0x0000000000000002 (PLTRELSZ)           72 (bytes)
 0x0000000000000014 (PLTREL)             RELA
 0x0000000000000017 (JMPREL)             0x400398
 0x0000000000000007 (RELA)               0x400380
 0x0000000000000008 (RELASZ)             24 (bytes)
 0x0000000000000009 (RELAENT)            24 (bytes)
 0x0000000000000018 (BIND_NOW)
 0x000000006ffffffb (FLAGS_1)            Flags: NOW
 0x000000006ffffffe (VERNEED)            0x400360
 0x000000006fffffff (VERNEEDNUM)         1
 0x000000006ffffff0 (VERSYM)             0x400356
 0x0000000000000000 (NULL)               0x0
$

$ LC_ALL=C readelf -d hello

Dynamic section at offset 0xe28 contains 24 entries:
  Tag        Type                         Name/Value
 0x0000000000000001 (NEEDED)             Shared library: [libc.so.6]
 0x000000000000000c (INIT)               0x4003e0
 0x000000000000000d (FINI)               0x4005c4
 0x0000000000000019 (INIT_ARRAY)         0x600e10
 0x000000000000001b (INIT_ARRAYSZ)       8 (bytes)
 0x000000000000001a (FINI_ARRAY)         0x600e18
 0x000000000000001c (FINI_ARRAYSZ)       8 (bytes)
 0x000000006ffffef5 (GNU_HASH)           0x400298
 0x0000000000000005 (STRTAB)             0x400318
 0x0000000000000006 (SYMTAB)             0x4002b8
 0x000000000000000a (STRSZ)              61 (bytes)
 0x000000000000000b (SYMENT)             24 (bytes)
 0x0000000000000015 (DEBUG)              0x0
 0x0000000000000003 (PLTGOT)             0x601000
 0x0000000000000002 (PLTRELSZ)           72 (bytes)
 0x0000000000000014 (PLTREL)             RELA
 0x0000000000000017 (JMPREL)             0x400398
 0x0000000000000007 (RELA)               0x400380
 0x0000000000000008 (RELASZ)             24 (bytes)
 0x0000000000000009 (RELAENT)            24 (bytes)
 0x000000006ffffffe (VERNEED)            0x400360
 0x000000006fffffff (VERNEEDNUM)         1
 0x000000006ffffff0 (VERSYM)             0x400356
 0x0000000000000000 (NULL)               0x0
$
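
The only difference between the two dynamic sections above is the BIND_NOW entry and the NOW flag in FLAGS_1. As a sketch of how that condition could be detected programmatically on Linux with glibc, the hypothetical helper below (not taken from Spindle) scans a dynamic section for any of the three standard markers of immediate binding.

```c
#include <elf.h>
#include <link.h>

/* Hypothetical helper: return 1 if a DT_NULL-terminated dynamic section
 * requests immediate binding, 0 otherwise. Immediate binding can be marked
 * by a DT_BIND_NOW entry, by DF_BIND_NOW in DT_FLAGS, or by DF_1_NOW in
 * DT_FLAGS_1 (the "Flags: NOW" line printed by readelf -d). */
int dyn_has_bind_now(const ElfW(Dyn) *dyn)
{
    for (; dyn->d_tag != DT_NULL; dyn++) {
        if (dyn->d_tag == DT_BIND_NOW)
            return 1;
        if (dyn->d_tag == DT_FLAGS && (dyn->d_un.d_val & DF_BIND_NOW))
            return 1;
        if (dyn->d_tag == DT_FLAGS_1 && (dyn->d_un.d_val & DF_1_NOW))
            return 1;
    }
    return 0;
}
```

Within a dynamically linked program, `dyn_has_bind_now(_DYNAMIC)` inspects the running executable's own dynamic section; glibc's link.h declares `_DYNAMIC`.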

Spack package update needed to 0.9 release; missing -Wno-narrowing flag in 0.8.1 release

The currently available spack installation for Spindle pulls release 0.8.1 and fails due to a narrowing error when compiling spindle_logd.cc in the logging directory:

    133      CXX    spindle_logd.o
  >> 134    spindle_logd.cc:65:76: error: narrowing conversion of '255' from 'int' to 'char' inside { } [-Wnarrowing]
     135     static char exitcode[8] = { 0x01, 0xff, 0x03, 0xdf, 0x05, 0xbf, 0x07, '\n' };
     136                                                                                ^
  >> 137    spindle_logd.cc:65:76: error: narrowing conversion of '223' from 'int' to 'char' inside { } [-Wnarrowing]
  >> 138    spindle_logd.cc:65:76: error: narrowing conversion of '191' from 'int' to 'char' inside { } [-Wnarrowing]
     139      CCLD   libspindlelogc.la
     140    make[2]: *** [Makefile:386: spindle_logd.o] Error 1
     141    make[2]: Leaving directory '/tmp/asill/spack-stage/spack-stage-spindle-0.8.1-u6g66hhvbkxfa7n32x2gzferzpurspf3/spack-src/logging'
     142    make[1]: *** [Makefile:319: all-recursive] Error 1
     143    make[1]: Leaving directory '/tmp/asill/spack-stage/spack-stage-spindle-0.8.1-u6g66hhvbkxfa7n32x2gzferzpurspf3/spack-src'
     144    make: *** [Makefile:248: all] Error 2

I can work around this by using ./bin/spack install spindle cxxflags="-Wno-narrowing" but likely the spack package should be updated and this flag fixed for the older tarball for manual installations.
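
For context, the errors arise because 0xff, 0xdf, and 0xbf (255, 223, and 191) do not fit in a char when char is signed, and C++11 rejects such narrowing inside a braced initializer. One possible source-level alternative to the flag, offered only as an assumption about what a fix could look like and not as the change actually made upstream, is to store the bytes as unsigned char. The accessor `exitcode_bytes` is hypothetical; the declaration is valid in both C and C++.

```c
#include <stddef.h>

/* Same byte values as the array in the error log, declared as unsigned char
 * so that every value is representable and no narrowing takes place. */
static const unsigned char exitcode[8] = {
    0x01, 0xff, 0x03, 0xdf, 0x05, 0xbf, 0x07, '\n'
};

/* Hypothetical accessor: return the byte sequence and, optionally, its
 * length in bytes. */
const unsigned char *exitcode_bytes(size_t *len)
{
    if (len != NULL)
        *len = sizeof(exitcode);
    return exitcode;
}
```

Code that passes the array to byte-oriented calls such as write() is unaffected by the type change, since those take a pointer to untyped memory.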

master doesn't build

./configure --enable-sec-none --with-hostbin=/scratch/pmpi/dsolt/WORKSPACE/spindle/myscript.sh

make

make[4]: Entering directory `/scratch/pmpi/dsolt/WORKSPACE/spindle/Spindle/src/client/beboot'
  CC spindle_bootstrap-spindle_bootstrap.o
  CC spindle_bootstrap-parseloc.o
  CC spindle_bootstrap-spindle_mkdir.o
make[4]: *** No rule to make target `../auditclient/exec_util.c', needed by `spindle_bootstrap-exec_util.o'. Stop.
make[4]: Leaving directory `/scratch/pmpi/dsolt/WORKSPACE/spindle/Spindle/src/client/beboot'

Spindle not searching PATH when running with "--reloc-aout=no" or "--debug=yes"

Both of the options in the title above make Spindle invoke the application executable from global storage rather than from local storage. The execv in the bootstrapper can fail when the global executable is given as a relative path. This doesn't happen for local executables, since we construct the path ourselves and make sure it's absolute.

The execv in spindle_bootstrap should thus be changed to an execvp.
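
The difference between the two calls can be sketched as follows. `run_child` is a hypothetical helper written for illustration and is not the code in spindle_bootstrap: it runs a command in a child process with either execv, which treats argv[0] as a literal pathname, or execvp, which searches the directories in PATH when argv[0] contains no slash.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork and exec argv in the child. With search_path
 * set, execvp looks argv[0] up in PATH; otherwise execv uses argv[0] as a
 * pathname relative to the current directory. Returns the child's exit
 * status, 127 if the exec itself failed, or -1 on fork/wait errors. */
int run_child(char *const argv[], int search_path)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        if (search_path)
            execvp(argv[0], argv);
        else
            execv(argv[0], argv);
        _exit(127);                 /* reached only if the exec failed */
    }

    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

With an argv of {"sh", "-c", "exit 0", NULL}, the execvp variant finds sh through PATH, whereas the execv variant fails with ENOENT unless a file named sh exists in the current directory.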
