io-watchdog's People

Contributors

grondo

io-watchdog's Issues

How to use io-watchdog with Torque?

Is there a way to use io-watchdog with Torque rather than Slurm?

Also, if only one process goes without writing for a long period, will io-watchdog report this as a hang?

Thanks in advance:)

io-watchdog-interposer can't find libc.so

libc.so.* is no longer typically kept under /lib or /lib64 as assumed in the io-watchdog code.
Need a better scheme for finding the "real" libc write call symbols.
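One common way to avoid hard-coding a libc location in an LD_PRELOAD interposer is to ask the dynamic linker for the next definition of the symbol with `dlsym(RTLD_NEXT, "write")`. The following is a minimal sketch under that assumption, not the io-watchdog code; `real_write`, `lookup_real_write`, and the `write_count` counter are hypothetical names standing in for whatever bookkeeping the interposer does on each write.

```c
#define _GNU_SOURCE /* needed for RTLD_NEXT */
#include <dlfcn.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

typedef ssize_t (*write_fn)(int fd, const void *buf, size_t count);

/* Hypothetical cache for libc's write(): looked up once, then reused. */
static write_fn real_write = NULL;

/* Hypothetical stand-in for the watchdog's "output happened" hook. */
static unsigned long write_count = 0;

/* Ask the dynamic linker for the next "write" after this object in the
   symbol search order (normally libc's), instead of opening libc.so
   from a fixed directory such as /lib or /lib64. */
static write_fn lookup_real_write(void)
{
    if (real_write == NULL)
        real_write = (write_fn) dlsym(RTLD_NEXT, "write");
    return real_write;
}

/* Interposed write(): record the activity, then forward to libc. */
ssize_t write(int fd, const void *buf, size_t count)
{
    write_fn fn = lookup_real_write();
    if (fn == NULL) {
        errno = ENOSYS; /* real write() not found: fail the call, don't exit */
        return -1;
    }
    write_count++;
    return fn(fd, buf, count);
}
```

Built as a shared object (for example `gcc -shared -fPIC -o interposer.so interposer.c -ldl`; `-ldl` is unnecessary on glibc 2.34 and later) and loaded via `LD_PRELOAD`, this finds libc wherever the system keeps it. A real interposer would also need to cover the other write-family calls and make the counter thread-safe.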

Because we've interposed the write calls in our LD_PRELOAD library, the code currently cannot print an error when libc can't be found; instead, it silently exits with code 15.
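Since the interposer replaces `write()`, its error path cannot call that symbol to report a failure. On Linux, one option is to issue the system call directly with `syscall(SYS_write, ...)`, which does not go through any LD_PRELOAD-interposed `write` symbol. A minimal sketch, assuming Linux; `raw_write_all` and `raw_error` are hypothetical helper names:

```c
#define _GNU_SOURCE /* for the syscall() declaration */
#include <errno.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Write all of msg to fd using the raw write system call, bypassing any
   write() symbol interposed via LD_PRELOAD. Linux-specific.
   Returns 0 on success, -1 on error. */
static int raw_write_all(int fd, const char *msg)
{
    size_t len = strlen(msg);
    while (len > 0) {
        long n = syscall(SYS_write, fd, msg, len);
        if (n < 0 && errno == EINTR)
            continue;      /* interrupted by a signal: retry */
        if (n <= 0)
            return -1;     /* real error (or no progress): give up */
        msg += n;
        len -= (size_t) n;
    }
    return 0;
}

/* Hypothetical reporter for the "real write() not found" case. */
static void raw_error(const char *msg)
{
    (void) raw_write_all(STDERR_FILENO, msg);
}
```

Calling something like `raw_error()` with a short message just before the interposer exits would turn the current silent exit with code 15 into a diagnosable one.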

SPANK does not recognize io-watchdog properly

Hi,
I would like to link io-watchdog to my SLURM installation so that hanging jobs can be stopped.

Currently, I'm doing some tests with an LLNL resilience library called SCR. I've set up four nodes and installed all the necessary software in order to run MPI jobs using SLURM. One of these nodes acts as the SLURM controller and the rest as compute nodes. This system is working properly at the moment.

OS: Debian 3.2.51-1 (Squeeze)
SLURM Version: 2.6.4
SLURM Spank Plugins version: 0.23
io-watchdog version: 0.8

To link io-watchdog with SLURM, I need to install and configure SPANK so that it can dynamically load the library when the user calls srun, as the io-watchdog documentation says. After a successful installation, I added the io-watchdog dynamic library path to /etc/ld.so.conf.d/ and included the following lines in plugstack.conf:
required /usr/local/lib/io-watchdog/io-watchdog.so
required /usr/local/lib/io-watchdog-interposer.so

(I checked that those paths are right)

However, the following appears in the log when starting the SLURM daemon:
# tail -n14 slurmd.log
[2013-11-22T09:21:55.719] debug:  spank: opening plugin stack /usr/local/etc/plugstack.conf
[2013-11-22T09:21:55.723] debug3: Couldn't find sym 'slurm_spank_init' in the plugin
[2013-11-22T09:21:55.728] debug3: Couldn't find sym 'slurm_spank_slurmd_init' in the plugin
[2013-11-22T09:21:55.732] debug3: Couldn't find sym 'slurm_spank_job_prolog' in the plugin
[2013-11-22T09:21:55.736] debug3: Couldn't find sym 'slurm_spank_init_post_opt' in the plugin
[2013-11-22T09:21:55.740] debug3: Couldn't find sym 'slurm_spank_local_user_init' in the plugin
[2013-11-22T09:21:55.744] debug3: Couldn't find sym 'slurm_spank_user_init' in the plugin
[2013-11-22T09:21:55.750] debug3: Couldn't find sym 'slurm_spank_task_init_privileged' in the plugin
[2013-11-22T09:21:55.754] debug3: Couldn't find sym 'slurm_spank_task_post_fork' in the plugin
[2013-11-22T09:21:55.760] debug3: Couldn't find sym 'slurm_spank_task_exit' in the plugin
[2013-11-22T09:21:55.764] debug3: Couldn't find sym 'slurm_spank_job_epilog' in the plugin
[2013-11-22T09:21:55.769] debug3: Couldn't find sym 'slurm_spank_slurmd_exit' in the plugin
[2013-11-22T09:21:55.773] debug3: Couldn't find sym 'slurm_spank_exit' in the plugin
[2013-11-22T09:21:55.778] debug2: spank: /usr/local/lib/io-watchdog/io-watchdog.so: no callbacks in this context

Note: Full log file is attached to this message.

Could a version incompatibility be the reason why SPANK doesn't use the io-watchdog library?

Thanks in advance,
 Jorge

Original issue reported on code.google.com by [email protected] on 22 Nov 2013 at 9:33
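Judging from the debug3 lines in the log above, slurmd resolves each SPANK callback in the plugin by symbol name and logs the ones it cannot find. That general technique of probing optional callbacks by name can be sketched with `dlsym()`; this is an illustration under that assumption, not Slurm's plugin loader, and `probe_callbacks` is a hypothetical helper. The callback names are copied from the log.

```c
#include <dlfcn.h>
#include <stddef.h>
#include <stdio.h>

/* Callback names as they appear in the slurmd log above
   (not necessarily the full SPANK callback list). */
static const char *const spank_callbacks[] = {
    "slurm_spank_init",
    "slurm_spank_slurmd_init",
    "slurm_spank_job_prolog",
    "slurm_spank_init_post_opt",
    "slurm_spank_local_user_init",
    "slurm_spank_user_init",
    "slurm_spank_task_init_privileged",
    "slurm_spank_task_post_fork",
    "slurm_spank_task_exit",
    "slurm_spank_job_epilog",
    "slurm_spank_slurmd_exit",
    "slurm_spank_exit",
};

/* Look up each optional callback by name in an already dlopen()ed
   object. Missing symbols are reported and skipped, not treated as
   errors. Returns how many of the names were found. */
static size_t probe_callbacks(void *handle, const char *const names[], size_t n)
{
    size_t found = 0;
    for (size_t i = 0; i < n; i++) {
        if (dlsym(handle, names[i]) != NULL)
            found++;
        else
            fprintf(stderr, "missing optional callback '%s'\n", names[i]);
    }
    return found;
}
```

Applied to a handle from `dlopen()`, the function counts the callbacks the object defines and reports the rest; in this sketch an absent symbol only produces a message, which is the same distinction the debug-level log lines draw.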
