flux-framework / flux-core

core services for the Flux resource management framework

License: GNU Lesser General Public License v3.0

Shell 13.23% Lua 0.81% C 72.08% Python 10.16% Makefile 1.29% Perl 0.06% M4 1.37% Terra 0.17% Dockerfile 0.26% SourcePawn 0.09% Roff 0.22% Rebol 0.02% R 0.01% C++ 0.25%
Topics: hpc, resource-manager, workflows, radiuss

flux-core's People

Contributors

andre-merzky, chu11, cmoussa1, digitalresistor, dongahn, garlick, garrettbslone, ggouaillardet, gonsie, grondo, ilumsden, jameshcorbett, lipari, mergify[bot], morrone, nikhil-jain, rountree-alt, stevwonder, tpatki, trws, vchuravy, vsoch, wihobbs, woodard


flux-core's Issues

Add quiet mode for flux-start, cmbd

When running tests there should be a way to suppress informational messages
from flux-start and any cmbd connected to stdout/err. That is, I'd like to suppress
these cmbd: messages when running test_under_flux:

*** ./t0002-basic-in-session.t ***
cmbd: 0-0: starting shell
ok 1 - flux-up
ok 2 - flux-comms info
ok 3 - flux-module works
ok 4 - kvs module is loaded by default
# passed all 4 test(s)
1..4
cmbd: 0: shutdown in 2s: shell (pid 117630) exited with rc=0

I suggest adding

     -v,--verbose            be annoyingly informative
+  -q,--quiet                 be mysteriously taciturn

or similar.
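The requested behavior amounts to a per-message level threshold. A minimal sketch (hypothetical, not flux's actual option handling) of how -q/-v could gate output, where 0 = errors only, 1 = default info, 2 = verbose:

```c
/* Hypothetical sketch of the requested -q/-v behavior: each message
 * carries a level (0 = error, 1 = info, 2 = debug), and flux-start/cmbd
 * would print it only when the configured verbosity is at least that
 * level. -q would set verbosity to 0, -v to 2; the default is 1. */
int should_print(int msg_level, int verbosity)
{
    return msg_level <= verbosity;
}
```

With this, the "cmbd: 0-0: starting shell" chatter would be level 1 and disappear under -q, while errors would still get through.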

wreckrun performance drops drastically after first use

Repeated wreckrun invocations quickly degrade performance, probably related to #98. Performance does seem to level off after the first 3-10 runs, but only once it reaches nearly 6 seconds to launch a trivial job on two nodes, 16 processes each.

Timing results for wreckrun, time ./flux wreckrun -N 2 -t 16 hostname, repeated 100 times in a single flux session:

real    0m0.589s
real    0m3.231s
real    0m5.674s
real    0m5.682s
real    0m5.677s
real    0m5.679s
real    0m5.671s
real    0m5.682s
real    0m5.666s
real    0m5.678s
real    0m5.716s
real    0m5.955s
real    0m5.801s
real    0m5.832s
real    0m5.830s
real    0m5.843s
real    0m5.951s
real    0m5.800s
real    0m5.830s
real    0m5.837s
real    0m5.844s
real    0m5.833s
real    0m5.907s
real    0m5.826s
real    0m5.838s
real    0m5.839s
real    0m5.947s
real    0m5.799s
real    0m5.844s
real    0m5.829s
real    0m5.841s
real    0m5.839s
real    0m5.898s
real    0m5.837s
real    0m5.837s
real    0m5.834s
real    0m5.946s
real    0m5.806s
real    0m5.835s
real    0m5.833s
real    0m5.842s
real    0m5.939s
real    0m5.791s
real    0m5.828s
real    0m5.846s
real    0m5.831s
real    0m5.901s
real    0m5.854s
real    0m5.825s
real    0m5.835s
real    0m5.935s
real    0m5.812s
real    0m5.832s
real    0m5.830s
real    0m5.845s
real    0m5.832s
real    0m5.895s
real    0m5.830s
real    0m5.854s
real    0m5.830s
real    0m5.879s
real    0m5.843s
real    0m5.848s
real    0m5.823s
real    0m5.947s
real    0m5.795s
real    0m5.843s
real    0m5.823s
real    0m5.848s
real    0m5.836s
real    0m5.896s
real    0m5.848s
real    0m5.846s
real    0m5.828s
real    0m5.890s
real    0m5.838s
real    0m5.830s
real    0m5.860s
real    0m5.832s
real    0m5.884s
real    0m5.832s
real    0m5.854s
real    0m5.832s
real    0m5.908s
real    0m5.808s
real    0m5.848s
real    0m5.838s
real    0m5.875s
real    0m5.835s
real    0m5.827s
real    0m5.857s
real    0m5.822s
real    0m5.904s
real    0m5.829s
real    0m5.843s
real    0m5.900s
real    0m5.818s
real    0m5.840s
real    0m5.852s
real    0m5.838s

flux-start: only a single local session is supported

For running tests in parallel it would be nice if flux-start allowed multiple local comms sessions
to start at the same time. Currently I get:

$ ./flux start --size=2
cmbd: cmbd is already running in /var/tmp/grondo/flux-0-1, pid 84630
cmbd: cmbd is already running in /var/tmp/grondo/flux-0-0, pid 84629
flux-start: 0 (pid 84679) exited with rc=1
flux-start: 1 (pid 84680) exited with rc=1

I'm guessing this will be a trivial fix, but I want to note it here so we don't lose track of the feature request.

A2X products in doc/cmd directory do not appear under prefix directory when building with VPATH

a2x output files default to the source directory. I propose adding the following option to the A2X command line in doc/cmd/Makefile.am:

diff --git a/doc/cmd/Makefile.am b/doc/cmd/Makefile.am
index f6d7f84..b247584 100644
--- a/doc/cmd/Makefile.am
+++ b/doc/cmd/Makefile.am
@@ -28,6 +28,7 @@ stderr_devnull_0 = 2>/dev/null
        $(AM_V_GEN)$(A2X) --attribute mansource=$(META_NAME) \
            --attribute manversion=$(META_VERSION) \
            --attribute manmanual="Flux Manual" \
+           --destination-dir=. \
            --doctype manpage --format manpage $< $(STDERR_DEVNULL)

 EXTRA_DIST = $(ADOC_FILES) COPYRIGHT.adoc spell.en.pws

Assertion failure in zeromq

The actual error printed is:

Assertion failed: ok (mailbox.cpp:82)

After which the KAP benchmark test aborts during MPI_Init() and dumps core files.

Fortunately the cmbd also dumps core files, hopefully more useful ones. It seems there is just one core file per failure, and they all fail with the same backtrace (excepting specific memory addresses, etc.).

#0  0x00002aaaab9d8635 in raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
#1  0x00002aaaab9d9e15 in abort () at abort.c:92
#2  0x00002aaaab0f8239 in ?? () from /usr/lib64/libzmq.so.3
#3  0x00002aaaab0fb7a7 in ?? () from /usr/lib64/libzmq.so.3
#4  0x00002aaaab110c9b in ?? () from /usr/lib64/libzmq.so.3
#5  0x00002aaaab111078 in ?? () from /usr/lib64/libzmq.so.3
#6  0x00002aaaab1250fa in ?? () from /usr/lib64/libzmq.so.3
#7  0x00002aaaab3513c5 in zframe_send () from /usr/lib64/libczmq.so.1
#8  0x00002aaaab357e9d in zmsg_send () from /usr/lib64/libczmq.so.1
#9  0x0000000000408688 in cmb_pub_event (ctx=0x7fffffffcf40, event=0x7fffffffcea0) at ../../../flux-core/src/broker/cmbd.c:1420
#10 0x0000000000409e45 in hb_cb (zl=0x624410, timer_id=1, ctx=0x7fffffffcf40) at ../../../flux-core/src/broker/cmbd.c:1778
#11 0x00002aaaab355cad in zloop_start () from /usr/lib64/libczmq.so.1
#12 0x00000000004054b5 in main (argc=5, argv=0x7fffffffd218) at ../../../flux-core/src/broker/cmbd.c:477

I was running flux commit 3b67cb3 with Jim's additional patch for the named-socket issue that prevented kap from launching with srun, which basically removes the per-rank component of the socket path.

The run configuration is a pre-allocated 32-node slurm job (so it could easily be launched as a batch) running this flux command: ./flux start -M barrier -N 32 -s 32 <absolute path to run script>. The run script launching KAP is below.

#! /bin/sh

MY_TCC=512
MY_N=32
MY_P=1
MY_C=512
MY_V=64
MY_K=/g/g12/scogland/projects/flux/build/src/test/kap/kap
MY_D=/g/g12/scogland/projects/flux/data/powers-of-two/test-32/T.512:P.1:C.512:V.64:A.1
MY_A=1
RUNS=1
VARY_SEQUENCE=1

cd $MY_D
t=$(date)
echo "$t\n"

sleep 3

for i in $(seq 1 $RUNS) ; do
  if [ $VARY_SEQUENCE -ne 0 ] ; then
    SEQUENCE_NUM=$i
  else
    SEQUENCE_NUM=0
  fi
  COMMAND=" srun -N$MY_N -n$MY_TCC --distribution=cyclic $MY_K --instance-num=$SEQUENCE_NUM -l --nproducers=$MY_P --nconsumers=$MY_C --value-size=$MY_V --cons-acc-count=$MY_A"
  mkdir run-$i
  pushd run-$i
  $COMMAND
  popd
done

t=$(date)
echo "$t \n"

kvsdir_* API should provide atomicity in namespace traversal

Walking the KVS namespace requires multiple kvs_get_*() or kvsdir_get_* calls, and each call looks up a key name from the root, thus values may be retrieved from multiple "versions" of the cache.

The hash tree organization should allow us to look up values by hash and maintain atomicity within a recursive key walk. A simple rework of the kvsdir_t / kvsdir_get_*() implementation could allow this to work without exposing hashes directly to API users.
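The idea can be illustrated with a toy model: a content-addressed root pointer that is swapped atomically on each commit. All names below are illustrative, not flux-core API; the point is the difference between resolving each key from the live root (what repeated kvs_get_*() calls do today) and resolving from a captured snapshot:

```c
#include <stddef.h>

/* Toy model: the KVS root is a pointer swapped atomically on commit.
 * Resolving every key from the live root can mix versions; capturing
 * the root once ("by hash") and walking from the capture yields one
 * consistent snapshot. Names are illustrative, not flux-core API. */
struct dir { int key_a; int key_b; };

static struct dir v1 = { 1, 1 };   /* first committed version  */
static struct dir v2 = { 2, 2 };   /* second committed version */
static struct dir *root;           /* live root, swapped on commit */

/* Returns the current root, then simulates another client committing
 * a new version in between our lookups. */
static struct dir *lookup_root(void)
{
    struct dir *r = root;
    root = &v2;
    return r;
}

/* Root-relative reads, like repeated kvsdir_get_*() calls today: each
 * key resolves from whatever root is current, so values mix versions. */
int nonatomic_sum(void)
{
    int a, b;
    root = &v1;                    /* start from version 1 */
    a = lookup_root()->key_a;      /* reads version 1 */
    b = lookup_root()->key_b;      /* commit happened: reads version 2 */
    return a + b;                  /* 1 + 2: inconsistent */
}

/* Snapshot reads: resolve the root once and walk from the capture. */
int atomic_sum(void)
{
    struct dir *snap;
    root = &v1;
    snap = lookup_root();          /* capture version 1 */
    return snap->key_a + snap->key_b;  /* 1 + 1: consistent */
}
```

A reworked kvsdir_t could hold such a capture internally, so kvsdir_get_*() calls resolve against it without exposing hashes to API users.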

flux-kvs dir key on non-directory should exit 1

*** ./t0004-kvs.t ***
not ok 17 - kvs: try to retrieve key as directory should fail # TODO known breakage

test_expect_success 'kvs: put double' '
flux kvs put $KEY=3.14159
'
test_expect_success 'kvs: get double' '
test_kvs_key $KEY 3.141590
'
test_expect_failure 'kvs: try to retrieve key as directory should fail' '
test_must_fail flux kvs dir $KEY
'

dump_kvs_dir() in flux-kvs.c was more or less transplanted from another utility, and its error handling does not match the rest of the code. This needs to be fixed.

information is lost when flux logs are bridged to syslog

The internal Flux logging uses syslog-compatible facility and level.

Flux logs are "persisted" on rank 0 to a configurable destination. One possible destination is syslog(3) but the facility and level are set to a fixed value. The problem lies in src/common/log.c where the facility and level are chosen via log_set_dest ("syslog:facility:level").

When bridging to syslog, the logged facility and level should be used.
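Since the syslog facility and level occupy disjoint bits of the priority argument, the bridge could simply OR the entry's own values together instead of using the fixed pair configured at log_set_dest() time. A minimal sketch, assuming the log entry carries its facility and level (the field names in the usage comment are hypothetical):

```c
#include <syslog.h>

/* When bridging Flux log entries to syslog(3), compose the priority
 * from the entry's own facility and level rather than a fixed pair.
 * Facility and level occupy disjoint bits, so a simple OR combines
 * them, exactly as syslog(3) expects. */
int bridge_priority(int facility, int level)
{
    return facility | level;
}

/* Usage sketch (entry field names are hypothetical):
 *   syslog(bridge_priority(entry->facility, entry->level),
 *          "%s", entry->msg);
 */
```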

publish logs to an ipc:// socket for multiple consumers

Rank 0 cmbd should publish all flux_log() data to a local ipc:// socket. This will enable multiple consumers to access live logging data and filter by subscription.

One use case is stdio for a remote shell service. A remote shell command could obtain a unique identifier for a remote command, subscribe to rsh.<id> or similar, then issue the command.

framework project build system

Now that flux-core has been split off from the private ngrm repo, we need to think about how best to:

  • build what remains of ngrm against flux-core
  • structure other framework projects' build systems
  • compose framework projects to derive a resource manager configuration, e.g. the "minimum viable SLURM replacement"

Framework projects built against flux core should:

  • not duplicate code that's in flux-core if possible
  • build only against flux-core's exported interfaces
  • build against installed and uninstalled (source tree) flux-core

Some immediate problems are:

  • some utility code (e.g. in src/common/liblsd and src/common/libutil) is used by all of the ngrm repo but is not part of flux-core's exported interfaces
  • framework projects will need to link with the same json-c, lua, and zeromq packages as flux-core

Possible models that could be instructive (suggestions courtesy @grondo):

implement graceful broker shutdown

The 2s grace period should be a fallback if the session does not shut down properly.
At the moment it doesn't even try to shut down properly.

There is code in place so that the broker can send an empty message to each module, which is treated as an EOF. Upon receipt of the EOF, the module can exit its main(), and the generic module code can send an empty message in the other direction. When EOFs have been received from all modules, it is safe to exit the broker zloop.

The problem is that a module may be waiting for another module to respond before it handles the EOF, so it is very easy to create deadlock when all modules unload at once. This needs a bit of design effort.

Test failures after upgrading to zeromq-4.1.0-rc1 / czmq-3.0.0-rc1

Built the above zeromq/czmq on my Ubuntu 14.04 box, installed to /usr/local prefix and ran into the following issues:

  1. src/test/tasyncsock failed to compile due to zmq_event_t deprecation(?). This test can probably just go away

  2. src/bindings/lua test failures

FAIL: tests/t0000-json
======================

#     Failed test (./tests/t0000-json.t at line 87)
#          got: 0
#     expected: 1
#     Failed test (./tests/t0000-json.t at line 89)
#          got: 0
#     expected: 1
#     Failed test (./tests/t0000-json.t at line 94)
#          got: 0
#     expected: 1024000000
ok 1 - equals works on empty tables
PASS: tests/t0000-json.t 1 - equals works on empty tables
ok 2 - equals detects unequal tables
PASS: tests/t0000-json.t 2 - equals detects unequal tables
ok 3 - empty array: result is a table
PASS: tests/t0000-json.t 3 - empty array: result is a table
ok 4 - empty array: result is expected
PASS: tests/t0000-json.t 4 - empty array: result is expected
ok 5 - array: result is a table
PASS: tests/t0000-json.t 5 - array: result is a table
ok 6 - array: result is expected
PASS: tests/t0000-json.t 6 - array: result is expected
ok 7 - table: result is a table
PASS: tests/t0000-json.t 7 - table: result is a table
ok 8 - table: result is expected
PASS: tests/t0000-json.t 8 - table: result is expected
ok 9 - nested table: result is a table
PASS: tests/t0000-json.t 9 - nested table: result is a table
ok 10 - nested table: result is expected
PASS: tests/t0000-json.t 10 - nested table: result is expected
ok 11 - table with empty tables: result is a table
PASS: tests/t0000-json.t 11 - table with empty tables: result is a table
ok 12 - table with empty tables: result is expected
PASS: tests/t0000-json.t 12 - table with empty tables: result is expected
ok 13 - string returns unharmed
PASS: tests/t0000-json.t 13 - string returns unharmed
ok 14 - `true' value returns unharmed
PASS: tests/t0000-json.t 14 - `true' value returns unharmed
ok 15 - `false' value returns unharmed
PASS: tests/t0000-json.t 15 - `false' value returns unharmed
not ok 16 - number 1 returns unharmed
FAIL: tests/t0000-json.t 16 - number 1 returns unharmed
ok 17 - number 0 returns unharmed
PASS: tests/t0000-json.t 17 - number 0 returns unharmed
not ok 18 - float value returns unharmed
FAIL: tests/t0000-json.t 18 - float value returns unharmed
not ok 19 - float value returns unharmed (more precision) # TODO Need to investigate
XFAIL: tests/t0000-json.t 19 - float value returns unharmed (more precision) # TODO Need to investigate
#     Failed (TODO) test (./tests/t0000-json.t at line 92)
#          got: 0
#     expected: 1.01
not ok 20 - large value returns unharmed
FAIL: tests/t0000-json.t 20 - large value returns unharmed
ok 21 - nil works
PASS: tests/t0000-json.t 21 - nil works
1..21

ERROR: tests/t0001-zmsg
=======================

# Subtest: Test zmsg basics
../../../config/tap-driver.sh: line 639: 32454 Segmentation fault      (core dumped) "$@"
ERROR: tests/t0001-zmsg.t - missing test plan
ERROR: tests/t0001-zmsg.t - exited with status 139 (terminated by signal 11?)

unwatch kvs values and gracefully return from a flux_reactor based application

kvs_watch() is cumbersome to use with the flux reactor because there is no way to
deregister the watch function, and no way to gracefully interrupt the reactor itself
(kvs_watch callbacks have no return value to pass up to the reactor core).

If an application is using the flux_reactor interface, and wants to watch even one kvs
value or directory, it cannot use the nice idiom of deregistering all reactor handlers
indicating there is nothing left to do, because the kvs handlers do not support this
functionality.

Also since kvs handlers don't support passing error values up to flux reactor, we
can't even force an exit from the reactor by returning -1. The only solution is to
interrupt the reactor with a signal somehow, or set up a message to ourselves
that forces exit from the reactor by returning -1. (And I'm not sure the second approach
is currently possible either)

One solution is to support kvs_unwatch(); another is to support a kvs_watch_once()-style
single-action trigger, where the user would be required to reset the watch before
returning to the reactor. The second approach sounds more easily implemented, but
probably has a race condition.

etc/flux/curve in sysconf_DATA

make install is failing when trying to install etc/flux/curve directory:

 grondo@:~/git/flux-core.git/b$ make DESTDIR=/tmp/grondo install-sysconfDATA
test -z "/usr/local/etc" || /bin/mkdir -p "/tmp/grondo/usr/local/etc"
 /usr/bin/install -c -m 644 etc/flux/config ../etc/flux/curve '/tmp/grondo/usr/local/etc'
/usr/bin/install: cannot stat `../etc/flux/curve': No such file or directory
make: *** [install-sysconfDATA] Error 1

The immediate problem here is that automake doesn't like directories to be listed
in sysconf_DATA, due to the following rule:

        for p in $$list; do \
          if test -f "$$p"; then d=; else d="$(srcdir)/"; fi; \
          echo "$$d$$p"; \
        done | $(am__base_list) | \

that is, if the target isn't a normal file, the src for the install will be assumed to be from
$(srcdir) instead of current working directory. In this case, etc/flux/curve is a directory. It probably doesn't make sense for this rule to operate on directories anyway,
because the $(INSTALL_DATA) command args are probably only appropriate for
files.

The fix for this problem, then, would be to list the individual files in etc/flux/curve/* in
sysconf_DATA. However, I think the real fix is that we don't actually want
to install the curve keys during make install (we certainly wouldn't want to package
them), so I propose we move etc/flux/curve to noinst_sysconf_DATA, which is actually
an easier fix (and I have this committed on a branch, if this solution is acceptable).

timeout handler: should "oneshot" timer remove itself from reactor?

One might expect that a reactor loaded with one single-shot timer should exit normally after
the timer fires, for example with the following Lua code (not yet committed to a repo, sorry):

#!/usr/bin/lua
local f,err = require 'flux'.new()
if not f then error (err) end

local count = 0
local to, err = f:timer {
    timeout = 250,
    oneshot = true,
    handler = function (f, to)
        count = count + 1
        print (count)
    end
}
local r = f:reactor()
print ("exited from reactor with r="..r)

One would expect output like:

1
exited from reactor with r=0

However, the observed behavior is that the reactor hangs indefinitely after the timer fires and requires a ^C:

(flux-236991-0) grondo@hype356:~/git/flux-core.git/t/lua$ lua t.lua 
1
^Clua: t.lua:14: interrupted!
stack traceback:
        [C]: in function 'reactor'
        t.lua:14: in main chunk
        [C]: ?

Adding to:remove() to the timeout handler does rectify the issue, but it seems like it should be unnecessary.

build: verify that zeromq/czmq built with security, epgm

from: Mark Grondona [email protected]
to: Jim Garlick [email protected]
date: Sun, Sep 28, 2014 at 11:12 AM

On my laptop running Ubuntu 14.04.1 with

libzmq3-dev:amd64 4.0.4+dfsg-2
libmunge-dev 0.5.11-1ubuntu1

And with czmq built from git and installed in /usr/local, I'm getting
zeroed keys for curve, which seems to be causing an assert
when starting cmbd. I assume I'm missing some dependency
or didn't build something right, but I'm not sure where to start
looking...

$ ./flux keygen
Saving /home/grondo/.flux/curve/client
Saving /home/grondo/.flux/curve/client_private
Saving /home/grondo/.flux/curve/server
Saving /home/grondo/.flux/curve/server_private

grondo@moron:~/git/flux-core/src/cmd$ cat /home/grondo/.flux/curve/server
#   ****  Generated on 2014-09-28 11:09:06 by CZMQ  ****
#   ZeroMQ CURVE Public Certificate
#   Exchange securely, or use a secure mechanism to verify the contents
#   of this file after exchange. Store public certificates in your home
#   directory, in the .curve subdirectory.

metadata
    time = "2014-09-28T11:09:06"
    role = "server"
curve
    public-key = "0000000000000000000000000000000000000000"

$ ./flux start --size=1
cmbd: zcertstore.c:173: zcertstore_insert: Assertion `rc == 0' failed.
flux-start: 0 (pid 12491) Aborted

from: Jim Garlick [email protected]
to: Mark Grondona [email protected]
date: Sun, Sep 28, 2014 at 12:27 PM

The security stuff came in zeromq4, so you'll need to build that too.
I think the steps are:
- build and install libsodium to /usr/local (me: 0.5.0)
- build and install libzmq (me: 4.0.4); I think you need --with-pgm
  (but it is now included in the source tree)
- build and install czmq (me: 2.2.0)

Sounds like we might want to build in checks for functioning pgm and
curve to our configure.ac.
(Maybe open an issue on that?)

from: Mark Grondona [email protected]
to: Jim Garlick [email protected]
date: Sun, Sep 28, 2014 at 12:44 PM

The libzmq3 package in Ubuntu is, confusingly, zeromq 4.0.4 (see the package version). Maybe it's not built with libsodium? Would runtime checks be possible, so that keygen doesn't silently create zeroed keys?

wreckrun test failure

After updating to latest master (5316591), the first wreckrun test (but not tests 2-4) failed for me.
The next time through, all succeeded. Opening this in case there is an intermittent problem here.

Here is the output from the failing run:

cmbd: [1415035592.547314] job.info[0] Setting job 1 to reserved
cmbd: child (pid 35608) exited with rc=0
cmbd: [1415035592.559056] lwj.1.info[0] Found kvs 'rank' dir
cmbd: [1415035592.559656] lwj.1.info[0] initializing from CMB: rank=0
cmbd: [1415035592.560555] lwj.1.info[0] lwj.1: node0: basis=0

cmbd: [1415035592.560903] lwj.1.info[2] Found kvs 'rank' dir
cmbd: [1415035592.561632] lwj.1.info[2] initializing from CMB: rank=2
cmbd: [1415035592.562084] lwj.1.info[0] lwj 1: node0: nprocs=1, nnodes=4, cmdline=[ "hostname" ]
cmbd: [1415035592.562389] lwj.1.info[0] updating job state to starting
cmbd: [1415035592.562783] lwj.1.info[2] lwj.1: node2: basis=2

cmbd: [1415035592.563342] lwj.1.info[0] reading lua files from /home/garlick/proj/flux-core/src/modules/wreck/lua.d/*.lua

cmbd: [1415035592.564413] lwj.1.info[2] lwj 1: node2: nprocs=1, nnodes=4, cmdline=[ "hostname" ]
cmbd: [1415035592.564861] lwj.1.info[2] reading lua files from /home/garlick/proj/flux-core/src/modules/wreck/lua.d/*.lua

cmbd: [1415035592.565373] lwj.1.info[0] in parent: child pid[0] = 35622
cmbd: [1415035592.567963] lwj.1.info[2] in parent: child pid[0] = 35627
cmbd: [1415035592.568273] lwj.1.info[3] Found kvs 'rank' dir
cmbd: [1415035592.569207] lwj.1.info[3] initializing from CMB: rank=3
cmbd: [1415035592.569833] lwj.1.info[3] lwj.1: node3: basis=3

cmbd: [1415035592.570432] lwj.1.info[3] lwj 1: node3: nprocs=1, nnodes=4, cmdline=[ "hostname" ]
cmbd: [1415035592.570773] lwj.1.info[3] reading lua files from /home/garlick/proj/flux-core/src/modules/wreck/lua.d/*.lua

cmbd: [1415035592.574645] lwj.1.info[3] in parent: child pid[0] = 35631
not ok 1 - wreckrun: works
#
#               hostname=$(hostname) &&
#               run_timeout 2 flux wreckrun -n4 -N4 hostname  >output &&
#               cat >expected <<-EOF  &&
#               $hostname
#               $hostname
#               $hostname
#               $hostname
#               EOF
#               test_cmp expected output
#

Unique key string required for barrier safety

The API currently implies that each barrier in a given comms session requires a unique key string supplied by the user. For the sake of safety/usability we should probably look into making this a per-job requirement at least, or perhaps even provide an API to generate the necessary unique keys.
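One possible shape for such an API: derive the key from a user-supplied tag plus the job id and a per-use sequence number, so callers never have to invent globally unique strings. This is only a sketch of the idea, not a proposed flux-core signature:

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not a real flux-core API): derive a unique
 * barrier name from a user tag, the job id, and a per-use sequence
 * number. Collisions across jobs or repeated uses are impossible as
 * long as (jobid, seq) pairs are unique. */
void barrier_key(char *buf, size_t len, const char *tag, int jobid, int seq)
{
    snprintf(buf, len, "%s.%d.%d", tag, jobid, seq);
}

/* Two uses of the same tag in the same job still get distinct keys. */
int demo_keys_distinct(void)
{
    char k1[64], k2[64];
    barrier_key(k1, sizeof(k1), "mpi_init", 42, 0);
    barrier_key(k2, sizeof(k2), "mpi_init", 42, 1);
    return strcmp(k1, k2) != 0;
}
```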

api module can block

The API module handles clients connected to a UNIX domain socket. When this socket becomes readable, the API module executes zfd_recv_typemask(nonblock = true) but as noted in zfd.h:

 * N.B. The nonblock flag doesn't completely eliminate blocking.
 * Once a message has begun to be read, the recv may block in order
 * to read the complete thing.

When a zmq message is received that needs to be sent to a client over the UNIX domain socket, the socket's readiness is not even checked; zfd_send_typemask () is called, which does not have a nonblock flag.

So if a client becomes non-responsive but is still connected to the socket, it can block the API module's single thread. This could very easily occur, for example, if a process receives a SIGSTOP, or engages in blocking behavior of its own.
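One mitigation this analysis points at is putting the client fd in non-blocking mode and treating EAGAIN as "client not keeping up" rather than stalling the module's single thread. A self-contained POSIX sketch (no flux code involved) showing that a non-blocking write fails fast instead of hanging when the peer stops reading:

```c
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Simulate a stopped client: nobody reads fds[1], so the kernel socket
 * buffer fills up. A blocking write would hang the thread forever here;
 * with O_NONBLOCK, write() fails fast with EAGAIN/EWOULDBLOCK and the
 * caller can decide how to handle the unresponsive client. Returns 1 if
 * the fast failure was observed. */
int send_nonblock_fills(void)
{
    int fds[2];
    char buf[4096];
    int got_eagain = 0;
    int i;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, fds) < 0)
        return -1;
    fcntl(fds[0], F_SETFL, O_NONBLOCK);
    memset(buf, 0, sizeof(buf));

    /* Write until the socket buffer is full; must not block. */
    for (i = 0; i < 4096; i++) {
        if (write(fds[0], buf, sizeof(buf)) < 0) {
            got_eagain = (errno == EAGAIN || errno == EWOULDBLOCK);
            break;
        }
    }
    close(fds[0]);
    close(fds[1]);
    return got_eagain;
}
```

A non-blocking zfd_send variant plus a per-client outgoing queue (or a policy of dropping/disconnecting slow clients) would keep the API module responsive.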

logging needs filtering/reduction

We used to have a log module that flux_log () routed messages through. When we reworked the project repository for public release, this module was dropped and logging was supported natively in the cmbd broker. The new logging implementation unconditionally forwards all log entries to rank 0, where they are disposed of according to this option

-L,--logdest DEST            Log to DEST, can  be syslog, stderr, or file

We lost three useful features when this happened:

  • logging could be filtered at the source
  • duplicate entries could be squashed on their way towards rank 0
  • each rank had a circular buffer of (filtered) log entries that it could spew forth on receipt of a fault event. The idea was that log data could be captured, but only injected into the network when a problem was detected that might require additional context for debugging

We should consider how to extend the very simple logging in the broker with a module that can do reductions, debug logging, and filtering.

wreck looks for wrexecd in fluxlibexecdir instead of abs_builddir

As reported by @lipari, during some runs of flux start from a builddir, wrexec looks
for wrexecd in the wrong place. This might be a result of the change in module loading:
now both wrexec.so and wrexecinst.so have the same MOD_NAME, so I imagine there is a
chance that flux-start from the builddir might load wrexecinst.so in some cases
(I haven't verified this).

In any event, the scheme of building two wrexec modules is lame; really, the module
should use the flux config interface to get information about the path to wrexecd.
@garlick, do you have any suggestions on how to do this? I was thinking of a new wrexec
section for the config... but maybe that is too specific?

wreckrun hangs indefinitely on 32 nodes

In testing wreckrun, hoping to use it in place of srun and avoid the stability issue I've been seeing, I ran into a problem where, on enough nodes, it hangs during the "starting" phase. In particular, I tried the following and got the same result twice, once in a batch-mode test and once in an interactive mxterm.

scogland at cab1288 in ~/projects/flux/build/src/cmd 
$ ./flux start -N 32 -s 32
scogland at cab132 in ~/projects/flux/build/src/cmd 
$ ./flux wreckrun -N 32 -t 16 hostname
wreckrun: 0.001s: Sending LWJ request for 1 tasks (cmdline "hostname")
wreckrun: 0.006s: Registered jobid 1
wreckrun: Allocating 512 tasks across 32 available nodes..
wreckrun: tasks per node: node[0-31]: 16
wreckrun: 0.028s: Sending run event
wreckrun: 0.150s: State = reserved
wreckrun: 0.538s: State = starting
^C

... (indefinite non-responsive hang)

a kvs watch handler cannot modify the key it watches

If a synchronous RPC is made within a kvs watch callback, and another kvs.watch response is received before the expected response to the RPC, it is put in a queue that will be processed the next time the handle blocks on a response. It will not, however, cause the reactor to become ready when it is re-entered following the watch callback.

Generic dlopen function

I'd like to propose/request that a generic dlopen function be added to the flux-core library that all third-party comms modules could use. This function would automatically search FLUX_MODULE_PATH (or MODULE_PATH by default), like module_load(), flux_modfind(), and modfind() currently do, and return the dso handle if successful. It would look something like this:

    char *searchpath = getenv ("FLUX_MODULE_PATH");
    void *dso = NULL;
    char *cpy, *a1, *dirpath, *saveptr, *schedplugin;
    struct stat sb;
    int rc = 0;

    if (!searchpath)
        searchpath = MODULE_PATH;
    cpy = xstrdup (searchpath);
    a1 = cpy;
    while ((dirpath = strtok_r (a1, ":", &saveptr))) {
        if (asprintf (&schedplugin, "%s/schedplugin1.so", dirpath) < 0)
            oom ();
        if (stat (schedplugin, &sb) == 0) {
            if (!(dso = dlopen (schedplugin, RTLD_NOW | RTLD_LOCAL))) {
                flux_log (h, LOG_ERR, "failed to open sched plugin: %s",
                          dlerror ());
                rc = -1;
                goto ret;
            } else {
                flux_log (h, LOG_DEBUG, "loaded: %s", schedplugin);
            }
            break;
        }
        a1 = NULL;
    }
    free (cpy);

    return dso;

only instead of "schedplugin" it would use a generic module name.

fd leak in cmbd due to wreckrun

I noticed a leak of approximately 3 fds per invocation of flux wreckrun, probably due to some mishandling of zeromq sockets in the wrexec plugin.

The whole way wrexec works needs some rethinking anyway.

 grondo@hype356:~/git/flux-core$ src/cmd/flux start -o,-v,-L/g/g0/grondo/git/flux-core/cmbd.log
(flux-192509-0) grondo@hype356:~/git/flux-core$ ls /proc/192510/fd | wc -w
80
(flux-192509-0) grondo@hype356:~/git/flux-core$ for i in `seq 0 100`; do src/cmd/flux wreckrun -n4 /bin/true >/dev/null 2>&1; done
(flux-192509-0) grondo@hype356:~/git/flux-core$ ls /proc/192510/fd | wc -w
383
(flux-192509-0) grondo@hype356:~/git/flux-core$ for i in `seq 0 100`; do src/cmd/flux wreckrun -n4 /bin/true >/dev/null 2>&1; done
(flux-192509-0) grondo@hype356:~/git/flux-core$ ls /proc/192510/fd | wc -w
686

flux session startup synchronization point is needed

When a fast-cycling session is launched, for example to run a test as a subcommand, say

flux start [--size N] /bin/true

The rank 0 command (/bin/true) executes immediately, potentially before any children have established connections to rank 0. When the command exits, the rank 0 broker sends out a shutdown event. If the shutdown event is sent before the children connect, it will not be received by the children. Rank 0 will exit after the 2s grace period and the children will remain, trying to connect.
They will then have to be killed by flux-start or by srun.

api: reactor should be reinitialized on subsequent calls to flux_reactor_start

At least in the API context, the reactor implementation does not reinitialize before calling zloop_start.
This can cause the reactor to exit immediately (or after the first event fires, anyway) if a
previous loop was terminated with flux_reactor_stop().

To reproduce,

  • Set a timer and terminate reactor with flux_reactor_stop() on the first callback
  • remove first timer
  • Set a second timer with oneshot = false and call flux_reactor_start()

The second timer should run until Ctrl-C, but in the current code it exits after the first callback.

Lua reproducer (once the lua timer interface is committed, this will actually run):

#!/usr/bin/lua

local f,err = require 'flux'.new()
if not f then error (err) end

local to, err = f:timer {
    timeout = 250,
    handler = function (f, to) to:remove() end
}
local r = f:reactor()
print ("exited from first reactor loop with r="..r)

local count = 0
local to, err = f:timer {
    timeout = 250,
    oneshot = false,
    handler = function (f, to)
        count = count + 1
        if count == 5 then to:remove() end
    end
}

local r = f:reactor()
print ("exited from second reactor with r="..r)
assert (count == 5)

alternate directory for flux keys

The testsuite currently overwrites keys in ~/.flux when testing flux-keygen, and other tests that
run under a flux instance will require keys anyway, so we'll have to ensure keys
exist and match the version of flux we're testing.

We'll need some way to specify an alternate directory for flux keys, possibly propagated by
an environment variable or similar so that all tests inherit the testsuite key directory.
(I haven't thought through the other repercussions of having an alternate keydir, so I'm
completely open on how to fix this.)

For now it might in fact be dangerous to run the testsuite if you have other flux instances
around anywhere.

flux command should be runnable in build tree from any directory

We should be able to use the flux(1) command in the build tree from any directory, e.g. from the top-level build directory:

./src/cmd/flux start --size=2

flux run this way should work the same as when run from src/cmd directly, i.e. it should still find all modules in the build directory, point the command directory at src/cmd, etc. This will be necessary for running system testing.

flux-kvs get and put should interpret integer, double, boolean, string values

Right now flux-kvs get simply obtains JSON from the KVS, runs it through Jtostr (), and prints it.
This means strings are still JSON-escaped, e.g.

$ flux kvs get config.general.cmbd_path
"\/home\/garlick\/proj\/flux-core\/src\/broker\/cmbd"

We should check the type of object returned and perform the appropriate conversion.

Similarly, flux-kvs put calls Jfromstr () on the string from the command line. If that fails, it tries interpreting it as a string, but stops there.
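The desired behavior on both sides can be modeled in a short Python sketch (illustrative only; the actual implementation is C with json-c): on get, decode the JSON value and render strings bare; on put, try JSON first and fall back to a plain string.

```python
import json

def kvs_render(value):
    """Model of the 'get' side: decode a JSON-encoded KVS value and
    render strings bare, so they are not shown quoted and escaped."""
    obj = json.loads(value)
    if isinstance(obj, str):
        return obj              # bare string, no quotes or escapes
    return json.dumps(obj)      # ints, doubles, booleans, objects keep JSON form

def kvs_parse(text):
    """Model of the 'put' side: interpret the command-line value as
    JSON if possible, otherwise fall back to a plain string."""
    try:
        return json.loads(text)     # 42, 3.14, true, {"a":1}, "quoted"
    except ValueError:
        return text                 # bare word becomes a string

print(kvs_render('"\\/home\\/garlick\\/proj"'))   # prints /home/garlick/proj
```

With this model, `kvs_parse("true")` yields a boolean and `kvs_parse("hostname")` yields the string "hostname", which is the interpretation the issue asks for.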

flux-module list works around lack of synchronization with a usleep

modctl stores its state in the KVS, consisting of a reduced list of cmb.lsmod output, and an object for each module that modctl is managing.

$ flux kvs dir conf.modctl
conf.modctl.modules.xbarrier = {"args":{},"data":"0dNBhmGJO>000000000000ri+ ...
conf.modctl.seq = 2
conf.modctl.lsmod = {"seq":2,"mods":{"wrexec":{"name":"wrexec","size":64921 ...

The lsmod data in the KVS is only updated when a module is loaded/unloaded via modctl. However, it includes data like module idle time that needs to be grabbed in real time. One can request modctl to update this data using flux_modctl_update() but there is no way to know when the update has landed in the KVS hence this atrocity:

if (flux_modctl_update (h) < 0)
    err_exit ("flux_modctl_update");
/* FIXME: flux_modctl_update doesn't wait for KVS to be updated,
 * so there is a race here.  The following usleep should be removed
 * once this is addressed.
 */
usleep (1000*100);
...kvs_get (
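One way to close the race, sketched here in Python with a condition variable (the real fix would presumably hang a KVS watch on conf.modctl.seq rather than use threads): have the update bump a sequence number and let readers block until the sequence they asked for has landed, instead of sleeping a fixed interval.

```python
import threading

class SeqWaiter:
    """Block until a monotonically increasing sequence number reaches a
    target, instead of sleeping and hoping the update has landed."""
    def __init__(self):
        self._seq = 0
        self._cond = threading.Condition()

    def update(self, seq):
        # Called when new lsmod data (carrying its seq) lands in the KVS.
        with self._cond:
            self._seq = max(self._seq, seq)
            self._cond.notify_all()

    def wait_for(self, target, timeout=None):
        # Returns True once seq >= target, False on timeout.
        with self._cond:
            return self._cond.wait_for(lambda: self._seq >= target, timeout)

w = SeqWaiter()
threading.Timer(0.05, w.update, args=(3,)).start()
print(w.wait_for(3, timeout=2.0))   # True once the update lands; no usleep needed
```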

flux-screen constructs cmbd module options incorrectly

With the changes to cmbd's arguments in pull req #39, flux-start was updated but flux-screen was not.

These can be dropped:

-M,--module NAME            load additional module
-O,--modopt NAME:var=val    set additional module option
-k,--k-ary N                set reduction tree fanout

And an option similar to flux-start's -o, --cmbd-opts option should be added.

configure should halt if asciidoc not installed

Making all in doc
make[1]: Entering directory `/net/freenas/mnt/main/home/garlick/work/flux-core/doc'
Making all in cmd
make[2]: Entering directory `/net/freenas/mnt/main/home/garlick/work/flux-core/doc/cmd'
a2x --attribute mansource=flux-core \
    --attribute manversion=0.1.0 \
    --attribute manmanual="Flux Manual" \
    --doctype manpage --format manpage flux.adoc
/bin/bash: a2x: command not found
make[2]: *** [flux.1] Error 127
make[2]: Leaving directory `/net/freenas/mnt/main/home/garlick/work/flux-core/doc/cmd'
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory `/net/freenas/mnt/main/home/garlick/work/flux-core/doc'
make: *** [all-recursive] Error 1
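A minimal configure.ac fragment that would fail at configure time instead (standard autoconf macros; the exact wiring into this tree is an assumption):

```
AC_CHECK_PROG([A2X], [a2x], [a2x])
AS_IF([test "x$A2X" = "x"],
      [AC_MSG_ERROR([a2x (from asciidoc) is required to build the man pages])])
```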

configure-fu needed for lua-posix?

On ubuntu 14.04.1 LTS:

If lua-posix is not installed, flux-wreckrun says:

./flux wreckrun hostname
/usr/bin/lua: ...flux-core/src/cmd/../bindings/lua/flux-lua/timer.lua:30: module 'posix' not found:
no field package.preload['posix']
no file '/net/freenas/mnt/main/home/garlick/work/flux-core/src/cmd/../bindings/lua/posix.lua'
no file '/net/freenas/mnt/main/home/garlick/work/flux-core/src/cmd/../bindings/lua/posix.lua'
no file './posix.lua'
no file '/usr/local/share/lua/5.1/posix.lua'
no file '/usr/local/share/lua/5.1/posix/init.lua'
no file '/usr/local/lib/lua/5.1/posix.lua'
no file '/usr/local/lib/lua/5.1/posix/init.lua'
no file '/usr/share/lua/5.1/posix.lua'
no file '/usr/share/lua/5.1/posix/init.lua'
no file '/net/freenas/mnt/main/home/garlick/work/flux-core/src/cmd/../bindings/lua/.libs/posix.so'
no file '/net/freenas/mnt/main/home/garlick/work/flux-core/src/cmd/../bindings/lua/.libs/posix.so'
no file './posix.so'
no file '/usr/local/lib/lua/5.1/posix.so'
no file '/usr/lib/x86_64-linux-gnu/lua/5.1/posix.so'
no file '/usr/lib/lua/5.1/posix.so'
no file '/usr/local/lib/lua/5.1/loadall.so'
stack traceback:
[C]: in function 'require'
...flux-core/src/cmd/../bindings/lua/flux-lua/timer.lua:30: in main chunk
[C]: in function 'require'
./flux-wreckrun:7: in main chunk
[C]: ?

If I then sudo apt-get install lua-posix

./flux wreckrun hostname
/usr/bin/lua: ./flux-wreckrun:158: attempt to index field 'signal' (a function value)
stack traceback:
./flux-wreckrun:158: in main chunk
[C]: ?

The version of lua-posix that apt-get found was 29-7ubuntu1.

cmbd leaves files in /tmp

cmbd makes no effort to remove $FLUX_TMPDIR, its pid file, its API socket, or its request socket on exit. These pile up after a while.

event network may not be wired yet when session is shut down

event and request networks currently wire up asynchronously.

We now wait to launch shell until live module says all nodes have sent hello messages.
This ensures the request network is wired up. However, the event network which carries
the shutdown message may still be connecting when shutdown is sent on a fast-cycling
job.

generic KVS dso loading service

@grondo mentioned in issue #87 that perhaps flux-core could offer a generic module-loading service using the KVS.

We are currently loading modules through the KVS in modctl and flux-module. That design needs a rethink, and perhaps it would be a good time to generalize such a service so it could be used by other services that have plugins like monitoring and scheduler.

build: x_ac_check_cond_lib.m4 warnings during aclocal

When building with automake 1:1.14.1-2ub on Ubuntu, autogen.sh emits the following warnings; earlier versions do not.

./autogen.sh
Running aclocal ...
configure.ac:58: warning: AC_LANG_CONFTEST: no AC_LANG_SOURCE call detected in body
../../lib/autoconf/lang.m4:193: AC_LANG_CONFTEST is expanded from...
../../lib/autoconf/general.m4:2661: _AC_LINK_IFELSE is expanded from...
../../lib/autoconf/general.m4:2678: AC_LINK_IFELSE is expanded from...
../../lib/m4sugar/m4sh.m4:639: AS_IF is expanded from...
../../lib/autoconf/general.m4:2031: AC_CACHE_VAL is expanded from...
../../lib/autoconf/general.m4:2052: AC_CACHE_CHECK is expanded from...
config/x_ac_check_cond_lib.m4:23: X_AC_CHECK_COND_LIB is expanded from...
configure.ac:58: the top level
configure.ac:59: warning: AC_LANG_CONFTEST: no AC_LANG_SOURCE call detected in body
../../lib/autoconf/lang.m4:193: AC_LANG_CONFTEST is expanded from...
../../lib/autoconf/general.m4:2661: _AC_LINK_IFELSE is expanded from...
../../lib/autoconf/general.m4:2678: AC_LINK_IFELSE is expanded from...
../../lib/m4sugar/m4sh.m4:639: AS_IF is expanded from...
../../lib/autoconf/general.m4:2031: AC_CACHE_VAL is expanded from...
../../lib/autoconf/general.m4:2052: AC_CACHE_CHECK is expanded from...
config/x_ac_check_cond_lib.m4:23: X_AC_CHECK_COND_LIB is expanded from...
configure.ac:59: the top level
...
(repeated)

This macro was borrowed from Chris Dunlap and he is aware of the issue.

lua bindings test (framework?) failure

I'm getting the following 'make check' failure on my ubuntu 14.04 system at home.

make[5]: Entering directory `/net/freenas/mnt/main/home/garlick/work/flux-core/src/bindings/lua'
lua: ...lick/work/flux-core/src/bindings/lua/tests/lunit.lua:626: /bin/bash:1: unexpected symbol near 'char(127)'
stack traceback:
[C]: in function 'error'
...lick/work/flux-core/src/bindings/lua/tests/lunit.lua:626: in function 'loadtestcase'
...lick/work/flux-core/src/bindings/lua/tests/lunit.lua:657: in function 'main'
stdin:11: in main chunk
[C]: ?
make[5]: *** [tests/test-json.lua.log] Error 1

API: overhaul KVS interfaces

The primary KVS interfaces should operate on valid JSON strings, not json-c specific json_object types, like the redesigned message functions.

flux-start: Eats cmdlines

It should be possible, and will be useful for the test suite, to preserve the command line given to 'flux start' as faithfully as possible. Currently, 'flux start' concatenates the arguments and thus drops quoting:

$ ./flux start -v --size=2 sh -c 'sleep 1'
flux-start: 0: ../broker/cmbd --size=2 --rank=0 --command=sh -c sleep 1
flux-start: 1: ../broker/cmbd --size=2 --rank=1
cmbd: 0-0: starting shell
sleep: missing operand
Try `sleep --help' for more information.
cmbd: 0: shutdown in 2s: shell (pid 7029) exited with rc=1
flux-start: 1 (pid 7022) exited with rc=1
flux-start: 0 (pid 7021) exited with rc=1
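The fix amounts to re-quoting each argument before concatenation. A quick model of the round trip in Python (flux-start itself would do the equivalent in its own language):

```python
import shlex

args = ["sh", "-c", "sleep 1"]

# Naive concatenation loses the quoting, so the child shell
# sees three separate words instead of one quoted command string.
naive = " ".join(args)                      # sh -c sleep 1

# Quote each argument so the command line survives a re-split.
quoted = " ".join(shlex.quote(a) for a in args)
print(quoted)                               # prints: sh -c 'sleep 1'
assert shlex.split(quoted) == args          # round trip preserved
```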

consider Lamport clocks to establish log ordering

The native logging facility in the cmbd is currently an open loop design, where log messages are sent upstream with no reply. If a reply message were added for each hop, we could maintain a Lamport clock on each broker that could then be included in messages. The clock value could then be used to establish a distributed partial ordering of log messages.

The heartbeat and keep-alive ping messages could potentially also contribute to the clock.

http://en.wikipedia.org/wiki/Lamport_timestamps
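The clock rule itself is simple, as in this sketch: increment on each local event (e.g. emitting a log message), and on receiving a stamped reply set the local clock to max(local, remote) + 1.

```python
class LamportClock:
    """Minimal Lamport logical clock for ordering distributed events."""
    def __init__(self):
        self.time = 0

    def tick(self):
        # Local event, e.g. a broker emitting a log message.
        self.time += 1
        return self.time

    def receive(self, remote_time):
        # Merge the stamp carried in a reply, heartbeat, or ping.
        self.time = max(self.time, remote_time) + 1
        return self.time

a, b = LamportClock(), LamportClock()
stamp = a.tick()        # a.time == 1
b.receive(stamp)        # b.time == 2: b's later events order after a's
print(a.time, b.time)   # prints: 1 2
```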
