Circus is a program that runs and watches processes and sockets. Circus can be used as a library or through the command line.
- Full Documentation
- How to Contribute
- IRC: Freenode, channel #mozilla-circus
A Process & Socket Manager built with zmq
Home Page: http://circus.readthedocs.org/
License: Other
http://travis-ci.org/#!/mozilla-services/circus/jobs/972344
Looks like it starts at -2 where it should be -1, not sure..
cc @jrgm
I want to be able to display general information on a circus system (in the circushttpd index page) such as:
It treats everything after add as the command, instead of breaking out the args like one does in the ini.
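One way to get the same cmd/args split that the ini file uses is to tokenize the line with the standard library's shlex module and treat the first token as the command. This is only a sketch: split_command is a hypothetical helper, not an existing circus function.

```python
import shlex


def split_command(line):
    """Split a raw command line into (cmd, args), like the ini's cmd/args pair.

    Hypothetical helper for illustration; not part of circus.
    """
    tokens = shlex.split(line)
    if not tokens:
        raise ValueError("empty command line")
    # First token is the program, the rest are its arguments.
    return tokens[0], tokens[1:]
```

Using shlex rather than str.split keeps quoted arguments such as "hello world" together as a single argument.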
While I can happily browse the sphinx docs from github or pull & compile, pointing to http://packages.python.org/circus seems like a bad idea while that URL returns a 404 :)
20:54 < benbangert> yea, it'd be cool if it could work with the Procfile format,
http://michaelvanrooijen.com/articles/2011/06/08-managing-and-monitoring-your-ruby-application-with-foreman-and-upstart/
20:54 < benbangert> Ruby has a gem called Foreman, mainly intended for dev, but can export a config file for Bluepill (ruby version of circus)
20:55 < tarek> benbangert: ah ok -- I will read this
20:55 < benbangert> you list the programs your app needs running, and you can indicate how many workers to start for each 'task' you define in the config file (Procfile)
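For reference, a Procfile is a list of "name: command" lines. A minimal parser sketch, assuming that simple format and a hypothetical parse_procfile helper (nothing like it is shown in circus itself):

```python
def parse_procfile(text):
    """Parse Procfile text into a {process_name: command} dict.

    Hypothetical helper; assumes one "name: command" entry per line.
    """
    processes = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines and comments.
        if not line or line.startswith("#"):
            continue
        name, sep, command = line.partition(":")
        if not sep or not name.strip() or not command.strip():
            raise ValueError("invalid Procfile line: %r" % raw)
        processes[name.strip()] = command.strip()
    return processes
```

Each entry could then be mapped to one watcher, with the command split into cmd and args.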
we need to get rid of wid/internal pid numbers in all public APIs, because they are confusing and we currently have no way to get a list of PIDs besides a "stats" call.
we need a simple way to test circus -- I don't want to get into any mocking
I am thinking about :
we optionally want start() to return only once everything has really started.
it should be a pubsub + controller plugin that listens to events and acts on them through the controller
it should be based on a base plugin
circus should be configurable to enable some plugins (via ini or via the APIs)
The README looks good, we need a minimal docs/ for the 0.1
Either that or instances of Watcher - I'm using Circus as a library, and have a use case for adding new watchers to an arbiter over time (I do this to avoid addressing new ZMQ endpoints, since pyzmq doesn't support '*' for TCP ports - the underlying C library does).
I have a process that is basically starting an empty arbiter, then on incoming requests, adds new watchers to that arbiter. I can't really do that the way I want to now because Arbiter.add_watcher() doesn't let me specify enough information about the watcher - like stdout_stream, etc.
Problem:
redirect stdin/stderr, possibly to a PUB/SUB feed.
context:
make it possible to interact with the process via Popen.communicate
possible solutions:
Popen.communicate: not really scalable
Popen.communicate via ZMQ.
Any other idea?
I want to have a web ui atop circusctl..
Something we could launch with a "circushttpd" command.
I was thinking about using bottle for this. I also wonder if we should not do this feature in a separate circus-web project. But the bottle dependency is pretty light so...
Thoughts ?
Should be simple as:
language: python
python:
I can reliably reproduce a coredump when a set of watched processes are flapping, and numprocesses >= 3 (I think it's a multi-thread race thing, so technically > 1, but it happens fast >= 3).
stderr:
Assertion failed: ok (mailbox.cpp:79)
Assertion failed: ok (mailbox.cpp:79)
zsh: abort (core dumped) ../bin/circusd flap.ini
Example config to reproduce problem:
[circus]
check_delay = 5
endpoint = tcp://127.0.0.1:5555
[watcher:test]
cmd = /bin/bash
args = -q # invalid option, makes bash exit non-zero
warmup_delay = 0
numprocesses = 3
Python version 2.7.1, zeromq package zeromq-2.1.9-1.fc15.x86_64 (from Fedora Core 15).
output:
2012-03-24 23:44:02 [11685] [INFO] Starting master on pid 11685
2012-03-24 23:44:02 [11685] [INFO] test started
2012-03-24 23:44:02 [11685] [INFO] Arbiter now waiting for commands
2012-03-24 23:44:07 [11685] [INFO] test: flapping detected: retry in 7s
2012-03-24 23:44:07 [11685] [INFO] test stopped
2012-03-24 23:44:07 [11685] [INFO] test: flapping detected: retry in 7s
2012-03-24 23:44:07 [11685] [INFO] test stopped
2012-03-24 23:44:07 [11685] [INFO] test: flapping detected: retry in 7s
2012-03-24 23:44:07 [11685] [INFO] test stopped
2012-03-24 23:44:14 [11685] [INFO] test started
2012-03-24 23:44:14 [11685] [INFO] test: flapping detected: retry in 7s
Assertion failed: ok (mailbox.cpp:79)
Assertion failed: ok (mailbox.cpp:79)
zsh: abort (core dumped) ../bin/circusd flap.ini
It's zeromq related, judging by the beginning of the backtrace:
#0 0x00007f8c504d1215 in __GI_raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
#1 0x00007f8c504d2b2b in __GI_abort () at abort.c:93
#2 0x00007f8c4814727e in zmq::mailbox_t::recv (this=0x7f8c340015e0, cmd_=0x7f8c43810cd0, timeout_=) at mailbox.cpp:77
#3 0x00007f8c48150149 in zmq::socket_base_t::process_commands (this=0x7f8c34001500, timeout_=, throttle_=false) at socket_base.cpp:700
#4 0x00007f8c48150396 in zmq::socket_base_t::recv (this=0x7f8c34001500, msg_=0x7f8c43810d90, flags_=0) at socket_base.cpp:605
full backtrace is available here.
I think this is a concurrency issue with zeromq and the flapping thread sending/receiving messages. I tried adding a lock around using the zeromq client in flapping, and that seems to be preventing the coredump: fetep/circus@80e6929
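The general idea of putting a lock around a client that is shared between threads can be sketched as below. This is not the code from the referenced commit: LockedClient is a hypothetical wrapper, and it assumes the wrapped client exposes a call(command) method.

```python
import threading


class LockedClient(object):
    """Serialize access to a client that is not safe to share across threads.

    Hypothetical wrapper; assumes the wrapped object has a call() method.
    """

    def __init__(self, client):
        self._client = client
        self._lock = threading.Lock()

    def call(self, command):
        # Only one thread at a time may use the underlying socket.
        with self._lock:
            return self._client.call(command)
```

An alternative design is to give each thread its own socket, which avoids sharing entirely at the cost of more connections.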
I was running the unit tests for circus on CentOS. Negative UID/GID values are illegal on Linux, but are allowed on OS X. So on OS X '-12' throws a KeyError (if no such uid/gid exists), but on Linux it always throws OverflowError. So, circus/util.py:{to_uid,to_gid} just need a small tweak to catch both error types. (Actually, -1200 might be a value that is less likely to exist on OS X.)
$ git diff
diff --git a/circus/util.py b/circus/util.py
index 0080436..b7109e1 100644
--- a/circus/util.py
+++ b/circus/util.py
@@ -119,7 +119,7 @@ def to_uid(name):
         try:
             pwd.getpwuid(name)
             return name
-        except KeyError:
+        except (KeyError, OverflowError):
             raise ValueError("%r isn't a valid user id" % name)
 
     if not isinstance(name, str):
@@ -140,7 +140,7 @@ def to_gid(name):
         try:
             grp.getgrgid(name)
             return name
-        except KeyError:
+        except (KeyError, OverflowError):
             raise ValueError("No such group: %r" % name)
 
     if not isinstance(name, str):
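To see the effect of the patch in isolation, here is a simplified, standalone sketch of to_uid with both exceptions caught. The parts outside the diff hunk (the TypeError and the user name lookup) are assumptions about the surrounding function, not a copy of circus/util.py.

```python
import pwd


def to_uid(name):
    """Return a uid, given an integer uid or a user name.

    Simplified sketch; only the int branch mirrors the patch above.
    """
    if isinstance(name, int):
        try:
            pwd.getpwuid(name)
            return name
        except (KeyError, OverflowError):
            # KeyError: no such uid. OverflowError: the value does not
            # fit the platform's uid type (e.g. negative ids on Linux).
            raise ValueError("%r isn't a valid user id" % name)

    if not isinstance(name, str):
        raise TypeError(name)

    # Assumed behaviour for names: look the user up in the password database.
    try:
        return pwd.getpwnam(name).pw_uid
    except KeyError:
        raise ValueError("%r isn't a valid user name" % name)
```

With this change an invalid integer id surfaces as ValueError regardless of which exception the platform raises.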
Currently the config file is read in the circusd module and used directly to generate watchers and an arbiter.
circus.get_arbiter(), on the other hand, lets you programmatically create the same thing.
We should not have two different pieces of code here to generate the classes.
The refactoring will do the following:
Please add 'from __future__ import with_statement' to setup.py, otherwise you'll get a SyntaxError on the 'with open()' statement in Python 2.5.
I'm looking into using circus as a message queue interface to talk to an interactive console session. The last bit that is missing is sending stdin through zeromq. Does it make sense for that to be part of circus?
Two tests fail for me:
test_dummy (circus.tests.test_runner.TestRunner) ... ok
test_handler (circus.tests.test_sighandler.TestSigHandler) ... ok
test_add_show (circus.tests.test_trainer.TestTrainer) ... ok
test_add_show1 (circus.tests.test_trainer.TestTrainer) ... ok
test_add_show2 (circus.tests.test_trainer.TestTrainer) ... ok
test_add_show3 (circus.tests.test_trainer.TestTrainer) ... ok
test_del_show (circus.tests.test_trainer.TestTrainer) ... ok
test_flies (circus.tests.test_trainer.TestTrainer) ... ok
test_numflies (circus.tests.test_trainer.TestTrainer) ... ok
test_numshows (circus.tests.test_trainer.TestTrainer) ... ok
test_reload (circus.tests.test_trainer.TestTrainer) ... ok
test_reload1 (circus.tests.test_trainer.TestTrainer) ... ok
test_shows (circus.tests.test_trainer.TestTrainer) ... ok
test_stop (circus.tests.test_trainer.TestTrainer) ... ok
test_stop_shows (circus.tests.test_trainer.TestTrainer) ... ERROR
Exception in thread Thread-15:
Traceback (most recent call last):
File "/Users/marca/.pythonbrew/pythons/Python-2.7.2/lib/python2.7/threading.py", line 552, in __bootstrap_inner
self.run()
File "/Users/marca/dev/git-repos/circus/circus/tests/support.py", line 52, in run
self.trainer.start()
File "/Users/marca/dev/git-repos/circus/circus/trainer.py", line 46, in start
self.ctrl.poll()
File "/Users/marca/dev/git-repos/circus/circus/controller.py", line 132, in poll
(msg, resp, str(e)))
ValueError: Received 'stop_shows' - Could not send back 'ok' - Operation not supported
test_stop_shows1 (circus.tests.test_trainer.TestTrainer) ... ERROR
Exception in thread Thread-16:
Traceback (most recent call last):
File "/Users/marca/.pythonbrew/pythons/Python-2.7.2/lib/python2.7/threading.py", line 552, in __bootstrap_inner
self.run()
File "/Users/marca/dev/git-repos/circus/circus/tests/support.py", line 52, in run
self.trainer.start()
File "/Users/marca/dev/git-repos/circus/circus/trainer.py", line 46, in start
self.ctrl.poll()
File "/Users/marca/dev/git-repos/circus/circus/controller.py", line 132, in poll
(msg, resp, str(e)))
ValueError: Received 'stop_shows' - Could not send back 'ok' - Operation not supported
test_get_info (circus.tests.test_util.TestUtil) ... ok
======================================================================
ERROR: test_stop_shows (circus.tests.test_trainer.TestTrainer)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/Users/marca/dev/git-repos/circus/circus/tests/test_trainer.py", line 134, in test_stop_shows
resp = self.cli.call("stop_shows")
File "/Users/marca/dev/git-repos/circus/circus/client.py", line 30, in call
raise CallError(str(e))
CallError: Interrupted system call
-------------------- >> begin captured logging << --------------------
circus: INFO: Starting master on pid 20201
circus: INFO: running test fly [pid 20219]
circus: INFO: test: kill fly 20219
--------------------- >> end captured logging << ---------------------
======================================================================
ERROR: test_stop_shows1 (circus.tests.test_trainer.TestTrainer)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/Users/marca/dev/git-repos/circus/circus/tests/test_trainer.py", line 138, in test_stop_shows1
self.cli.call("stop_shows")
File "/Users/marca/dev/git-repos/circus/circus/client.py", line 30, in call
raise CallError(str(e))
CallError: Interrupted system call
-------------------- >> begin captured logging << --------------------
circus: INFO: Starting master on pid 20201
circus: INFO: running test fly [pid 20220]
circus: INFO: test: kill fly 20220
--------------------- >> end captured logging << ---------------------
----------------------------------------------------------------------
Ran 17 tests in 10.040s
FAILED (errors=2)
example:
{"status": 404, "errors": [{"description": "Unsupported application", "location": "url", "name": "application"}]}
The doc says "status": "error"
Hi, not sure where to put this feedback, so I made an issue! Close it at your leisure.
I really like where you're going with this. Process control sucks, ZMQ rocks, enhanced usability is awesome!
However, you hammer the "circus" metaphor too much, imo. As a new user, it's hard to understand what's going on with flies and shows and what not. I would suggest sticking to things like "processes", "tasks", and "jobs".
Thanks!
A flapping worker makes circus leak FDs.
To reproduce, create an ini file with a process that flaps:
[circus]
check_delay = 5
endpoint = tcp://127.0.0.1:5555
[watcher:flappy]
cmd = bash
args = -q
Next, run circusd and watch the PIPE opened by the master PID:
$ lsof -a -p 55607 |grep PIPE | wc -l
32
That number grows every time a process flaps. In other words, with a few flapping processes, we reach the limit quite fast.
We need to detect the leak (probably in the process cleanup) and fix it
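The report does not say where the leak is. One common cause with subprocess pipes is that the parent never closes its ends after the child exits, so a cleanup step along these lines may be what is missing. The reap helper below is hypothetical and only illustrates that idea.

```python
import subprocess
import sys


def reap(process):
    """Wait for a child and close the pipe ends the parent still holds.

    Hypothetical cleanup helper. Note that wait() can block if the child
    is still writing to a full pipe, so this suits children that have
    already exited (as a flapping process has).
    """
    process.wait()
    for stream in (process.stdin, process.stdout, process.stderr):
        if stream is not None and not stream.closed:
            stream.close()
    return process.returncode
```

Running the lsof command above before and after such a cleanup is a quick way to check whether the pipe count still grows.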
Users need to be informed that the gevent stream backend works using the gevent_zeromq repository from @tarekziade.
Other solutions are:
Last solution is obviously the best but will take time.
I want to be able to define the ordering of the startup (a order= option)
after n seconds, if a worker always dies, abandon
Instead of trying to launch a bad command line during the fly initialization, we should test it during the show creation:
PATH
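A check against PATH at creation time could look like the sketch below. It assumes Python 3.3 or later for shutil.which, and resolve_command is a hypothetical helper rather than an existing circus function.

```python
import shlex
import shutil


def resolve_command(cmd):
    """Return the full path of cmd's executable, or raise ValueError.

    Hypothetical helper; relies on shutil.which (Python 3.3+).
    """
    tokens = shlex.split(cmd)
    if not tokens:
        raise ValueError("empty command")
    path = shutil.which(tokens[0])
    if path is None:
        raise ValueError("command not found in PATH: %r" % tokens[0])
    return path
```

Failing here means a bad command is rejected once, up front, instead of being retried every time a process is spawned.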
The flapping options are in the watcher class, and have generic names:
we need to rename and document them
I wonder if we shouldn't put the watcher name and the pid or watcher process id in the file stream so we can get a better look at what happens. Maybe by creating a new stream class using the logging file handler?
We have this info so let's display it
Add ulimit options to run a process
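On Unix, one way to apply a ulimit to a child is to call resource.setrlimit between fork and exec through Popen's preexec_fn. The sketch below lowers only the open-files soft limit; spawn_with_nofile_limit is a hypothetical name, and how such options would be spelled in the ini file is left open.

```python
import resource
import subprocess
import sys


def spawn_with_nofile_limit(argv, soft_limit):
    """Start argv with the RLIMIT_NOFILE soft limit set to soft_limit.

    Hypothetical helper; Unix only. soft_limit must not exceed the
    current hard limit.
    """
    def apply_limits():
        # Runs in the child after fork() and before exec(), so the
        # parent's own limits are left untouched.
        _, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
        resource.setrlimit(resource.RLIMIT_NOFILE, (soft_limit, hard))

    return subprocess.Popen(argv, preexec_fn=apply_limits,
                            stdout=subprocess.PIPE)
```

Other limits (RLIMIT_CORE, RLIMIT_NPROC, and so on) can be set the same way inside apply_limits.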
we should be able to reuse that function in circusd in fact !
I'm guessing that you're trying to do something different but I don't quite understand what it is yet. :-)
$ circusctl --endpoint='tcp://127.0.0.1:6666' stop
Error: option --endpoint must not have an argument
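The expected behaviour, an --endpoint option that takes a value, can be sketched with argparse. This is not circusctl's actual parser, which is not shown here; the default endpoint is an assumption borrowed from the example configs in this tracker.

```python
import argparse


def build_parser():
    """Build a parser where --endpoint requires a value (illustrative only)."""
    parser = argparse.ArgumentParser(prog="circusctl")
    parser.add_argument(
        "--endpoint",
        default="tcp://127.0.0.1:5555",  # assumed default, from the example ini
        help="ZMQ endpoint of the circusd instance to talk to",
    )
    parser.add_argument("command", help="action to send, e.g. stop")
    return parser
```

With this definition both "--endpoint=VALUE stop" and "--endpoint VALUE stop" are accepted.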
Once merged, circus.stats provides a client class that can be used to get a stream of stats for each running process.
Using gevent.socketio will let us use web sockets when available, to stream these stats into our little charts.
XHR polling, as used now, is one fallback backend in socket.io that can be used to display our charts in browsers that don't have web sockets.
allow logging configuration in the ini file.
$ bin/circusd examples/example1.ini
2012-04-26 23:18:24 [5037] [INFO] Starting master on pid 5037
2012-04-26 23:18:24 [5037] [INFO] dummy started
2012-04-26 23:18:24 [5037] [INFO] dummy2 started
2012-04-26 23:18:24 [5037] [INFO] Arbiter now waiting for commands
2012-04-26 23:18:25 [5037] [INFO] dummy: flapping detected: retry in 7s
2012-04-26 23:18:25 [5037] [INFO] dummy: flapping detected: retry in 7s
2012-04-26 23:18:25 [5037] [INFO] dummy stopped
2012-04-26 23:18:25 [5037] [INFO] dummy stopped
ERROR:root:Uncaught exception, closing connection.
Traceback (most recent call last):
File "/home/adnane/Projects/circus/local/lib/python2.7/site-packages/pyzmq-2.1.11-py2.7-linux-i686.egg/zmq/eventloop/zmqstream.py", line 365, in _run_callback
callback(*args, **kwargs)
File "/home/adnane/Projects/circus/circus/flapping.py", line 81, in handle_recv
self.check(topic_parts[1])
File "/home/adnane/Projects/circus/circus/flapping.py", line 112, in check
conf = self.update_conf(watcher_name)
File "/home/adnane/Projects/circus/circus/flapping.py", line 96, in update_conf
conf.update(msg.get('options'))
TypeError: 'NoneType' object is not iterable
ERROR:root:Uncaught exception, closing connection.
Traceback (most recent call last):
File "/home/adnane/Projects/circus/local/lib/python2.7/site-packages/pyzmq-2.1.11-py2.7-linux-i686.egg/zmq/eventloop/zmqstream.py", line 391, in _handle_events
self._handle_recv()
File "/home/adnane/Projects/circus/local/lib/python2.7/site-packages/pyzmq-2.1.11-py2.7-linux-i686.egg/zmq/eventloop/zmqstream.py", line 424, in _handle_recv
self._run_callback(callback, msg)
File "/home/adnane/Projects/circus/local/lib/python2.7/site-packages/pyzmq-2.1.11-py2.7-linux-i686.egg/zmq/eventloop/zmqstream.py", line 365, in _run_callback
callback(*args, **kwargs)
File "/home/adnane/Projects/circus/circus/flapping.py", line 81, in handle_recv
self.check(topic_parts[1])
File "/home/adnane/Projects/circus/circus/flapping.py", line 112, in check
conf = self.update_conf(watcher_name)
File "/home/adnane/Projects/circus/circus/flapping.py", line 96, in update_conf
conf.update(msg.get('options'))
TypeError: 'NoneType' object is not iterable
ERROR:root:Exception in I/O handler for fd <zmq.core.socket.Socket object at 0x95f32fc>
Traceback (most recent call last):
File "/home/adnane/Projects/circus/local/lib/python2.7/site-packages/pyzmq-2.1.11-py2.7-linux-i686.egg/zmq/eventloop/ioloop.py", line 330, in start
self._handlers[fd](fd, events)
File "/home/adnane/Projects/circus/local/lib/python2.7/site-packages/pyzmq-2.1.11-py2.7-linux-i686.egg/zmq/eventloop/zmqstream.py", line 391, in _handle_events
self._handle_recv()
File "/home/adnane/Projects/circus/local/lib/python2.7/site-packages/pyzmq-2.1.11-py2.7-linux-i686.egg/zmq/eventloop/zmqstream.py", line 424, in _handle_recv
self._run_callback(callback, msg)
File "/home/adnane/Projects/circus/local/lib/python2.7/site-packages/pyzmq-2.1.11-py2.7-linux-i686.egg/zmq/eventloop/zmqstream.py", line 365, in _run_callback
callback(*args, **kwargs)
File "/home/adnane/Projects/circus/circus/flapping.py", line 81, in handle_recv
self.check(topic_parts[1])
File "/home/adnane/Projects/circus/circus/flapping.py", line 112, in check
conf = self.update_conf(watcher_name)
File "/home/adnane/Projects/circus/circus/flapping.py", line 96, in update_conf
conf.update(msg.get('options'))
TypeError: 'NoneType' object is not iterable
2012-04-26 23:18:32 [5037] [INFO] dummy started
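The TypeError in the traceback comes from passing None to dict.update when the reply has no 'options' entry. A guarded version of that step might look like the sketch below; the standalone function signature is an assumption, since only the failing line of flapping.py is visible here.

```python
def update_conf(conf, msg):
    """Merge msg['options'] into conf, tolerating a missing or None entry.

    Sketch only; the real update_conf in flapping.py is a method and
    may do more than this.
    """
    # Fall back to an empty dict so dict.update() never receives None.
    options = msg.get("options") or {}
    conf.update(options)
    return conf
```

Whether a reply without options should instead be treated as an error is a separate design question.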
INFO:circus:dummy started
kill worker 990
Traceback (most recent call last):
File "bin/circus", line 8, in
load_entry_point('circus==0.1', 'console_scripts', 'circus')()
File "/Users/tarek/Dev/github.com/circus/circus/__init__.py", line 20, in main
workers.run()
File "/Users/tarek/Dev/github.com/circus/circus/workers.py", line 55, in run
self.manage_workers()
File "/Users/tarek/Dev/github.com/circus/circus/workers.py", line 65, in manage_workers
self.spawn_workers()
File "/Users/tarek/Dev/github.com/circus/circus/workers.py", line 76, in spawn_workers
self.spawn_worker()
File "/Users/tarek/Dev/github.com/circus/circus/workers.py", line 80, in spawn_worker
worker = Worker(self._worker_counter, self.cmd)
File "/Users/tarek/Dev/github.com/circus/circus/workers.py", line 15, in __init__
self._worker = Popen(self.cmd.split())
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/subprocess.py", line 672, in __init__
python: can't open file 'dummy_worker.py': [Errno 2] No such file or directory
errread, errwrite)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/subprocess.py", line 1191, in _execute_child
data = _eintr_retry_call(os.read, errpipe_read, 1048576)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/subprocess.py", line 471, in _eintr_retry_call
return func(*args)
File "/Users/tarek/Dev/github.com/circus/circus/sighandler.py", line 35, in signal
self.skt.send(self.CMD_MAP[signame])
KeyError: 'winch'
add an action that allows circusctl to display in the stdout some events
one argument: topic
it should be possible to register a new show and its configuration dynamically.
To add a command I propose the following syntax:
add_show name cmd <start>
where name is the name of the command, cmd the full command (with args), and the optional start can be either true or false (false by default) and defines whether circus should start the command or not.
Other commands:
show_set optionname value
show_del optionname
show_mset option1 value1 option2 option
show_mdel options1 option2
Where optionname can be num_flies, warmup_delay, working_dir, shell, uid, gid, send_hup. show_set adds one option while show_del deletes one option. show_mset and show_mdel are used to set or delete multiple options.
Thoughts?
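As an illustration of the proposed show_mset form, the option/value pairs could be folded into a dict as sketched below. parse_mset is a hypothetical helper, and values are kept as strings because the proposal does not say how they should be typed.

```python
def parse_mset(args):
    """Turn ['opt1', 'val1', 'opt2', 'val2'] into {'opt1': 'val1', 'opt2': 'val2'}.

    Hypothetical helper for the proposed show_mset command.
    """
    if len(args) % 2:
        raise ValueError("show_mset expects option/value pairs")
    # Even positions are option names, odd positions are their values.
    return dict(zip(args[0::2], args[1::2]))
```

Rejecting an odd number of arguments catches a missing value early instead of silently dropping the last option.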
Use case: you run many circusd instances on several servers.
Proposal: have a way to control them from a single command that would let you call circusctl actions, applied to one, some, or all circusd instances.
I am not sure we want to add this in circusctl. Maybe a separate command that inherits from it.
Thoughts ?
add an option to run a proc with a specific uid/gid
It would be nice to have a standard init.d script or have some easy support for this. The script should accept the standard commands: start / stop / status using the same script.
supervisord's split into starting the daemon via "supervisord" and controlling it via "supervisorctl" is annoying here - it seems circusd / circusctl has the same annoyance.
The split into two underlying modules is fine, it just makes for a nicer console API to have one common control script.
error: command '{"command": "restart", "properties": {}}': 'Arbiter' object has no attribute 'restart'
we need to secure execution done with another uid/gid if we can