While looking at #53 and thinking about how this could be implemented the best way, I've noticed a few issues:
Completely separate processes
With #52 we gained the ability to launch the capture, schedule and ingest processes separately. I think this is an important feature (the ability to run them as independent services on the system), but it makes it hard to know which processes are actually running.
My proposal for this is to create .pid files for each service (even if we use `run_all`), which have to be checked before startup: if the file exists, check whether the process with that pid is still alive. If so, exit; otherwise start the process. Obviously we should only ever have one process of each type, otherwise we will place ourselves in a special kind of hell.
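A minimal sketch of the pid file check described above, assuming a POSIX system. The helper names `pid_alive` and `acquire_pidfile` and the file layout (one file per service containing only the pid) are hypothetical, not existing project code. Note that the read-then-write sequence is not atomic, so two processes started at the very same moment could still race.

```python
import os


def pid_alive(pid):
    """Return True if a process with this pid exists (POSIX only)."""
    try:
        os.kill(pid, 0)  # signal 0 only performs the existence/permission check
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def acquire_pidfile(path):
    """Return True if this process may start, False if a live one holds the file."""
    try:
        with open(path) as f:
            old_pid = int(f.read().strip())
    except (FileNotFoundError, ValueError):
        old_pid = None  # no file yet, or unreadable content: treat as stale

    if old_pid is not None and pid_alive(old_pid):
        return False  # another instance of this service is still running

    # Stale or missing pid file: claim it for the current process.
    with open(path, 'w') as f:
        f.write(str(os.getpid()))
    return True
```

Each service would call something like `acquire_pidfile('capture.pid')` at startup and exit if it returns `False`; the file should also be removed on clean shutdown.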
Agent state management
This is a tricky one, as it is tied to the constraints of capture agent states in Opencast. We are only ever able to define one state.
But what if we start recording while an ingest process is still working, and the ingest process then finishes and sets the state to `idle` although we have not finished recording yet? This is a possible scenario with tightly scheduled events and a slow uplink.
My proposal for this is to implement an internal state table with one entry per process (`working` or `idle`), which will then be used to set the agent state according to a priority list:
offline
capturing
uploading
shutting_down (not used anywhere yet)
idle
Say every process but the capture process is `idle`; then we would set the agent state to `capturing`. Now our ingest process starts `working`. We do not change the agent state to `uploading`, because `capturing` supersedes it. As soon as the capture process is `idle` again, the ingest process is the first `working` process in the priority list, so the agent state is now `uploading`, etc.
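The priority rule could be sketched roughly like this. The names `PRIORITY`, `WORKING_STATE` and `resolve_agent_state` are made up for illustration, and the mapping from process name to agent state is an assumption based on the example above, not existing code.

```python
# Agent states in priority order, highest first (order from the list above).
PRIORITY = ['offline', 'capturing', 'uploading', 'shutting_down', 'idle']

# Assumed mapping: the agent state a process claims while it is working.
WORKING_STATE = {'capture': 'capturing', 'ingest': 'uploading'}


def resolve_agent_state(process_states):
    """Pick the highest-priority agent state from the internal state table.

    process_states maps a process name to 'working' or 'idle'.
    """
    claimed = {'idle'}  # fallback if no process is working
    for name, state in process_states.items():
        if state == 'working':
            claimed.add(WORKING_STATE[name])
    # The claimed state that appears earliest in PRIORITY wins.
    return min(claimed, key=PRIORITY.index)


# Walking through the scenario from the text:
print(resolve_agent_state({'capture': 'working', 'ingest': 'idle'}))
print(resolve_agent_state({'capture': 'working', 'ingest': 'working'}))
print(resolve_agent_state({'capture': 'idle', 'ingest': 'working'}))
```

The point of computing the state from the table every time is that no process ever sets the agent state directly, so a finishing ingest cannot overwrite `capturing` with `idle`.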
With respect to my comment in #53, the behaviour for the scheduler process needs to be special: if the scheduler process exists, its internal state is `idle`. If there is no scheduler process, the internal state is `working` (okay, that is a bad name. Suggestions?) and we are absolutely `offline`. If any other process took precedence over this, it would give the illusion that the agent is ready to fetch new scheduled events.
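As a rough illustration of the scheduler special case, assuming the state derived from the other processes is already known: a missing scheduler overrides everything else. Both function names are hypothetical.

```python
def scheduler_internal_state(scheduler_running):
    """Internal state of the scheduler: 'idle' if it exists, else 'working'."""
    return 'idle' if scheduler_running else 'working'


def apply_scheduler_override(scheduler_running, other_state):
    """Return 'offline' if the scheduler is missing, else the other state.

    other_state is the agent state derived from the remaining processes,
    for example 'capturing', 'uploading' or 'idle'.
    """
    if scheduler_internal_state(scheduler_running) == 'working':
        return 'offline'  # nothing may take precedence over a missing scheduler
    return other_state
```

With this, `apply_scheduler_override(False, 'capturing')` yields `offline` even during a recording, which matches the requirement that the agent must not look ready to fetch new events.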