opencast / pyca

Python Capture Agent for Opencast

License: GNU Lesser General Public License v3.0

Python 82.60% Shell 0.05% CSS 1.72% HTML 3.70% Makefile 0.50% JavaScript 5.93% Vue 4.16% Dockerfile 1.34%

python video capture-video opencast hacktoberfest

pyca's Introduction

PyCA – Opencast Capture Agent

PyCA is a fully functional Opencast capture agent written in Python. It is free software licensed under the terms of the GNU Lesser General Public License.

The goals of pyCA are to be…

flexible for any kind of capture device
simplistic in code and functionality
nonrestrictive in terms of choosing capture software

PyCA can run on almost any kind of device: a regular PC equipped with capture cards, a server capturing network streams, or small boards and embedded devices like the Raspberry Pi.

Python Versions

PyCA requires Python ≥ 3.6. Older versions of Python will not work.

Documentation

For a detailed installation guide, take a look at the PyCA documentation.

Quick Install for Experienced Users

PyCA is configured to use FFmpeg by default. Make sure to have it installed or adjust the configuration to use something else.

git clone https://github.com/opencast/pyCA.git
cd pyCA
python3 -m venv venv
. ./venv/bin/activate
pip install -r requirements.txt
npm ci
vim etc/pyca.conf  # edit the configuration
./start.sh

pyca's Issues

Check if we need the capture agent state error

A new capture agent state “error” is introduced with Opencast 2.4:

https://bitbucket.org/opencast-community/matterhorn/pull-requests/1297

tag new release

I'm maintaining an Arch User Repository package for pyCA, so that the installation on Arch Linux hosts (which we use at WWU Münster) is standardized.

Right now the latest official release seems to be severely outdated. A newer official release would be my preferred option, so that I don't have to provide a pyca-git package, which would generally not be suitable for production use.

Please tag a new official release.

Unicode error in start_capture

2016-11-20 23:25:02 ERROR    [ca.py:395:safe_start_capture()] Start capture failed
2016-11-20 23:25:02 ERROR    [ca.py:396:safe_start_capture()] Traceback (most recent call last):
  File "pyca/ca.py", line 393, in safe_start_capture
    return start_capture(event)
  File "pyca/ca.py", line 264, in start_capture
    f.write(value)
UnicodeEncodeError: 'ascii' codec can't encode characters in position 506-510: ordinal not in range(128)
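
The traceback suggests that the file is opened with an ASCII default encoding. A minimal sketch of one way around this, assuming the value being written is a text string; `write_text_file` is a hypothetical helper for illustration, not pyCA's actual code:

```python
import io
import os
import tempfile


def write_text_file(path, value):
    """Write text as UTF-8 regardless of the locale's default encoding."""
    # io.open accepts an encoding argument on both Python 2 and Python 3.
    with io.open(path, 'w', encoding='utf-8') as f:
        f.write(value)


# Usage: round-trip a title containing umlauts through a temporary file.
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, 'episode.xml')
write_text_file(path, u'<title>Ümlaut Täst</title>')
with io.open(path, 'r', encoding='utf-8') as f:
    content = f.read()
```

Passing the encoding explicitly avoids depending on the environment the agent happens to be started in.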

System Logger not usable

Currently pyCA is only able to log to STDERR. Although this is not that important if running with a systemd service file or probably even preferred in a docker container, there are some situations where you would rather use /dev/log for logging:

Distributions without systemd (a lot of embedded Linux, some Gentoo (I think it is the default), Alpine Linux, ...)
Syslog forwarding to a remote host to gather all information in one spot or maybe for security reasons

Also you are able to use the great power of log levels and facilities. IMHO the best way would be to make this configurable.
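
A configurable log target could look roughly like the sketch below. It uses only the standard library; the `target` and `address` parameters are hypothetical configuration values, not existing pyCA options:

```python
import logging
import logging.handlers
import sys


def build_handler(target='stderr', address='/dev/log'):
    """Return a log handler for the configured target.

    `target` and `address` are hypothetical configuration values.
    """
    if target == 'syslog':
        try:
            return logging.handlers.SysLogHandler(address=address)
        except OSError:
            # No syslog socket available (e.g. in a minimal container):
            # fall back to STDERR instead of failing at start-up.
            pass
    return logging.StreamHandler(sys.stderr)


logger = logging.getLogger('pyca-example')
logger.setLevel(logging.INFO)
logger.addHandler(build_handler('stderr'))
logger.info('logging configured')
```

Falling back to STDERR is a design choice of this sketch; failing loudly when the socket is missing would be just as defensible.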

Umlauts in title and description cause pycurl.error: (0, '') - Opencast 2.0

To reproduce, simply create a recording with the title "Ümlaut Täst"

and start pyCA with python3.

The error:
2016-01-31 12:41:06 ERROR [ca.py:257:start_capture()] Something went wrong during the upload
2016-01-31 12:41:06 ERROR [ca.py:258:start_capture()] Traceback (most recent call last):
File "/home/benjamin/Coding/Repos/pyCA/pyca/ca.py", line 255, in start_capture
workflow_config)
File "/home/benjamin/Coding/Repos/pyCA/pyca/ca.py", line 336, in ingest
mediapackage = http_request(service + '/addDCCatalog', fields)
File "/home/benjamin/Coding/Repos/pyCA/pyca/ca.py", line 293, in http_request
curl.setopt(curl.HTTPPOST, post_data)
pycurl.error: (0, '')

A possible solution:
I still need to test whether this breaks anything, and if so, what.

    with open('%s/episode.xml' % recording_dir, 'r') as episodefile:
        dublincore = episodefile.read().encode('utf8', 'ignore')

and

    with open('%s/series.xml' % recording_dir, 'r') as seriesfile:
        dublincore = seriesfile.read().encode('utf8', 'ignore')

Configuration Logging Not Working

Logging in the configuration check() method does not work any longer since the logger is not yet initialized at that point.

Do we want a JSON API?

Looking at the new UI, we could easily leverage the internal state agent to provide detailed information about the capture agent. A neat way to do that would be to expose the states via a simple API that can also be used in monitoring. A simple /status.json would be a good start.

Thinking further, one might want to be able to manually start and stop recording remotely, because scheduling via Opencast is not comfortable if you don't know how long you actually want to record for yet.

My gut instinct would be to do it right the first time and build a small and simple JSON API (https://jsonapi.org). This would again increase the scope of pyCA, but also its functionality.

So, do we want this?
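
To make the idea concrete, here is a minimal sketch of a `/status.json` endpoint written as a plain WSGI callable. The service names and the response layout are assumptions for illustration; it returns plain JSON and does not follow the JSON:API specification mentioned above:

```python
import json


def status_app(environ, start_response, services=None):
    """Minimal WSGI app answering /status.json with service states."""
    # `services` stands in for pyCA's internal state; the keys are made up.
    services = services or {'capture': 'idle', 'ingest': 'idle'}
    if environ.get('PATH_INFO') != '/status.json':
        start_response('404 Not Found', [('Content-Type', 'text/plain')])
        return [b'not found']
    body = json.dumps({'services': services}).encode('utf-8')
    start_response('200 OK', [('Content-Type', 'application/json'),
                              ('Content-Length', str(len(body)))])
    return [body]


# Usage: call the app directly, as a WSGI server would.
captured = {}


def fake_start_response(status, headers):
    captured['status'] = status


result = status_app({'PATH_INFO': '/status.json'}, fake_start_response)
payload = json.loads(result[0].decode('utf-8'))
```

A monitoring system would only need to poll this one URL and inspect the returned states.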

Unable to locate package libcurl-gnutls-dev

Hi @lkiesow

I have tried to install pyCA on a Raspberry Pi, but whenever I run the command below, the package cannot be located. Could you please update your documentation to state which repository provides the extra packages for the pyCA build?

sudo apt-get install python-virtualenv python-dev libcurl-gnutls-dev libcurl-gnutls-dev virtualenv venv
Reading package lists... Done
Building dependency tree        
Reading state information... Done
E: Unable to locate package libcurl-gnutls-dev

I have tried adding the below source list too

deb http://http.debian.net/debian wheezy main contrib non-free
# deb-src http://ftp.de.debian.org/debian wheezy main contrib non-free

But I still can't get the package. Could you please tell me which repository you used to get libcurl-gnutls-dev, virtualenv and venv? Apart from these packages, everything installed without any issue.

Much appreciated, thanks.

keep-alive for internal services

The current agent state implementation (#64) cannot clearly detect stuck or crashed services. We should implement a keep-alive function to fix this. I'm working on this right now. My approach is to add a timestamp column to the service state table that will be updated on set_service_status() call, including the main loops for the services. This timestamp can then be used to check if a service is still alive (i.e. not older than xx seconds).
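
The timestamp approach described above can be sketched as follows. An in-memory dictionary stands in for the service state table, and `is_alive` with its `max_age` parameter is a hypothetical name; only `set_service_status()` is taken from the description:

```python
import time

# In-memory stand-in for the service state table; pyCA would use its database.
_service_states = {}


def set_service_status(service, status, now=None):
    """Store the status together with the time it was last reported."""
    now = time.time() if now is None else now
    _service_states[service] = {'status': status, 'timestamp': now}


def is_alive(service, max_age, now=None):
    """A service counts as alive if it reported within `max_age` seconds."""
    now = time.time() if now is None else now
    state = _service_states.get(service)
    return state is not None and now - state['timestamp'] <= max_age


# Usage with explicit timestamps so the result does not depend on the clock.
set_service_status('capture', 'idle', now=1000.0)
fresh = is_alive('capture', max_age=60, now=1030.0)   # reported 30 s ago
stale = is_alive('capture', max_age=60, now=1100.0)   # reported 100 s ago
```

Each service's main loop would call `set_service_status()` on every iteration, so a stuck loop shows up as a stale timestamp.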

Web Interface Mobile View

There are still some minor issues left with the mobile view of the pyCA interface. They need to be fixed.

Report available disk space in configuration

The Matterhorn Reference CA and Galicaster report the available disk space in the location where recordings are stored using the configuration property:

capture.cleaner.mindiskspace

(free disk space in bytes).

This is helpful because it allows disk space to be monitored centrally (one of several mechanisms that could be used for doing so).

pyCA should ideally do the same.
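
Determining the value is straightforward with the standard library. A minimal sketch, assuming the free space is reported as a string under the property name given above; `report_free_space` is a hypothetical helper:

```python
import shutil


def report_free_space(directory):
    """Return a config fragment with the free bytes below `directory`."""
    free_bytes = shutil.disk_usage(directory).free
    return {'capture.cleaner.mindiskspace': str(free_bytes)}


# Usage: the current directory stands in for the recordings directory.
properties = report_free_space('.')
```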

Heartbeat?

We've just had an instance of pyCA simply stop, and we didn't notice until it was scheduled to record. Our process monitoring did not alert us, nor did the process fail or quit, which would have led to systemd restarting the unit.

I'm wondering if it would be a good idea to implement some kind of heartbeat to continuously monitor the health of the instance. Of course this would need a proper implementation with the upcoming switch to multiple threads for different jobs, requiring some internal health checks for each of those threads. We could then offer the option of exposing an HTTP API endpoint (.../status.json), touching a local file, or periodically accessing a predefined URL (active monitoring).

On the other hand, we could just say: that's way too much overhead, monitoring should be done outside of the application, e.g. via watching the logfiles, etc.

I'm really not sure what the best way would be here. Any thoughts?

Running pyCA with UI using "&" will cause strange behaviour

When running pyCA with UI using command

./start.sh&; ./start.sh ui

it will not create any recordings.

The recording commands will be triggered but then the pyCA process will be suspended

[3] + 28097 suspended (tty output) ./start.sh

... and recordings will not be created and all further processes for that event will be canceled.

Also, the UI will list the Recording processes as "recording" indefinitely. But strangely, the top semaphore will show all Processed and Capture as idle.

Ignore Timezone Option Causes Error

Setting ignore timezone to true causes time subtraction to fail.

Use ETag and If-None-Match

To reduce load for the Scheduler…
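
In HTTP, an ETag is normally paired with the If-None-Match request header, and the server answers 304 Not Modified when nothing has changed. The sketch below only builds such a request and decides what to keep; the URL, the ETag value and both function names are placeholders, and whether the Opencast scheduler endpoint supports this is an assumption:

```python
import urllib.request


def build_schedule_request(url, etag=None):
    """Build a GET request that asks the server to skip unchanged data."""
    request = urllib.request.Request(url)
    if etag:
        # The server may answer 304 Not Modified if its ETag still matches.
        request.add_header('If-None-Match', etag)
    return request


def pick_schedule(status, body, cached):
    """Keep the cached schedule on 304, otherwise use the new body."""
    return cached if status == 304 else body


# Usage with a placeholder URL and ETag; no request is actually sent.
req = build_schedule_request('https://example.org/schedule', '"abc123"')
```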

Clarify targeted Python version

As you can see in #39, the recent changes are not entirely compatible with Python 3. I will spend a bit of time fixing this, but for the future: Are we targeting only Python 2, as this seems to still be the standard on most distros, or do we support Python 3 as well? Either way, we should clarify it as I couldn't see it while glancing over the README. Thoughts?

Re-register CA after connection errors to host

If the Matterhorn core has been changed while there was no connection from the pyCA to the core, the following requests will throw errors. This can be circumvented by re-setting the capture agent state after a connection loss.

register_ca needs to return success in backup mode

…the ca will not start up otherwise

capture network stream

Hi,
The readme states that pyCA can capture network streams.
How would one do that?

It would be great if config.py had an example for this, as there are already several others there.

Regards
Mikael Kermorgant

Catch all `ValueError` in getopt

Currently, every ValueError raised around the getopt call is caught. See https://github.com/opencast/pyCA/blob/master/pyca/__main__.py#L84
Is there a reason for catching ValueError here at all?

Also I am wondering why this try block does not end at https://github.com/opencast/pyCA/blob/master/pyca/__main__.py#L85
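
For comparison, a narrower version might look like the sketch below, which catches only `getopt.GetoptError` and keeps the `try` block limited to the parsing call. The option names are hypothetical and do not reflect pyCA's actual command line:

```python
import getopt


def parse_args(argv):
    """Parse options, handling only the error getopt is documented to raise."""
    try:
        opts, args = getopt.getopt(argv, 'hc:', ['help', 'config='])
    except getopt.GetoptError as error:
        # Unknown option or missing argument: report it, do not hide it.
        return None, str(error)
    return dict(opts), None


ok, ok_error = parse_args(['-c', 'etc/pyca.conf'])
bad, bad_error = parse_args(['--unknown'])
```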

Record after connection to Servers has been lost

We've observed that pyCA will not start a recording when the connection to the server has been lost, though it has already received the information about a scheduled recording.

It would be nice if pyCA would still perform the scheduled recording, even if the server is temporarily unreachable.

Retry ingesting a recording

If ingesting a recording has failed (e.g. network problems), the user has to ingest the recording by hand outside of pyCA.

It would be nice if pyCA provided a switch (e.g. pyCA --ingest-now recording-xyz) that can be used to retrigger ingesting a recording.

config is not validated at all

I created a broken config file

[agent]
name             = 'pyca'
update_frequency = 'string'  # this should be an integer

and started pyCA. It still started normally without complaining about the broken config file and died later while trying to use a string as an int. Output:

2017-03-15 14:55:36 INFO     [utils.py:90:get_service()] Endpoint for org.opencastproject.scheduler: https://octestallinone.virtuos.uos.de/recordings
2017-03-15 14:55:36 INFO     [utils.py:90:get_service()] Endpoint for org.opencastproject.capture.admin: https://octestallinone.virtuos.uos.de/capture-admin
2017-03-15 14:55:36 INFO     [utils.py:90:get_service()] Endpoint for org.opencastproject.capture.admin: https://octestallinone.virtuos.uos.de/capture-admin
2017-03-15 14:55:36 INFO     [utils.py:90:get_service()] Endpoint for org.opencastproject.ingest: https://octestallinone.virtuos.uos.de/ingest
2017-03-15 14:55:36 INFO     [schedule.py:106:control_loop()] No scheduled recording
Process Process-1:
Traceback (most recent call last):
  File "/usr/lib64/python3.5/multiprocessing/process.py", line 249, in _bootstrap
    self.run()
  File "/usr/lib64/python3.5/multiprocessing/process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "/tmp/env/lib/python3.5/site-packages/pyca-1.0.0-py3.5.egg/pyca/schedule.py", line 120, in run
    control_loop()
  File "/tmp/env/lib/python3.5/site-packages/pyca-1.0.0-py3.5.egg/pyca/schedule.py", line 108, in control_loop
    next_update = timestamp() + config()['agent']['update_frequency']
TypeError: unsupported operand type(s) for +: 'int' and 'str'
2017-03-15 14:55:36 INFO     [utils.py:90:get_service()] Endpoint for org.opencastproject.capture.admin: https://octestallinone.virtuos.uos.de/capture-admin
 Process Process-4:
Traceback (most recent call last):
   File "/usr/lib64/python3.5/multiprocessing/process.py", line 249, in _bootstrap
    self.run()
  File "/usr/lib64/python3.5/multiprocessing/process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "/tmp/env/lib/python3.5/site-packages/pyca-1.0.0-py3.5.egg/pyca/agentstate.py", line 36, in run
    control_loop()
  File "/tmp/env/lib/python3.5/site-packages/pyca-1.0.0-py3.5.egg/pyca/agentstate.py", line 25, in control_loop
    next_update = timestamp() + config()['agent']['update_frequency']
TypeError: unsupported operand type(s) for +: 'int' and 'str'

I also printed the content of config.config()["agent"]["update_frequency"] and got a string.

So it seems the validator does not work as expected.
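
What failing early could look like is sketched below with the standard library's `configparser`. This is a stand-in chosen for illustration; pyCA's own configuration loader and validator may work differently, and `load_update_frequency` is a hypothetical helper:

```python
import configparser


def load_update_frequency(text):
    """Read agent.update_frequency and fail early if it is not an integer."""
    parser = configparser.ConfigParser(inline_comment_prefixes=('#',))
    parser.read_string(text)
    # getint raises ValueError for non-numeric values such as 'string'.
    return parser.getint('agent', 'update_frequency')


good = load_update_frequency("[agent]\nupdate_frequency = 60\n")

try:
    load_update_frequency("[agent]\nupdate_frequency = 'string'\n")
    rejected = False
except ValueError:
    rejected = True
```

Running such a check at start-up would turn the late TypeError in the control loops into an immediate configuration error.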

Backup mode configuration should be logged at start-up

Ask Admin Server about Ingest Nodes

In large deployments, Matterhorn may have dedicated ingest nodes. PyCA should ask the admin about this and randomly (?) select one of these nodes.

Active recording also appears as upcoming

Since an active recording is still present in the Opencast scheduling endpoint (it should be there, since the agent may be late in starting the recording), it still appears as upcoming, and the event is duplicated in the UI and the logs. This does not have any negative effect on the recording, but may be confusing for users.

Logs:

Press [q] to stop, [?] for help
2017-03-07 22:55:41 INFO     [utils.py:252:update_agent_state()] Reporting agentstate as capturing
2017-03-07 22:55:41 INFO     [schedule.py:107:control_loop()] Next scheduled recording: 2017-03-07 22:55:00
2017-03-07 22:56:41 INFO     [utils.py:252:update_agent_state()] Reporting agentstate as capturing
2017-03-07 22:56:41 INFO     [schedule.py:107:control_loop()] Next scheduled recording: 2017-03-07 22:55:00
2017-03-07 22:57:41 INFO     [utils.py:252:update_agent_state()] Reporting agentstate as capturing
2017-03-07 22:57:41 INFO     [schedule.py:107:control_loop()] Next scheduled recording: 2017-03-07 22:55:00

UI:

terminate capture processes after timeout

Optionally send sigterm at end of recording time + configurable time
Send sigkill at end of recording time + configurable time
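
The two steps can be combined into a single helper. A minimal sketch using `subprocess`; `stop_process` and `grace_period` are hypothetical names, and a sleeping Python process stands in for the capture command:

```python
import subprocess
import sys


def stop_process(process, grace_period):
    """Send SIGTERM, then SIGKILL if the process ignores it for too long."""
    process.terminate()
    try:
        process.wait(timeout=grace_period)
    except subprocess.TimeoutExpired:
        process.kill()
        process.wait()
    return process.returncode


# Usage: a sleeping Python process stands in for the capture command.
capture = subprocess.Popen([sys.executable, '-c', 'import time; time.sleep(60)'])
exit_code = stop_process(capture, grace_period=5)
```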

WSGI application will not load local configuration

…since __main__.py is never executed

Send status updates at fixed interval

pyca doesn't send regular status updates to the Opencast server, which means the last-updated time on the CA can get very old.

This can have unwanted side-effects, like monitoring scripts that check CA status could think that a pyCA is offline when in fact it's online.

pyCA should send regular status updates to the Opencast server, consistent with other CAs (e.g. reference CA, galicaster).

Update events after the recording started

At the moment upcoming and recorded events are completely separated due to possible problems when updating an event which already started. We should have a closer look at this to be able to support last minute schedule changes and prolonging of recordings which should be possible when using process controls introduced by #102.

Implement definable Inputs, pass on to recording command

Opencast gives the user the ability to select different inputs when scheduling a recording.

pyCA should be able to define such inputs with a set of keywords, report them to Opencast, and pass the selection made for a scheduled recording on to the recording command.

I would be willing to work on such a feature.

Check if Attachments are handled properly

Do we need to ingest attachment files (e.g. security/xml)?

UI Documentation

How to start and set up the user interface is poorly documented,
especially in combination with Gunicorn (plus a proxy for HTTPS).
We need to write something about that.

ingest failure - list() function in config file taken as string

I'm trying to get pyca working so this may not be a pyca issue but an error from my side.

Here's an extract from the config file :
[capture]
directory = './recordings'
command = 'timeout {{time}} gst-launch v4l2src device=/dev/video0 ! video/x-raw-rgb ! ffmpegcolorspace ! xvidenc ! avimux ! filesink location={{dir}}/{{name}}.avi | :'
flavors = list('presenter/source')
files = list('{{dir}}/{{name}}.avi')

And here's the log output from pyca when the recording occurs :

2015-06-01 10:45:01 INFO [ca.py:318:ingest()] Adding track (l)
2015-06-01 10:45:01 ERROR [ca.py:233:start_capture()] Something went wrong during the upload
2015-06-01 10:45:01 ERROR [ca.py:234:start_capture()] Traceback (most recent call last):
File "pyca/ca.py", line 231, in start_capture
workflow_config)
File "pyca/ca.py", line 322, in ingest
mediapackage = http_request('/ingest/addTrack', fields)
File "pyca/ca.py", line 274, in http_request
curl.perform()
error: (26, 'couldn't open file "l"\n')

Any idea ?
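
One plausible reading of the log, offered as an assumption: the option value arrives as a plain string, and iterating over a string yields single characters, which would explain the track named "l". The sketch below shows that effect and a defensive normalisation; `as_list` is a hypothetical helper:

```python
def as_list(value):
    """Wrap a single string in a list; leave real lists untouched."""
    if isinstance(value, str):
        return [value]
    return list(value)


# If the option arrives as a plain string, iterating yields characters.
raw = "list('{{dir}}/{{name}}.avi')"
first_item = next(iter(raw))

files = as_list('{{dir}}/{{name}}.avi')
```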

global configuration before local configuration

This is probably intended behavior, but I think it is not canonical to ignore the local ./etc/config if a global /etc/config exists.

The canonical way would be that global and local both override the defaults, and local overrides global.
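
The proposed precedence can be sketched as a simple merge of nested dictionaries. The function name and the sample values are made up for illustration:

```python
def merge_config(defaults, global_conf, local_conf):
    """Merge section dicts; later sources override earlier ones."""
    merged = {}
    for source in (defaults, global_conf, local_conf):
        for section, options in source.items():
            merged.setdefault(section, {}).update(options)
    return merged


# Usage with made-up values.
config = merge_config(
    {'agent': {'name': 'pyca', 'update_frequency': 60}},
    {'agent': {'name': 'global-ca'}},
    {'agent': {'update_frequency': 30}},
)
```

Here the global file changes the name, the local file changes the update frequency, and both survive in the result.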

Make Module Executable

Add main.py to run pyCA, tests and user interface.

Do not catch all exceptions silently

I know this issue is a bit general, but I found multiple places in the code where all exceptions are simply caught. This is not a very Pythonic way to write code, and it makes debugging very difficult.

I found this somewhat entertaining blog entry about it: https://realpython.com/blog/python/the-most-diabolical-python-antipattern/

Something like that should never happen:

try:
    BANANA
except:
    pass
print("Everything is fine")
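
By contrast, a handler that names the exceptions it expects and logs them might look like this sketch. The event structure and `read_duration` are invented for illustration:

```python
import logging

logger = logging.getLogger('pyca-example')


def read_duration(event):
    """Return the event duration, logging only the errors we expect."""
    try:
        return int(event['duration'])
    except (KeyError, ValueError):
        # Expected failure modes only; anything else still propagates.
        logger.exception('Event has no usable duration: %r', event)
        return None


valid = read_duration({'duration': '90'})
invalid = read_duration({'duration': 'soon'})
```

An unexpected error, such as passing `None` instead of a dictionary, still raises and therefore stays visible.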

Release pyCA to PyPi

Make pyCA installable from the Python Package Index.

Spaces instead of tabs?

Pylint warns that the code is indented with tabs instead of spaces. It has a point, because the displayed width of a tab may differ from editor to editor and from system to system.

ffmpeg capture command error

Hi @lkiesow ,

I have installed the capture agent following your documentation and compiled FFmpeg following the URL below.

http://owenashurst.com/?p=242

Everything works well (the capture agent registers with the admin server). When I schedule a recording for the capture agent (PyCA-test), the agent picks up the schedule from the admin server, the scheduled time shows up in the capture agent terminal, and capturing starts automatically at that time. Unfortunately, we then hit an exception (see below) and the capture agent stops the recording.

Exception:

PyCA-test set to idle
No scheduled recording
No scheduled recording
Next scheduled recording: 2014-12-11 10:38:00
Next scheduled recording: 2014-12-11 10:38:00
  configuration: --arch=armel --target-os=linux --enable-gpl --enable-libx264 --enable-nonfree --enable-libaacplus --enable-librtmp --enable-libmp3lame
  libavutil      54. 15.100 / 54. 15.100
  libavcodec     56. 14.100 / 56. 14.100
  libavformat    56. 15.103 / 56. 15.103
  libavdevice    56.  3.100 / 56.  3.100
  libavfilter     5.  2.103 /  5.  2.103
  libswscale      3.  1.101 /  3.  1.101
  libswresample   1.  1.100 /  1.  1.100
  libpostproc    53.  3.100 / 53.  3.100
Input #0, lavfi, from 'testsrc':
  Duration: N/A, start: 0.000000, bitrate: N/A
    Stream #0:0: Video: rawvideo (RGB[24] / 0x18424752), rgb24, 320x240 [SAR 1:1 DAR 4:3], 25 tbr, 25 tbn, 25 tbc
No pixel format specified, yuv444p for H.264 encoding chosen.
Use -pix_fmt yuv420p for compatibility with outdated media players.
[swscaler @ 0x2894050] deprecated pixel format used, make sure you did set range correctly
Illegal instruction
Recording failed
16505 set to capture_error
PyCA-test set to idle

Is there anything I need to change? I have to capture the video in H.264 format.

Start/Stop(/Pause?) UI

Based on signal handling for subprocess.

Set Capture Agent to Offline

The capture agent state offline is introduced with Opencast 2.4:

https://bitbucket.org/opencast-community/matterhorn/pull-requests/1127/mh-11708-cas-are-never-shown-as-offline/diff

Bug! Uploading recordings with UTF-8 Chars in the title fails.

This is quite serious.
We've just noticed that the upload/ingest for recordings that have UTF-8 chars in the title (umlauts, etc.) fails reproducibly. This was tested against Opencast 2.2.3.

Apr 18 16:48:12 ele-ca-02 pyca[10980]: [pyca.utils:181:recording_state()] b'8385795844570640384 set to uploading'
Apr 18 16:48:12 ele-ca-02 pyca[10980]: [pyca.ingest:95:ingest()] Selecting ingest service to use: http://electuresda1.uni-muenster.de:8080/ingest
Apr 18 16:48:12 ele-ca-02 pyca[10980]: [pyca.ingest:98:ingest()] Creating new mediapackage
Apr 18 16:48:12 ele-ca-02 pyca[10980]: [pyca.ingest:103:ingest()] Adding episode DC catalog
Apr 18 16:48:12 ele-ca-02 pyca[10980]: [pyca.ingest:69:start_ingest()] Something went wrong during the upload
Apr 18 16:48:12 ele-ca-02 pyca[10980]: [pyca.ingest:70:start_ingest()] Traceback (most recent call last):
Apr 18 16:48:12 ele-ca-02 pyca[10980]:   File "/usr/lib/python3.6/site-packages/pyca/ingest.py", line 67, in start_ingest
Apr 18 16:48:12 ele-ca-02 pyca[10980]:     workflow_config)
Apr 18 16:48:12 ele-ca-02 pyca[10980]:   File "/usr/lib/python3.6/site-packages/pyca/ingest.py", line 110, in ingest
Apr 18 16:48:12 ele-ca-02 pyca[10980]:     mediapackage = http_request(service + '/addDCCatalog', fields)
Apr 18 16:48:12 ele-ca-02 pyca[10980]:   File "/usr/lib/python3.6/site-packages/pyca/utils.py", line 52, in http_request
Apr 18 16:48:12 ele-ca-02 pyca[10980]:     curl.setopt(curl.HTTPPOST, post_data)
Apr 18 16:48:12 ele-ca-02 pyca[10980]: pycurl.error: (0, '')
Apr 18 16:48:12 ele-ca-02 pyca[10980]: [pyca.utils:181:recording_state()] b'8385795844570640384 set to upload_error'

Recording cannot be ingested at all, tried forcing the status back to FINISHED_RECORDING to trigger a new ingest.

Logs on the Opencast server do not show anything suspicious:

...
Apr 18 18:48:12 xxx docker[10270]: 2017-04-18 18:48:12,937 | INFO  | (IngestServiceImpl:648) - Created mediapackage 4fd6cc75-8103-4cff-a21c-b5d7105fdf15
...

Ingesting recordings without any special chars works just fine.

json.loads Error - related to Python 3?

After finishing the recording command, the safe_start_capture() method fails with the following error:

2016-11-17 22:20:01 ERROR    [ca.py:395:safe_start_capture()] Start capture failed
2016-11-17 22:20:01 ERROR    [ca.py:396:safe_start_capture()] Traceback (most recent call last):
  File "/home/jan/git/electures/pyca/pyca/ca.py", line 393, in safe_start_capture
    return start_capture(event)
  File "/home/jan/git/electures/pyca/pyca/ca.py", line 256, in start_capture
    attachments = event.get_data().get('attach')
  File "/home/jan/git/electures/pyca/pyca/db.py", line 69, in get_data
    return json.loads(self.data)
  File "/usr/lib/python3.5/json/__init__.py", line 312, in loads
    s.__class__.__name__))
TypeError: the JSON object must be str, not 'bytes'

This was run on Arch Linux using Python 3.5.2. When run on the same system but with Python 2.7.12, it works just fine.
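
Before Python 3.6, `json.loads` on Python 3 accepted only `str`, which matches the TypeError above. A minimal sketch of a workaround, assuming the stored data is UTF-8 encoded; `load_json` is a hypothetical helper:

```python
import json


def load_json(data):
    """Decode bytes to text before parsing, as Python 3.5 requires."""
    if isinstance(data, bytes):
        data = data.decode('utf-8')
    return json.loads(data)


from_bytes = load_json(b'{"attach": []}')
from_text = load_json('{"attach": []}')
```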

Separate Configuration File

Instead of the configuration module, use a dedicated configuration file to overwrite defaults.

urllib2 fallback

Since pyCURL is kind of hard to install, it would be nice to have a fallback to urllib2 which should be sufficient at least for the backup mode. There should be a warning in the logs though.

Process and agent state management

While looking at #53 and thinking about how this could be implemented the best way, I've noticed a few issues:

Completely separate processes

With #52 we gained the ability to launch the capture, schedule and ingest processes separately. I think that this is an important feature (ability to run as independent services on the system), but makes it hard to know which processes are actually running.

My proposal is to create .pid files for each service (even if we use run_all) and check them before startup: if a file exists, check whether the process with that pid is still alive. If so, exit; otherwise start the process. Obviously we should only ever have one process of each type, otherwise we will place ourselves in a special kind of hell.
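
A rough sketch of the start-up check, assuming a POSIX system. The function names and the pid file location are hypothetical, and writing or removing the pid file is left out:

```python
import os
import tempfile


def pid_is_alive(pid):
    """Check whether a process with this pid exists (POSIX only)."""
    try:
        os.kill(pid, 0)  # signal 0 performs error checking only
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def may_start(pid_file):
    """Return True if no live process is recorded in `pid_file`."""
    try:
        with open(pid_file) as f:
            pid = int(f.read().strip())
    except (FileNotFoundError, ValueError):
        return True  # no pid file, or it does not contain a number
    return not pid_is_alive(pid)


# Usage: the check passes first, then fails once our own pid is recorded.
pid_file = os.path.join(tempfile.mkdtemp(), 'capture.pid')
before = may_start(pid_file)
with open(pid_file, 'w') as f:
    f.write(str(os.getpid()))
after = may_start(pid_file)
```

Note that a recycled pid can make a stale file look alive, so a real implementation may want an additional check.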

Agent state management

This is a tricky one, as it is tied to the constraints of capture agent states in Opencast. We are only ever able to define one state.

But what if we start recording while an ingest process is still working, and the ingest process finishes and sets the state to idle although we have not finished recording yet? This is a possible scenario with tightly clocked events and a slow uplink.

My proposal for this is to implement an internal state table for each process (working or idle), which will then be used to set the state according to a priority list:

offline
capturing
uploading
shutting_down (not used anywhere yet)
idle

Say every process but the capture process is idle; then we would set the agent state to capturing. Now our ingest process starts working. We do not change the agent state to uploading, because capturing supersedes it. As soon as the capture process is idle again, the ingest process is the first working process in the priority list, so the agent state is now uploading, etc.
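
The priority rule from this example can be sketched as a small lookup. It covers only the capturing, uploading and idle part of the list; the offline and shutting_down states are left out, and the process names are assumptions:

```python
# Processes in priority order, with the agent state each implies when working.
WORKING_STATE = [('capture', 'capturing'), ('ingest', 'uploading')]


def agent_state(working):
    """Return the state of the highest-priority process that is working."""
    for process, state in WORKING_STATE:
        if working.get(process):
            return state
    return 'idle'


capturing_only = agent_state({'capture': True, 'ingest': False})
both_working = agent_state({'capture': True, 'ingest': True})
ingest_only = agent_state({'capture': False, 'ingest': True})
nothing = agent_state({'capture': False, 'ingest': False})
```

Because the agent state is always derived from the whole table, a finishing ingest can no longer overwrite a running capture.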

With respect to my comment in #53 the behaviour for the scheduler process needs to be special: if the scheduler process exists, the internal state is idle. If there is no scheduler process the internal state is working (okay, that is a bad name. suggestions?), we are absolutely offline. If any other process would take precedence over this, it would give the illusion that the agent is ready to fetch new scheduled events.

Flask not in the `requirements.txt`

Flask is not listed in the requirements.txt, but obviously needed.

$ grep -R flask
pyca/ui/__init__.py:from flask import Flask, request, send_from_directory, Response
pyca/ui/__init__.py:from flask import render_template
setup.py:        "flask"
readme.rst:    python-flask python-sqlalchemy
readme.rst:    python-flask python-sqlalchemy
readme.rst:    python-flask python-sqlalchemy
.travis.yml:   - pip install flake8 python-coveralls coverage flask

Have safe_start_capture set recording to failed on error

At the moment the status is not set at all.

Parallel video uploading/ingesting

Have you considered this option? I think it could be very useful, when an internet connection is not fast enough to upload a video before the next recording starts.