worker approach: generic or specialized about zimfarm HOT 8 CLOSED

openzim commented on August 12, 2024

worker approach: generic or specialized

from zimfarm.

Comments (8)

automactic commented on August 12, 2024

@kelson42's proposal:

worker serves as a controller and has access to the host's docker socket
worker launches other containers to do specific tasks when needed
a task is a list of commands (bash scripts) executable on a specific docker image/container.
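Here is a minimal Python sketch of the controller side of this proposal. It assumes the worker container was started with the host socket bind-mounted (`-v /var/run/docker.sock:/var/run/docker.sock`). With that mount, the `docker` CLI inside the worker drives the host daemon, and task containers start as siblings of the worker. The `Task` type and the `build_run_argv` helper are hypothetical names for illustration. The task image is assumed to provide `sh`.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Task:
    """A task per the proposal: shell commands to run on one Docker image."""
    image: str
    commands: List[str] = field(default_factory=list)


def build_run_argv(task: Task) -> List[str]:
    """Build the `docker run` command line the controller would execute.

    With the host socket mounted into the worker, this starts the task
    container as a sibling of the worker on the host daemon.
    """
    if not task.commands:
        raise ValueError("a task needs at least one command")
    script = " && ".join(task.commands)  # stop at the first failing command
    return ["docker", "run", "--rm", task.image, "sh", "-c", script]


if __name__ == "__main__":
    print(build_run_argv(Task(image="alpine:3", commands=["echo start", "ls /"])))
```

The controller would pass the returned list to `subprocess.run`. Joining the commands with `&&` makes the container exit non-zero at the first command that fails.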

kelson42 commented on August 12, 2024

Here are some comments on the cons of a generic approach:

security risk: the script / command could be modified during transmission: I do not know if this comes from the web API or from the queue, but each of them should have (1) an authentication system and (2) an encrypted channel. If that works, the message cannot be changed.
- this is theoretically true. But you never know something can go wrong until it happens. If there is a more secure way, we should always consider it.
- also, I would imagine you need sudo to run docker commands, which opens up more security loopholes (I am by no means a security expert, but this article specifically says in bold "only trusted users should be allowed to control your Docker daemon")
security risk: the host's docker socket is exposed to the worker: yes, the docker users/zimfarm worker master need to agree on this, but this is not unusual at all and is a standard docker use case.
- I am sorry, but I would not trust anyone to access my docker socket and run docker commands, particularly if the commands are sent over the internet
task unit not broken down:
- if generating one zim file in the script fails, the whole script has to start over
- hard to tell whether the generation of a specific zim file succeeded
- hard to tell the current status of a specific zim file (i.e., you never know whether a zim file is pending, generating or uploading)
- makes stdout and stderr hard to read, since they form one big blob containing the output of all commands
- makes the whole point of parallel processing moot (if all we do is run several big scripts a few times per month)
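The tampering concern in the first point is usually addressed with message authentication rather than by trusting the transport alone. The minimal sketch below assumes the dispatcher and each worker share a secret key. How that key is distributed is left open.

* The dispatcher signs each task with HMAC-SHA256 from Python's standard library.
* The worker refuses any task whose signature does not verify.

The envelope layout and the function names are hypothetical. This only proves the task arrived unaltered from someone holding the key. It does not encrypt the task (TLS would still be needed for that), and it does not make an arbitrary script safe to run.

```python
import hashlib
import hmac
import json


def sign_task(task: dict, secret: bytes) -> dict:
    """Dispatcher side: wrap the task with an HMAC-SHA256 signature."""
    # Sign the serialized string and ship it unchanged, so the verifier
    # hashes exactly the bytes that were signed.
    payload = json.dumps(task, sort_keys=True, separators=(",", ":"))
    signature = hmac.new(secret, payload.encode("utf-8"), hashlib.sha256).hexdigest()
    return {"payload": payload, "signature": signature}


def verify_task(envelope: dict, secret: bytes) -> dict:
    """Worker side: return the task only if its signature checks out."""
    payload = envelope["payload"]
    expected = hmac.new(secret, payload.encode("utf-8"), hashlib.sha256).hexdigest()
    # compare_digest is designed to resist timing attacks on the comparison.
    if not hmac.compare_digest(expected, envelope["signature"]):
        raise ValueError("signature mismatch: task was altered or signed with another key")
    return json.loads(payload)


if __name__ == "__main__":
    key = b"example-shared-secret"  # placeholder; a real key comes from configuration
    envelope = sign_task({"image": "alpine:3", "commands": ["echo hi"]}, key)
    print(verify_task(envelope, key))
```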

kelson42 commented on August 12, 2024

Docker should not run as root. We could discuss at length whether it is standard to make the docker socket writable for a specific container. The only thing I can say is that Portainer, a minimal UI on top of Docker, has a dedicated option for this, so it cannot be that unusual. None of your usability points are relevant to me (1 is not a problem; 2 I already answered, see exit code; 3 the log is enough; 4 the problem is not specific to that solution; 5 I do not want to execute big scripts).
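Points 2 and 3 both come down to per-unit exit codes and logs. If each ZIM file is produced by its own command, the exit code says whether that file succeeded, and that command's own stdout/stderr is the log. This also lets a failed unit be retried alone. The minimal sketch below assumes each unit is a separate command run with `subprocess`; in the worker it would be a container. `UnitResult` and `run_units` are hypothetical names.

```python
import subprocess
import sys
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class UnitResult:
    """Outcome of one unit of work, for example one ZIM file."""
    name: str
    status: str = "pending"  # pending -> running -> succeeded | failed
    exit_code: Optional[int] = None
    stdout: str = ""
    stderr: str = ""


def run_units(units: List[Tuple[str, List[str]]]) -> List[UnitResult]:
    """Run each (name, argv) pair on its own and record a separate result.

    A non-zero exit code fails only that unit, and each unit keeps its own
    stdout/stderr instead of sharing one blob with every other command.
    """
    results = []
    for name, argv in units:
        result = UnitResult(name=name, status="running")  # a worker would report this state
        proc = subprocess.run(argv, capture_output=True, text=True)
        result.exit_code = proc.returncode
        result.stdout, result.stderr = proc.stdout, proc.stderr
        result.status = "succeeded" if proc.returncode == 0 else "failed"
        results.append(result)
    return results


if __name__ == "__main__":
    demo = [
        ("wiki_a", [sys.executable, "-c", "print('ok')"]),
        ("wiki_b", [sys.executable, "-c", "import sys; sys.exit(3)"]),
    ]
    for r in run_units(demo):
        print(r.name, r.status, r.exit_code)
```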

If you disagree with that technical solution, you can choose another approach, but I definitely need an easy way to execute any number of different scripts to make ZIM files. The requirements are:

Run all the current existing scripts
Be able to create new ones easily (<1 hour)
New scripts should be executable immediately afterward (assuming nodes/workers are free)

Otherwise I will have to keep running the scripts manually, as that will be the only practical way to create new ZIM files.

kelson42 commented on August 12, 2024

BTW, part of our disagreement might be solved by creating/starting Docker containers inside a Docker container. That would avoid RW access to the top-level (host) Docker daemon. As far as I know, this kind of thing also works well.
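The two options can be compared by the commands that would start the worker. This sketch only builds the `docker run` argument lists. The worker image name and the network name are placeholders. The network is assumed to be a user-defined one, created beforehand with `docker network create`, so that the name `dind` resolves.

There are two caveats:

* The `docker:dind` container must itself run with `--privileged`. Nesting therefore trades host-socket access for a privileged container rather than removing privilege altogether.
* TLS is disabled here only to keep the sketch short. Check the `docker:dind` image documentation for the TLS settings of the version in use.

```python
from typing import Dict, List

HOST_SOCKET = "/var/run/docker.sock"


def worker_with_host_socket(worker_image: str) -> List[str]:
    """Option 1: the worker drives the host (top-level) daemon via its socket."""
    return [
        "docker", "run", "-d",
        "-v", f"{HOST_SOCKET}:{HOST_SOCKET}",
        worker_image,
    ]


def worker_with_dind(worker_image: str, network: str) -> Dict[str, List[str]]:
    """Option 2: the worker drives a private daemon in a docker:dind container.

    The dind container must run privileged. An empty DOCKER_TLS_CERTDIR
    disables TLS (sketch only), and the worker reaches the daemon by the
    container name "dind" on the shared user-defined network.
    """
    return {
        "daemon": [
            "docker", "run", "-d", "--privileged",
            "--name", "dind", "--network", network,
            "-e", "DOCKER_TLS_CERTDIR=",
            "docker:dind",
        ],
        "worker": [
            "docker", "run", "-d", "--network", network,
            "-e", "DOCKER_HOST=tcp://dind:2375",
            worker_image,
        ],
    }


if __name__ == "__main__":
    print(worker_with_host_socket("zimfarm-worker"))  # placeholder image name
    for role, argv in worker_with_dind("zimfarm-worker", "zimnet").items():
        print(role, argv)
```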

automactic commented on August 12, 2024

Docker needs to run as root (source)

Running containers (and applications) with Docker implies running the Docker daemon. This daemon currently requires root privileges, and you should therefore be aware of some important details.

What is Portainer? A minimal UI on top of Docker? Sorry, I don't understand what you mean.
Usability should concern you the most, since you will be the one mostly interacting with it. I am trying to create a system where you can create tasks once and forget about them, rather than manually uploading several scripts every month and parsing through a blob of output.

How about this? I will implement both approaches, and you are free to try either of them. To be specific:

worker will be built on the official docker image. (BTW, do you think there will be any problem if the host docker version and the worker docker version differ?)
worker will have two types of tasks:
- allow the user to execute any script (here, a question: on the website, do you want to upload a script, paste the content of a script into a text view, or both?)
- allow the user to run the mwoffliner container with parameters, then retrieve and upload the generated file
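The two task types can be modelled as two record types that the worker translates into a container command. This is a minimal sketch. The class names, the `/output` mount point and the `openzim/mwoffliner` image name are assumptions. Parameters are passed through verbatim as `--key=value` flags, without checking them against mwoffliner's actual options.

```python
from dataclasses import dataclass
from typing import Dict, List, Union

# Assumed image name; adjust to the registry actually used.
MWOFFLINER_IMAGE = "openzim/mwoffliner"


@dataclass
class ScriptTask:
    """Task type 1: run any user-supplied script in a given image."""
    image: str
    script: str


@dataclass
class MwofflinerTask:
    """Task type 2: run mwoffliner with parameters; output lands in output_dir."""
    params: Dict[str, str]
    output_dir: str  # host directory the worker later uploads from


Task = Union[ScriptTask, MwofflinerTask]


def to_container_command(task: Task) -> List[str]:
    """Translate either task type into a `docker run` argument list."""
    if isinstance(task, ScriptTask):
        return ["docker", "run", "--rm", task.image, "sh", "-c", task.script]
    if isinstance(task, MwofflinerTask):
        # Pass parameters through as --key=value, sorted for a stable command line.
        flags = [f"--{key}={value}" for key, value in sorted(task.params.items())]
        return [
            "docker", "run", "--rm",
            "-v", f"{task.output_dir}:/output",  # hypothetical mount point
            MWOFFLINER_IMAGE, "mwoffliner", *flags,
        ]
    raise TypeError(f"unknown task type: {type(task).__name__}")


if __name__ == "__main__":
    job = MwofflinerTask(
        params={"mwUrl": "https://en.wikipedia.org/", "adminEmail": "admin@example.com"},
        output_dir="/srv/zim",
    )
    print(to_container_command(job))
```

`--mwUrl` and `--adminEmail` are mwoffliner options. In a real run, mwoffliner would also have to be told to write into `/output` so the worker can pick up the file for upload; that option is not shown here.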

kelson42 commented on August 12, 2024

After discussion we have decided that:

scripts would be fully data-driven
scripts would be executed in dedicated docker containers managed by the worker.
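"Fully data-driven" can be read this way: a script is a record (name, image, commands) stored as data, and the worker only interprets records. Under that reading, adding a new script means adding one entry, which fits the earlier requirement that new scripts be runnable right away. This is a minimal sketch assuming JSON as the storage format. The field names and `load_definitions` are hypothetical.

```python
import json
from dataclasses import dataclass
from typing import Dict, List

REQUIRED_KEYS = {"name", "image", "commands"}


@dataclass
class TaskDefinition:
    """One data-driven script: the image to use and the commands to run in it."""
    name: str
    image: str
    commands: List[str]


def load_definitions(raw_json: str) -> Dict[str, TaskDefinition]:
    """Parse a JSON array of definitions into a registry keyed by name.

    The worker code never changes when a script is added: a new entry in
    the data is all it takes for the task to become schedulable.
    """
    registry: Dict[str, TaskDefinition] = {}
    for entry in json.loads(raw_json):
        if not isinstance(entry, dict):
            raise ValueError(f"definition must be an object, got {entry!r}")
        missing = REQUIRED_KEYS - set(entry)
        if missing:
            raise ValueError(f"definition {entry.get('name')!r} is missing {sorted(missing)}")
        commands = entry["commands"]
        if not isinstance(commands, list) or not all(isinstance(c, str) for c in commands):
            raise ValueError(f"commands of {entry['name']!r} must be a list of strings")
        if entry["name"] in registry:
            raise ValueError(f"duplicate task name: {entry['name']!r}")
        registry[entry["name"]] = TaskDefinition(entry["name"], entry["image"], list(commands))
    return registry


if __name__ == "__main__":
    data = '[{"name": "wikipedia_fr", "image": "alpine:3", "commands": ["echo start"]}]'
    print(load_definitions(data)["wikipedia_fr"])
```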

automactic commented on August 12, 2024

Another alternative for the worker design:

a single docker compose file, containing

services:
- worker
- redis
- mwoffliner
- other offliners
volumes:
- zim_output
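The compose layout above could look like this. To stay in one language, the sketch builds the compose structure as a Python dict and prints it as JSON. JSON is also valid YAML, so the output should be usable as the compose file.

* `redis` is the official Redis image.
* The worker and offliner image names are placeholders.
* The `tail -f /dev/null` command is an assumption that keeps the offliner container alive so the worker can `docker exec` into it.

Other offliners would be added as further services following the `mwoffliner` pattern.

```python
import json

# Placeholder image names, except "redis", which is the official Redis image.
compose = {
    "services": {
        "worker": {
            "image": "zimfarm-worker",  # hypothetical
            "depends_on": ["redis"],
            "volumes": [
                "zim_output:/output",
                "/var/run/docker.sock:/var/run/docker.sock",  # needed for docker exec
            ],
        },
        "redis": {"image": "redis"},
        "mwoffliner": {
            "image": "openzim/mwoffliner",  # assumed image name
            "command": ["tail", "-f", "/dev/null"],  # keep alive for docker exec (assumption)
            "volumes": ["zim_output:/output"],
        },
        # further offliners would follow the same pattern as "mwoffliner"
    },
    "volumes": {"zim_output": {}},
}

if __name__ == "__main__":
    # JSON is valid YAML, so this output can be saved as the compose file.
    print(json.dumps(compose, indent=2))
```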

Benefits:

mwoffliner and other offliners do not need to keep restarting
simpler commands for the end user, since socket and volume mappings are taken care of in the compose file

Problem to solve:

a way for mwoffliner and other offliners to communicate with worker
- we could use docker exec inside the worker
- it would be better if they could communicate over the network, which would eliminate the need to map the docker socket.
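Communicating over the network could look like this. The offliner exposes a small HTTP endpoint, and the worker posts jobs to it. The worker reaches the offliner by its compose service name, which resolves as a hostname on the compose network. This is a minimal standard-library sketch. The `/jobs` path, the port and the JSON fields are hypothetical. The `redis` service already in the compose file could serve the same purpose as a job queue. The demo runs both sides in one process on localhost.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import ProxyHandler, Request, build_opener


class OfflinerHandler(BaseHTTPRequestHandler):
    """Offliner side: accepts jobs on POST /jobs (hypothetical endpoint)."""

    received = []  # accepted jobs; a real offliner would start a scrape instead

    def do_POST(self):
        if self.path != "/jobs":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        job = json.loads(self.rfile.read(length))
        self.received.append(job)
        body = json.dumps({"accepted": True, "job_id": len(self.received)}).encode()
        self.send_response(202)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence per-request logging in the demo


def submit_job(base_url: str, job: dict) -> dict:
    """Worker side: post a job over the network; no Docker socket involved."""
    request = Request(
        f"{base_url}/jobs",
        data=json.dumps(job).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    opener = build_opener(ProxyHandler({}))  # ignore proxy settings from the environment
    with opener.open(request, timeout=5) as response:
        return json.loads(response.read())


if __name__ == "__main__":
    # In compose, base_url would be something like http://mwoffliner:8000.
    server = HTTPServer(("127.0.0.1", 0), OfflinerHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        print(submit_job(f"http://127.0.0.1:{server.server_address[1]}", {"mwUrl": "https://en.wikipedia.org/"}))
    finally:
        server.shutdown()
        server.server_close()
```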

Downside:

the number of offliners cannot be scaled dynamically

kelson42 commented on August 12, 2024

I think we can close that discussion now. I'm glad about the way it is done.
