@kelson42's proposal:
- the worker serves as a controller and has access to the host's Docker socket
- the worker launches other containers to perform specific tasks when needed
- a task is a list of commands (bash scripts) executable in a specific Docker image/container.
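A minimal sketch of this proposal, assuming a task is modelled as an image name plus a list of shell commands. The worker turns the task into a `docker run` invocation, which it would execute through the host's Docker socket. The `Task` class, the `docker_run_args` function and the image name are hypothetical, not Zimfarm code, and the sketch assumes the target image ships `bash`.

```python
import shlex
from dataclasses import dataclass, field


@dataclass
class Task:
    """Hypothetical task model: a Docker image plus shell commands to run in it."""
    image: str
    commands: list[str]
    volumes: dict[str, str] = field(default_factory=dict)  # host path -> container path


def docker_run_args(task: Task) -> list[str]:
    """Build the `docker run` argv the worker would execute via the host's Docker socket."""
    args = ["docker", "run", "--rm"]
    for host_path, container_path in task.volumes.items():
        args += ["-v", f"{host_path}:{container_path}"]
    # Chain the commands with && so the container stops at the first failing
    # command and its exit code reflects that failure.
    script = " && ".join(task.commands)
    args += [task.image, "bash", "-c", script]
    return args


if __name__ == "__main__":
    task = Task(
        image="example/offliner:latest",  # placeholder image name
        commands=["echo start", "echo done"],
        volumes={"/srv/zim": "/output"},
    )
    print(shlex.join(docker_run_args(task)))
```

The worker would pass the returned list to `subprocess.run(...)`; because the commands are joined with `&&`, the container's exit code is that of the first failing command.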
from zimfarm.
Here are comments on the cons of a generic approach:
- security risk: the script/command could be modified during transmission: I do not know whether this comes from the web API or from the queue, but each of them should have (1) an authentication system and (2) an encrypted channel. If that works, the message cannot be changed.
  - This is theoretically true, but you never know something can go wrong until it happens. If there is a more secure way, we should always consider it.
  - Also, I imagine you need sudo to run docker commands, which opens up more security holes (I am by no means a security expert, but this article specifically says in bold "only trusted users should be allowed to control your Docker daemon").
- security risk: the host's Docker socket is exposed to the worker: yes, the Docker users/Zimfarm worker masters need to agree to this, but this is not unusual at all and is a standard Docker use case.
  - I am sorry, but I would not trust anyone to access my Docker socket and run docker commands, particularly if the commands are sent over the internet.
- task unit not broken down:
  - if generating one ZIM file in the script fails, the whole script has to start over
  - it is hard to tell whether the generation of a specific ZIM file succeeded
  - it is hard to tell the status of a specific ZIM file (i.e., you never know whether a ZIM file is currently pending, generating or uploading)
  - stdout and stderr become hard to read, since they are one big blob containing the output of all commands
  - it makes the whole point of parallel processing moot (if all we do is run several big scripts a few times per month)
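On point (1) of the first security item, a minimal sketch of message authentication, assuming the dispatcher and the worker share a secret key. The sender attaches an HMAC-SHA256 of the serialized task, and the worker rejects any task whose signature does not match, so an attacker without the key cannot alter a command unnoticed. This covers integrity only: confidentiality would still need an encrypted channel such as TLS, and it does nothing about the Docker socket concern. `SECRET`, `sign_task` and `verify_task` are hypothetical names, not Zimfarm's actual mechanism.

```python
import hashlib
import hmac
import json

# Hypothetical shared secret; in practice it would come from configuration, not source code.
SECRET = b"replace-with-a-real-shared-secret"


def _digest(task: dict, key: bytes) -> str:
    """HMAC-SHA256 over a canonical JSON encoding of the task."""
    body = json.dumps(task, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hmac.new(key, body, hashlib.sha256).hexdigest()


def sign_task(task: dict, key: bytes = SECRET) -> dict:
    """Wrap a task with its signature before it is sent to the worker."""
    return {"task": task, "signature": _digest(task, key)}


def verify_task(message: dict, key: bytes = SECRET) -> dict:
    """Return the task if its signature matches; otherwise refuse to run it."""
    expected = _digest(message["task"], key)
    # Constant-time comparison avoids leaking how many characters matched.
    if not hmac.compare_digest(expected, message["signature"]):
        raise ValueError("signature mismatch: task altered in transit or wrong key")
    return message["task"]
```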
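The per-ZIM concerns listed under "task unit not broken down" (retrying a single failure, knowing whether each ZIM is pending, generating or uploading, keeping its logs separate) can be sketched as one small state machine per ZIM file. This is an illustrative model that assumes each ZIM is its own task; `ZimTask`, `Status` and the transition table are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Status(Enum):
    PENDING = "pending"
    GENERATING = "generating"
    UPLOADING = "uploading"
    SUCCEEDED = "succeeded"
    FAILED = "failed"


# Allowed transitions; any other move is rejected as a bug.
TRANSITIONS: dict[Status, set[Status]] = {
    Status.PENDING: {Status.GENERATING},
    Status.GENERATING: {Status.UPLOADING, Status.FAILED},
    Status.UPLOADING: {Status.SUCCEEDED, Status.FAILED},
    Status.SUCCEEDED: set(),
    Status.FAILED: {Status.PENDING},  # a failed ZIM is retried on its own
}


@dataclass
class ZimTask:
    """One ZIM file tracked as its own task, with its own status and logs."""
    name: str
    status: Status = Status.PENDING
    stdout: str = ""  # output of this ZIM's run only, not a shared blob
    stderr: str = ""
    exit_code: Optional[int] = None

    def move_to(self, new_status: Status) -> None:
        """Advance the status, refusing transitions that make no sense."""
        if new_status not in TRANSITIONS[self.status]:
            raise ValueError(
                f"{self.name}: cannot go from {self.status.value} to {new_status.value}"
            )
        self.status = new_status
```

With this model, a failure in one ZIM only sends that ZIM back to `PENDING`; the others keep their own status.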
Docker should not run as root. We could discuss at length whether it is standard to have the Docker socket writable from a specific container; the only thing I can say is that Portainer, a minimal UI on top of Docker, has a dedicated option for this, so... it cannot be that unusual. None of your usability points are relevant to me (1 is not a problem; 2 I already answered, see the exit code; 3 the log is enough; 4 the problem is not specific to this solution; 5 I do not want to execute big scripts).
If you disagree with that technical solution, you can choose another approach, but I definitely need an easy way to execute any number of scripts to make ZIM files. The requirements are:
- Run all the existing scripts
- Be able to create new ones easily (in under an hour)
- New scripts should be executable immediately afterward (assuming nodes/workers are free)
Otherwise I will have to keep running the scripts manually, as that will remain the only practical way to create new ZIM files.
BTW, part of our disagreement might be solved by creating/starting Docker containers inside a Docker container. That would avoid RW access to the top-level Docker daemon. As far as I know, this kind of setup also works well.
- Docker needs to run as root (source)
Running containers (and applications) with Docker implies running the Docker daemon. This daemon currently requires root privileges, and you should therefore be aware of some important details.
- What is Portainer? A minimal UI on top of Docker? Sorry, I don't understand what you mean.
- Usability should concern you the most, since you will be the one interacting with it most. I am trying to create a system where you can create tasks once and forget about them, rather than manually uploading several scripts every month and manually parsing through a blob of output.
How about this: I will implement both approaches, and you are free to try either of them. To be specific:
- the worker will be built on the official docker image. (BTW, do you think there will be any problem if the host's Docker version and the worker's Docker version are not the same?)
- the worker will have two types of tasks:
  - allow the user to execute any script (a question here: on the website, do you want to upload a script, paste the content of a script into a text view, or both?)
  - allow the user to run the mwoffliner container with parameters, then retrieve the generated file and upload it
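A sketch of how the worker could turn the two task types into container runs, assuming both end up as `docker run` invocations. The image name `openzim/mwoffliner`, the assumption that it provides a `mwoffliner` executable, and the mapping of parameters to `--key=value` options are assumptions to check against the MWoffliner documentation. `--mwUrl` and `--adminEmail` are treated as required, and `--outputDirectory` is set by the worker so it knows where to find the file to upload. The Redis connection MWoffliner needs is left out, and the function names are hypothetical.

```python
from typing import Union

# Assumed image name; check which registry/tag is actually used.
MWOFFLINER_IMAGE = "openzim/mwoffliner"
REQUIRED_PARAMS = ("mwUrl", "adminEmail")

Param = Union[str, int, bool]


def script_task_args(image: str, script: str) -> list[str]:
    """Task type 1: run an arbitrary user-provided script inside `image`."""
    return ["docker", "run", "--rm", image, "bash", "-c", script]


def mwoffliner_task_args(params: dict[str, Param], output_dir: str) -> list[str]:
    """Task type 2: run mwoffliner with user parameters; the ZIM lands in output_dir on the host."""
    missing = [name for name in REQUIRED_PARAMS if name not in params]
    if missing:
        raise ValueError(f"missing mwoffliner parameters: {', '.join(missing)}")
    args = ["docker", "run", "--rm", "-v", f"{output_dir}:/output",
            MWOFFLINER_IMAGE, "mwoffliner"]
    for key, value in sorted(params.items()):
        if key == "outputDirectory":
            continue  # the worker controls the output location so it can upload the result
        if value is True:
            args.append(f"--{key}")  # boolean switch
        elif value is not False:
            args.append(f"--{key}={value}")  # False means "omit this option"
    args.append("--outputDirectory=/output")
    return args
```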
After discussion, we have decided that:
- scripts would be fully data-driven
- scripts would be executed in dedicated Docker containers managed by the worker.
Another alternative worker design:
a single docker-compose file containing
- services:
  - worker
  - redis
  - mwoffliner
  - other offliners
- volumes:
  - zim_output
Benefits:
- mwoffliner and the other offliners do not need to keep restarting
- simpler run commands for the end user, since socket and volume mappings are taken care of in the compose file
Problem to solve:
- a way for mwoffliner and the other offliners to communicate with the worker
  - we could use docker exec inside the worker
  - it would be better if they could communicate over the network, since that eliminates mapping the Docker socket
Downside:
- cannot dynamically scale the number of offliners
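A sketch of the compose layout described above, written as a Python dict so it stays dependency-free and printed as JSON (JSON is valid YAML, so the output should be loadable as a compose file, though that is untested here). The worker image name is a placeholder, `openzim/mwoffliner` is an assumed image name, and the "other offliners" are omitted. The mwoffliner container is kept idle with `tail -f /dev/null` (assuming the image has no entrypoint that would swallow that command) so the worker can `docker exec` into it; that approach still needs the Docker socket mounted into the worker.

```python
import json

compose = {
    "services": {
        "worker": {
            "image": "zimfarm-worker:latest",  # placeholder image name
            "depends_on": ["redis", "mwoffliner"],
            "volumes": [
                # Needed only for the `docker exec` approach.
                "/var/run/docker.sock:/var/run/docker.sock",
                "zim_output:/output",
            ],
        },
        "redis": {"image": "redis"},
        "mwoffliner": {
            "image": "openzim/mwoffliner",  # assumed image name
            # Keep the container running between tasks instead of restarting it.
            "command": ["tail", "-f", "/dev/null"],
            "volumes": ["zim_output:/output"],
        },
    },
    # Named volume shared by the worker and the offliners for generated ZIM files.
    "volumes": {"zim_output": {}},
}

if __name__ == "__main__":
    print(json.dumps(compose, indent=2))
```

The shared `zim_output` volume is what lets the worker pick up and upload ZIM files produced by the long-running offliner containers.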
I think we can close this discussion now. I'm happy with the way it has been done.