
Comments (8)

automactic commented on August 12, 2024

@kelson42's proposal:

  • the worker serves as the controller and has access to the host's docker socket
  • the worker launches other containers to do specific tasks when needed
  • a task is a list of commands (bash scripts) executable on a specific docker image/container.
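
A minimal sketch of how such a task could be represented, assuming the worker shells out to the `docker` CLI. The `Task` class, its fields, and the image name `example/offliner` are hypothetical and only illustrate the idea; nothing here is the actual zimfarm implementation.

```python
import shlex
from dataclasses import dataclass, field
from typing import List


@dataclass
class Task:
    """A list of shell commands to run inside one docker image (hypothetical)."""

    image: str
    commands: List[str] = field(default_factory=list)

    def docker_argv(self) -> List[str]:
        # Chain the commands with && so a failing command stops the rest
        # and its exit code becomes the exit code of the container.
        script = " && ".join(self.commands)
        # --rm removes the container once the script has exited.
        return ["docker", "run", "--rm", self.image, "bash", "-c", script]


# "example/offliner" is a placeholder image name, not a real image.
task = Task(image="example/offliner", commands=["echo start", "echo done"])
print(shlex.join(task.docker_argv()))
```

The worker could pass the resulting list to `subprocess.run` and read the return code to decide whether the task succeeded.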

from zimfarm.

kelson42 commented on August 12, 2024

Here are comments on the cons of a generic approach:

  • security risk: the script / command could be modified during transmission: I do not know if this comes from the web API or from the queue, but each of them should have (1) an authentication system and (2) an encrypted channel. If those work, the message cannot be changed.

    • This is true in theory, but you never know how it could go wrong until it happens. If there is a more secure way, we should always consider it.
    • Also, I would imagine you need sudo to run docker commands, which opens up more security holes (I am by no means a security expert, but this article specifically says in bold font "only trusted users should be allowed to control your Docker daemon")
  • security risk: docker socket of host is exposed to worker: yes, the docker users/zimfarm worker master need to agree on this, but this is not unusual at all and is a standard docker use case.

    • I am sorry, but I would not trust anyone to access my docker socket and run docker commands, particularly if the commands are sent over the internet
  • task unit not broken down:

    • if one zim file failed to be generated in the whole script, the whole script has to start over
    • hard to tell whether the generation of a specific zim file succeeded or not
    • hard to tell the specific status of a specific zim file (i.e., you never know if a zim file is currently pending, generating or uploading)
    • makes stdout and stderr hard to read, since they are a big blob containing stdout and stderr of all commands
    • makes the whole point of parallel processing moot (if all we do is run several big scripts a few times per month)
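
For illustration, here is a rough sketch of what per-file tracking could look like if each zim file were its own task unit. The states pending, generating and uploading come from the points above; `SUCCEEDED`, `FAILED`, the `ZimJob` class and the transition table are assumptions added for the example.

```python
from enum import Enum


class ZimStatus(Enum):
    PENDING = "pending"
    GENERATING = "generating"
    UPLOADING = "uploading"
    SUCCEEDED = "succeeded"  # assumed terminal state
    FAILED = "failed"        # assumed failure state


# Allowed moves between states. A failed file goes back to pending on
# its own, so the other files do not have to start over.
TRANSITIONS = {
    ZimStatus.PENDING: {ZimStatus.GENERATING},
    ZimStatus.GENERATING: {ZimStatus.UPLOADING, ZimStatus.FAILED},
    ZimStatus.UPLOADING: {ZimStatus.SUCCEEDED, ZimStatus.FAILED},
    ZimStatus.SUCCEEDED: set(),
    ZimStatus.FAILED: {ZimStatus.PENDING},
}


class ZimJob:
    """Tracks one zim file: its status and its own output lines."""

    def __init__(self, name):
        self.name = name
        self.status = ZimStatus.PENDING
        self.log = []  # stdout/stderr lines for this file only

    def advance(self, new_status):
        if new_status not in TRANSITIONS[self.status]:
            raise ValueError(
                f"{self.name}: cannot go from {self.status.value} to {new_status.value}"
            )
        self.status = new_status


job = ZimJob("example.zim")
job.advance(ZimStatus.GENERATING)
job.log.append("generation started")
print(job.name, job.status.value)
```

With one such record per file, the status of each zim file and its own log can be queried separately instead of being parsed out of a single script's output.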


kelson42 commented on August 12, 2024

Docker should not run as root. We could discuss at length whether it is standard to make the docker socket writable for a specific container; the only thing I can say is that Portainer, which is a minimal UI on top of Docker, has a dedicated option for this, so it cannot be that special. None of your usability points is relevant to me (1 is not a problem; 2 I already answered, see the exit code; 3 the log is enough; 4 is a problem not specific to that solution; 5 I do not want to execute big scripts).

If you disagree with that tech solution, you can choose another approach, but I definitely need an easy solution to execute X various scripts to make ZIM files. The requirements are:

  • Run all the current existing scripts
  • Be able to create new ones easily (<1 hour)
  • New scripts should be executable immediately afterward (assuming nodes/workers are free)

Otherwise I will have to continue to run the scripts manually as it will be the only practicable solution to create new ZIM files.


kelson42 commented on August 12, 2024

BTW, part of the disagreement we have might be solved by creating/starting Docker containers inside a Docker container. That would avoid the RW access to the top-level Docker daemon. As far as I know, this kind of thing also works well.


automactic commented on August 12, 2024

  • Docker needs to run as root (source)

Running containers (and applications) with Docker implies running the Docker daemon. This daemon currently requires root privileges, and you should therefore be aware of some important details.

  • What is Portainer? A minimal UI on top of Docker? Sorry, I don't understand what you mean.
  • Usability should concern you the most, since you will be the one mostly interacting with it. I am trying to create a system where you can create tasks once and forget about them, rather than manually uploading several scripts every month and manually parsing through a blob of output.

How about this? I will implement both approaches, and you are free to try either of them. To be specific:

  • the worker will be built on the official docker image. (BTW, do you think there will be any problem if the host docker version and the worker docker version are not the same?)
  • the worker will have two types of tasks:
    • allow the user to execute any script (a question here: on the website, do you want to upload a script, paste the content of a script into a text view, or both?)
    • allow the user to run an mwoffliner container with parameters, then get the generated file and upload it
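
The two task types could be sketched roughly as below. Both function names, the `/output` mount point, the image name and the parameter shown are placeholders, and the second function assumes the offliner image's entrypoint accepts `--key=value` flags, which is not confirmed anywhere in this thread.

```python
from typing import Dict, List


def script_task_argv(image: str, script: str) -> List[str]:
    """Type 1: run an arbitrary user-supplied script in a container."""
    return ["docker", "run", "--rm", image, "bash", "-c", script]


def offliner_task_argv(image: str, params: Dict[str, str], output_dir: str) -> List[str]:
    """Type 2: run an offliner container with parameters.

    Assumption: the image's entrypoint is the offliner itself and it takes
    --key=value flags. The output directory is mounted at /output so the
    worker can pick up the generated file and upload it afterwards.
    """
    flags = [f"--{key}={value}" for key, value in sorted(params.items())]
    return ["docker", "run", "--rm", "-v", f"{output_dir}:/output", image, *flags]


# Placeholder image name and parameter, for illustration only.
argv = offliner_task_argv(
    "example/mwoffliner",
    {"mwUrl": "https://example.org/"},
    "/srv/zim_output",
)
print(argv)
```

Keeping the parameters as plain data means a new task can be created from a form on the website without writing a new script.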


kelson42 commented on August 12, 2024

After discussion we have decided that:

  • scripts would be fully data-driven
  • scripts would be executed on dedicated docker containers managed by the worker.


automactic commented on August 12, 2024

Another alternative for the worker design:

a single docker compose file, containing

  • services:
    • worker
    • redis
    • mwoffliner
    • other offliners
  • volume:
    • zim_output
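
A rough sketch of that layout, built as a Python dictionary and printed as JSON (JSON is also valid YAML, so the structure mirrors what a compose file would contain). The image names for the worker and mwoffliner and the `/zim_output` mount path are placeholders; only the service and volume names come from the list above, and the "other offliners" entry is left out.

```python
import json

# Placeholder image names; only "redis" refers to a real public image.
compose = {
    "services": {
        "worker": {
            "image": "example/zimfarm-worker",
            "depends_on": ["redis"],
            "volumes": [
                # Socket mapping, needed only if the worker uses docker exec.
                "/var/run/docker.sock:/var/run/docker.sock",
                "zim_output:/zim_output",
            ],
        },
        "redis": {"image": "redis"},
        "mwoffliner": {
            "image": "example/mwoffliner",
            "volumes": ["zim_output:/zim_output"],
        },
    },
    # Named volume shared by the worker and the offliners.
    "volumes": {"zim_output": {}},
}

rendered = json.dumps(compose, indent=2)
print(rendered)
```

If the offliners and the worker communicated over the network instead, the socket line under `worker` could be dropped.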

Benefit:

  • mwoffliner and the other offliners do not need to keep restarting
  • simpler run commands for the end user, since socket mapping and volume mapping are taken care of in the compose file

Problem to solve:

  • a way for mwoffliner and the other offliners to communicate with the worker
    • we could use docker exec inside the worker
    • it would be better if they could communicate over the network, which would eliminate the need to map the docker socket.

Downside:

  • cannot dynamically scale the number of offliners


kelson42 commented on August 12, 2024

I think we can close that discussion now. I'm glad about the way it is done.
