
pman's Introduction

pman

pman, which once stood for process manager, is a Flask application providing an API for creating jobs with various schedulers, e.g. Kubernetes, Podman, Docker Swarm, and SLURM. In essence, it translates its own JSON interface into requests for the supported backends.

pman is tightly coupled to pfcon. The two are typically deployed as a pair, together providing the pfcon service.

Running pman

The easiest way to see it in action is to run miniChRIS-docker. The instructions that follow are for pman hackers and developers.

Development

This section describes how to set up a local instance of pman for development.

Using Docker Compose

These instructions run pman inside a container using Docker and Docker Swarm for scheduling jobs. Hot-reloading of changes to the code is enabled.

docker swarm init --advertise-addr 127.0.0.1
docker compose up -d

Using Docker Swarm

To run a full test using docker stack deploy, run the test harness test_swarm.sh.

./test_swarm.sh

Using Podman for Development

pman must be able to schedule containers via Podman by communicating with the Podman socket.

systemctl --user start podman.service
export DOCKER_HOST="$(podman info --format '{{ .Host.RemoteSocket.Path }}')"
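To confirm that the socket is reachable through its Docker-compatible API, a quick sanity check with the Docker SDK for Python (assuming the docker package is installed) might look like this:

import docker

client = docker.from_env()   # picks up DOCKER_HOST, i.e. the Podman socket
assert client.ping()         # True when the Podman service answers the Docker API
print(client.version()['Version'])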

Install pman using Python

python -m venv venv
source venv/bin/activate
pip install -r requirements/local.txt
pip install -e .

Run pman using Python in Development Mode

python -m pman
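Once the development server is up, a job can be submitted to pman's HTTP API. The snippet below is only a sketch: the port (5010) and the field names mirror the pfcon log excerpt further down this page, the values are made up, and the exact schema may vary between pman versions.

import requests

job = {
    "jid": "example-job-1",                 # illustrative values throughout
    "image": "fnndsc/pl-simpledsapp",
    "cmd_args": "--saveinputmeta --saveoutputmeta",
    "auid": "dev",
    "number_of_workers": 1,
    "cpu_limit": 1000,
    "memory_limit": 200,
    "gpu_limit": 0,
    "type": "ds",
}
r = requests.post("http://localhost:5010/api/v1/", json=job)
print(r.status_code, r.text)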

Using Kubernetes via Kind

https://github.com/FNNDSC/pman/wiki/Development-Environment:-Kubernetes

Configuration

pman is configured by environment variables. Refer to the source code in pman/config.py for exactly how it works.

How Storage Works

pman relies on pfcon to manage data in a directory known as "storeBase." The "storeBase" is a storage space visible to every node in your cluster.

For single-machine deployments using Docker and Podman, the best solution is to use a local volume mounted by pfcon at /var/local/storeBase. pman should be configured with STORAGE_TYPE=docker_local_volume VOLUME_NAME=....

On Kubernetes, a single PersistentVolumeClaim should be used. It is mounted by pfcon at /var/local/storeBase. pman should be configured with STORAGE_TYPE=kubernetes_pvc VOLUME_NAME=....

SLURM has no concept of volumes, but SLURM clusters typically use an NFS share mounted at the same path on every node. pman should be configured with STORAGE_TYPE=host STOREBASE=..., where STOREBASE is the share's mount point.

swarm vs. docker

Originally, pman interfaced with the Docker Swarm API for the sake of supporting multi-node clusters. More often than not, however, pman runs on a single machine. Such is the case for developer environments, "host" compute resources for our single-machine production deployments of CUBE, and production deployments of CUBE on our Power9 supercomputers. Swarm mode is mostly an annoyance, and its multi-node ability is poorly tested. Furthermore, multi-node functionality is better provided by CONTAINER_ENV=kubernetes.

In pman v4.1, CONTAINER_ENV=docker was introduced as a new feature and the default configuration. In this mode, pman uses the Docker Engine API instead of the Swarm API, which is much more convenient for single-machine use cases.

Podman Support

CONTAINER_ENV=docker is compatible with Podman.

Podman versions 3 and 4 are known to work.

Rootless Podman

Configure the user to be able to set resource limits.

https://github.com/containers/podman/blob/main/troubleshooting.md#symptom-23

Environment Variables

Environment Variable | Description
SECRET_KEY | Flask secret key
CONTAINER_ENV | one of: "swarm", "kubernetes", "cromwell", "docker"
STORAGE_TYPE | one of: "host", "docker_local_volume", "kubernetes_pvc"
STOREBASE | where job data is stored; valid when STORAGE_TYPE=host; conflicts with VOLUME_NAME
VOLUME_NAME | name of the data volume; valid when STORAGE_TYPE=docker_local_volume or STORAGE_TYPE=kubernetes_pvc
PFCON_SELECTOR | label on the pfcon container; may be specified for pman to self-discover VOLUME_NAME (default: org.chrisproject.role=pfcon)
CONTAINER_USER | set the job container user in the form UID:GID; may be a range for random values
ENABLE_HOME_WORKAROUND | if set to "yes", set the job environment variable HOME=/tmp
SHM_SIZE | size of /dev/shm in mebibytes (supported only in Docker, Podman, and Kubernetes)
JOB_LABELS | CSV list of key=value pairs; labels to apply to container jobs
JOB_LOGS_TAIL | (int) maximum size of job logs
IGNORE_LIMITS | if set to "yes", do not set resource limits on container jobs (for making things work without effort)
REMOVE_JOBS | if set to "no", pman will not delete jobs (for debugging)

STORAGE_TYPE=host

When STORAGE_TYPE=host, specify STOREBASE as a mount point path on the host(s).

STORAGE_TYPE=docker_local_volume

For single-machine instances, use a Docker/Podman local volume as the "storeBase." The volume should exist before pman starts. It can be identified in one of two ways:

  • Manually, by passing the volume name to the variable VOLUME_NAME
  • Automatically: pman inspects a container with the label org.chrisproject.role=pfcon and selects the volume bound to /var/local/storeBase (a sketch of this follows below)
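A rough sketch of that automatic discovery using the Docker SDK for Python (illustrative only, not pman's actual code):

import docker

client = docker.from_env()
# find the running pfcon container by its label
pfcon = client.containers.list(filters={'label': 'org.chrisproject.role=pfcon'})[0]
# pick the volume mounted at /var/local/storeBase and use its name as VOLUME_NAME
volume_name = next(m['Name'] for m in pfcon.attrs['Mounts']
                   if m['Type'] == 'volume' and m['Destination'] == '/var/local/storeBase')
print(volume_name)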

STORAGE_TYPE=kubernetes_pvc

When STORAGE_TYPE=kubernetes_pvc, then VOLUME_NAME must be the name of a PersistentVolumeClaim configured as ReadWriteMany.

In cases where the volume is only writable to a specific UNIX user, such as an NFS-backed volume, CONTAINER_USER can be used as a workaround.

Kubernetes-Specific Options

Applicable when CONTAINER_ENV=kubernetes

Environment Variable | Description
JOB_NAMESPACE | Kubernetes namespace for created jobs
NODE_SELECTOR | Pod nodeSelector

SLURM-Specific Options

Applicable when CONTAINER_ENV=cromwell

Environment Variable | Description
CROMWELL_URL | Cromwell URL
TIMELIMIT_MINUTES | SLURM job time limit

For how it works, see https://github.com/FNNDSC/pman/wiki/Cromwell

Container User Security

Setting an arbitrary container user, e.g. with CONTAINER_USER=123456:123456, increases security but will cause (unsafely written) ChRIS plugins to fail. In some cases, ENABLE_HOME_WORKAROUND=yes can get the plugin to work without having to change its code.

It is possible to use a random container user with CONTAINER_USER=1000000000-2147483647:1000000000-2147483647; however, since pfcon's UID never changes, this will cause everything to break.

Missing Features

pman's configuration has grown messy over the years because it attempts to provide an interface across vastly different systems. Some combinations of options are unsupported:

  • IGNORE_LIMITS=yes only works with CONTAINER_ENV=docker (or podman).
  • JOB_LABELS=... only works with CONTAINER_ENV=docker (or podman) and CONTAINER_ENV=kubernetes.
  • CONTAINER_USER does not work with CONTAINER_ENV=cromwell.
  • CONTAINER_ENV=cromwell does not forward environment variables.
  • STORAGE_TYPE=host is not supported for Kubernetes.

TODO

  • Dev environment and testing for Kubernetes and SLURM.

pman's People

Contributors

arnavn101, awalkaradi95, betaredex, cagriyoruk, danmcp, husky-parul, iamemilio, jbernal0019, jennydaman, kibablu, nicolasrannou, ravisantoshgudimetla, rudolphpienaar, sandip117, umohnani8


pman's Issues

Support for podman and docker (without swarm)

Currently, pman only works with either Docker Swarm or Kubernetes. Support for both Podman and Docker (without Swarm) could be added by having pman speak to the Docker Engine API.

Note: It is better to target the docker engine API instead of podman's API. Podman's API is compatible with the docker engine API, so targeting docker here would be preferable over a tight coupling with podman.

The work involved is to:

  • add an implementation of abstractmgr.py for docker/podman, see swarmmgr.py as an example
  • add relevant configurations to config.py
  • add the new manager to get_compute_mgr defined in resources.py

What are plugin resource limits?

A plugin's JSON representation may define a memory_limit value. When creating an instance of that plugin, this memory_limit value may be explicitly specified. But how exactly is memory_limit to be interpreted (by pman)?

pman's kubernetesmgr.py is the first implementation of AbstractManager to respect the memory_limit value. In it, the amount of memory requested is hard-coded to 150Mi, while the memory limit is set to the value of memory_limit.

pman/pman/kubernetesmgr.py

Lines 116 to 117 in 898b8a1

requests = {'memory': '150Mi', 'cpu': '250m'}
limits = {'memory': memory_limit, 'cpu': cpu_limit}

As a result, a plugin instance which sets memory_limit to 100Mi (below the hard-coded 150Mi request) cannot run.

[ERROR][manager 8 140461197596480] [CODE01,chris-jid-9]: Error submitting job to pfcon url -->http://tunnels.local:6004/api/v1/<--, detail: {"message": "Error response from pman service while submitting job chris-jid-9, detail: {\"message\": \"(422)\\nReason: Unprocessable Entity\\nHTTP response headers: HTTPHeaderDict({'Cache-Control': 'no-cache, private', 'Content-Type': 'application/json', 'X-Kubernetes-Pf-Flowschema-Uid': '68c1055e-085f-4a7f-9938-4254064d5ce6', 'X-Kubernetes-Pf-Prioritylevel-Uid': '296737dd-595c-4f96-93b6-bb9c25ac5bcb', 'Date': 'Tue, 29 Mar 2022 21:28:15 GMT', 'Content-Length': '518'})\\nHTTP response body: {\\\"kind\\\":\\\"Status\\\",\\\"apiVersion\\\":\\\"v1\\\",\\\"metadata\\\":{},\\\"status\\\":\\\"Failure\\\",\\\"message\\\":\\\"Job.batch \\\\\\\"chris-jid-9\\\\\\\" is invalid: spec.template.spec.containers[0].resources.requests: Invalid value: \\\\\\\"150Mi\\\\\\\": must be less than or equal to memory limit\\\",\\\"reason\\\":\\\"Invalid\\\",\\\"details\\\":{\\\"name\\\":\\\"chris-jid-9\\\",\\\"group\\\":\\\"batch\\\",\\\"kind\\\":\\\"Job\\\",\\\"causes\\\":[{\\\"reason\\\":\\\"FieldValueInvalid\\\",\\\"message\\\":\\\"Invalid value: \\\\\\\"150Mi\\\\\\\": must be less than or equal to memory limit\\\",\\\"field\\\":\\\"spec.template.spec.containers[0].resources.requests\\\"}]},\\\"code\\\":422}\\n\\n\"}\n"}

Solution: use memory_limit as the requested memory as well.
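A minimal sketch of that fix with the Kubernetes Python client (not the actual kubernetesmgr.py code):

from kubernetes import client as k8s

def job_resources(memory_limit: str, cpu_limit: str) -> k8s.V1ResourceRequirements:
    limits = {'memory': memory_limit, 'cpu': cpu_limit}
    # request exactly what the plugin declared, so the request can never exceed the limit
    return k8s.V1ResourceRequirements(requests=dict(limits), limits=limits)

job_resources('100Mi', '1000m')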

POST body is split on new lines, only last line is evaluated

Working example, because pfurl (mostly) minifies the input:

pfurl --verb POST --raw --http localhost:5040/api/v1/cmd --jsonwrapper 'payload' --msg \
'{  "action": "run",
    "meta":  {
        "cmd":      "ls /share",
        "auid":     "rudolphpienaar",
        "jid":      "a-job-id-2",
        "threaded": true,
        "container":   {
                "target": {
                    "image":        "fnndsc/pl-simpledsapp"
                },
                "manager": {
                    "image":        "fnndsc/swarm",
                    "app":          "swarm.py",
                    "env":  {
                        "shareDir":     "/hostFS/storeBase",
                        "serviceType":  "docker",
                        "serviceName":  "testService"
                    }
                }
        }
    }
}'

This is supposed to be an equivalent call using curl, but it does not work:

curl http://localhost:8012/api/v1/cmd --data \
    '{ "payload": {  "action": "run",
        "meta":  {
            "cmd":      "ls /share",
            "auid":     "rudolphpienaar",
            "jid":      "a-job-id-2",
            "threaded": true,
            "container":   {
                    "target": {
                        "image":        "fnndsc/pl-simpledsapp"
                    },
                    "manager": {
                        "image":        "fnndsc/swarm",
                        "app":          "swarm.py",
                        "env":  {
                            "shareDir":     "/hostFS/storeBase",
                            "serviceType":  "docker",
                            "serviceName":  "testService"
                        }
                    }
            }
        }
    }}'

Logs

2021-02-06 19:20:02  |    cbf53302ee63 |                    pman.py:Listener.process() | Listener ID - 10: process() - handling request
2021-02-06 19:20:02  |    cbf53302ee63 |                    pman.py:Listener.process() | 
2021-02-06 19:20:02  |    cbf53302ee63 |                    pman.py:Listener.process() | ***********************************************
2021-02-06 19:20:02  |    cbf53302ee63 |                    pman.py:Listener.process() | ***********************************************
2021-02-06 19:20:02  |    cbf53302ee63 |                    pman.py:Listener.process() | 2021-02-06 19:20:02.125230 incoming data stream
2021-02-06 19:20:02  |    cbf53302ee63 |                    pman.py:Listener.process() | ***********************************************
2021-02-06 19:20:02  |    cbf53302ee63 |                    pman.py:Listener.process() | len = 1020
2021-02-06 19:20:02  |    cbf53302ee63 |                    pman.py:Listener.process() | ***********************************************
2021-02-06 19:20:02  |    cbf53302ee63 |                    pman.py:Listener.process() | POST /api/v1/cmd HTTP/1.1
Host: localhost:5010
User-Agent: curl/7.75.0
Accept: */*
Content-Length: 861
Content-Type: application/x-www-form-urlencoded

{ "payload": {  "action": "run",
            "meta":  {
                "cmd":      "ls /share",
                "auid":     "rudolphpienaar",
                "jid":      "a-job-id-2",
                "threaded": true,
                "container":   {
                        "target": {
                            "image":        "fnndsc/pl-simpledsapp"
                        },
                        "manager": {
                            "image":        "fnndsc/swarm",
                            "app":          "swarm.py",
                            "env":  {
                                "shareDir":     "/hostFS/storeBase",
                                "serviceType":  "docker",
                                "serviceName":  "testService"
                            }
                        }
                }
            }
        }}

2021-02-06 19:20:02  |    cbf53302ee63 |                    pman.py:Listener.process() | ***********************************************
2021-02-06 19:20:02  |    cbf53302ee63 |                    pman.py:Listener.process() | Request = ...
2021-02-06 19:20:02  |    cbf53302ee63 |                    pman.py:Listener.process() | ['POST /api/v1/cmd HTTP/1.1\r', 'Host: localhost:5010\r', 'User-Agent: curl/7.75.0\r', 'Accept: */*\r', 'Content-Length: 861\r', 'Content-Type: application/x-www-form-urlencoded\r', '\r', '{ "payload": {  "action": "run",', '            "meta":  {', '                "cmd":      "ls /share",', '                "auid":     "rudolphpienaar",', '                "jid":      "a-job-id-2",', '                "threaded": true,', '                "container":   {', '                        "target": {', '                            "image":        "fnndsc/pl-simpledsapp"', '                        },', '                        "manager": {', '                            "image":        "fnndsc/swarm",', '                            "app":          "swarm.py",', '                            "env":  {', '                                "shareDir":     "/hostFS/storeBase",', '                                "serviceType":  "docker",', '                                "serviceName":  "testService"', '                            }', '                        }', '                }', '            }', '        }}']
2021-02-06 19:20:02  |    cbf53302ee63 |                    pman.py:Listener.process() | Using token authentication: False
2021-02-06 19:20:02  |    cbf53302ee63 |                    pman.py:Listener.process() | Request authorized
2021-02-06 19:20:02  |    cbf53302ee63 |                    pman.py:Listener.process() | json_payload = '        }}'
Exception in thread Thread-14:
Traceback (most recent call last):
  File "/usr/lib/python3.8/threading.py", line 932, in _bootstrap_inner
    self.run()
  File "/usr/local/lib/python3.8/dist-packages/pman/pman.py", line 671, in run
    resultFromProcessing    = self.process(request)
  File "/usr/local/lib/python3.8/dist-packages/pman/pman.py", line 2191, in process
    d_payload           = json.loads(json_payload)
  File "/usr/lib/python3.8/json/__init__.py", line 357, in loads
    return _default_decoder.decode(s)
  File "/usr/lib/python3.8/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/lib/python3.8/json/decoder.py", line 355, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 9 (char 8)

Working curl example

curl http://localhost:5010/api/v1/cmd --data '{"payload": {"action": "run", "meta": {"cmd": "ls/share", "auid": "rudolphpienaar", "jid": "not-a-uuid-2", "threaded": true, "container": {"target": {"image": "fnndsc/pl-simpledsapp"}, "manager": {"image": "fnndsc/swarm", "app": "swarm.py", "env": {"shareDir": "/wow/hostFS/a_folder", "serviceType": "docker", "serviceName": "testService"}}}}}}'

Escaped characters in CMD still wrong with Kubernetes

{
    "collection": {
        "version": "1.0",
        "href": "http://cube-next.tch.harvard.edu/api/v1/plugins/instances/176/parameters/",
        "items": [
            {
                "data": [
                    {
                        "name": "id",
                        "value": 86
                    },
                    {
                        "name": "param_name",
                        "value": "filter"
                    },
                    {
                        "name": "value",
                        "value": ".*_81920\\.obj$"
                    },
                    {
                        "name": "type",
                        "value": "string"
                    }
                ],
                "href": "http://cube-next.tch.harvard.edu/api/v1/plugins/string-parameter/86/",
                "links": [
                    {
                        "rel": "plugin_inst",
                        "href": "http://cube-next.tch.harvard.edu/api/v1/plugins/instances/176/"
                    },
                    {
                        "rel": "plugin_param",
                        "href": "http://cube-next.tch.harvard.edu/api/v1/plugins/parameters/353/"
                    }
                ]
            },
            {
                "data": [
                    {
                        "name": "id",
                        "value": 87
                    },
                    {
                        "name": "param_name",
                        "value": "expression"
                    },
                    {
                        "name": "value",
                        "value": "^(\\d+)/(.*)/(.*)(\\._81920\\.obj)$"
                    },
                    {
                        "name": "type",
                        "value": "string"
                    }
                ],
                "href": "http://cube-next.tch.harvard.edu/api/v1/plugins/string-parameter/87/",
                "links": [
                    {
                        "rel": "plugin_inst",
                        "href": "http://cube-next.tch.harvard.edu/api/v1/plugins/instances/176/"
                    },
                    {
                        "rel": "plugin_param",
                        "href": "http://cube-next.tch.harvard.edu/api/v1/plugins/parameters/354/"
                    }
                ]
            },
            {
                "data": [
                    {
                        "name": "id",
                        "value": 88
                    },
                    {
                        "name": "param_name",
                        "value": "replacement"
                    },
                    {
                        "name": "value",
                        "value": "$2/$3/plinst$1$4"
                    },
                    {
                        "name": "type",
                        "value": "string"
                    }
                ],
                "href": "http://cube-next.tch.harvard.edu/api/v1/plugins/string-parameter/88/",
                "links": [
                    {
                        "rel": "plugin_inst",
                        "href": "http://cube-next.tch.harvard.edu/api/v1/plugins/instances/176/"
                    },
                    {
                        "rel": "plugin_param",
                        "href": "http://cube-next.tch.harvard.edu/api/v1/plugins/parameters/355/"
                    }
                ]
            }
        ],
        "links": [],
        "total": 3
    }
}

What Kubernetes got:

/docker-entrypoint.sh
/usr/local/bin/bulkrename
--saveinputmeta
--saveoutputmeta
--filter
.*_81920.obj$
--expression
^(d+)/(.*)/(.*)(._81920.obj)$
--replace
$2/$3/plinst$1$4
/share/incoming
/share/outgoing
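This looks like what happens when a flat argument string is split with shell-like rules, which consume backslashes while leaving $ untouched. Whether pman or CUBE does exactly this is an assumption; the snippet below merely reproduces the effect with Python's shlex:

import shlex

args = '--filter .*_81920\\.obj$ --expression ^(\\d+)/(.*)/(.*)(\\._81920\\.obj)$'
print(shlex.split(args))
# ['--filter', '.*_81920.obj$', '--expression', '^(d+)/(.*)/(.*)(._81920.obj)$']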

Error handling if invalid meta JSON file passed

Running the following command makes purl hang and displays an error in the pman server.

Notice the curly, non-JSON closing quote in "cmd”, which makes the payload invalid JSON.

purl --verb POST --raw --http 172.17.0.2:5010/api/v1/cmd --jsonwrapper 'payload' --msg \
'{  "action": "run",
        "meta": {
                "cmd”:      "ls",
                "auid":     "rudolphpienaar",
                "jid":      "cal-job-1234",
                "threaded": true
        }
}' --quiet --jsonpprintindent 4
  1. terminal running purl hangs
  2. error message on pman
2017-03-16 11:55:54.042755 |      <Listener(Thread-5, started 139819640420096)> |                        process | ***********************************************
2017-03-16 11:55:54.043535 |      <Listener(Thread-5, started 139819640420096)> |                        process | Request = ...
2017-03-16 11:55:54.044897 |      <Listener(Thread-5, started 139819640420096)> |                        process | ['POST /api/v1/cmd HTTP/1.1\r', 'Host: 172.17.0.2:5010\r', 'User-Agent: PycURL/7.43.0 libcurl/7.47.0 OpenSSL/1.0.2g zlib/1.2.8 libidn/1.32 librtmp/2.3\r', 'Accept: */*\r', 'Content-Length: 15\r', 'Content-Type: application/x-www-form-urlencoded\r', '\r', '{"payload": {}}']
2017-03-16 11:55:54.045879 |      <Listener(Thread-5, started 139819640420096)> |                        process | json_payload = {"payload": {}}
Exception in thread Thread-5:
Traceback (most recent call last):
  File "/usr/lib/python3.5/threading.py", line 914, in _bootstrap_inner
    self.run()
  File "/usr/local/lib/python3.5/dist-packages/pman/pman.py", line 549, in run
    resultFromProcessing    = self.process(request)
  File "/usr/local/lib/python3.5/dist-packages/pman/pman.py", line 1225, in process
    payload_verb        = d_request['action']
KeyError: 'action'
2017-03-16 11:55:54.038912 |      <Listener(Thread-4, started 139819989149440)> |                            run | Received b'' from client_id: b'\x00k\x8bEg'

2017-03-16 11:55:54.048111 |      <Listener(Thread-4, started 139819989149440)> |                            run | Listener ID - 0: run() - Received comms from client.

Feature request: add endpoint for getting resource consumption

It would be cool if pman could, over an HTTP endpoint, report the compute resource's resource usage, e.g. % CPU usage (or CPU request).

This information would be requested by CUBE, and then requested from CUBE by a client, e.g.

GET /api/v1/computeresources/usage/

[
    {
        "compute_resource_name": "host",
        "usage": 0.2
    },
    {
        "compute_resource_name": "galena",
        "usage": 0.6
    },
    {
        "compute_resource_name": "e2-5m",
        "usage": 0.3
    }
]

This information can be rendered using ChRIS_ui for

  • the sake of pretty graphs
  • monitoring: see if any compute resources are down
  • scheduling: a user might want to choose a compute resource which has low utilization.

Kubernetes memory limit

Logs from pfcon:

[2022-03-04 17:31:29,568] [INFO][services:47 10 140453802506048] Sending RUN job request to pman at -->http://pman:5010/api/v1/<-- for job crispy-126
[2022-03-04 17:31:29,568] [INFO][services:49 10 140453802506048] Payload sent: {
    "cmd_args": "--saveinputmeta --saveoutputmeta --quiet",
    "cmd_path_flags": "",
    "auid": "jennings",
    "number_of_workers": 1,
    "cpu_limit": 1000,
    "memory_limit": 100,
    "gpu_limit": 0,
    "image": "fnndsc/ep-interpolate-surface-with-sphere:0.1.0",
    "selfexec": "isws",
    "selfpath": "/opt/conda/bin",
    "execshell": "/opt/conda/bin/python3.1",
    "type": "ds",
    "jid": "crispy-126"
}
[2022-03-04 17:31:30,620] [ERROR][services:60 10 140453802506048] Error response from pman service while submitting job crispy-126, detail: {"message": "(422)\nReason: Unprocessable Entity\nHTTP response headers: HTTPHeaderDict({'Cache-Control': 'no-cache, private', 'Content-Type': 'application/json', 'X-Kubernetes-Pf-Flowschema-Uid': '68c1055e-085f-4a7f-9938-4254064d5ce6', 'X-Kubernetes-Pf-Prioritylevel-Uid': '296737dd-595c-4f96-93b6-bb9c25ac5bcb', 'Date': 'Fri, 04 Mar 2022 17:31:30 GMT', 'Content-Length': '516'})\nHTTP response body: {\"kind\":\"Status\",\"apiVersion\":\"v1\",\"metadata\":{},\"status\":\"Failure\",\"message\":\"Job.batch \\\"crispy-126\\\" is invalid: spec.template.spec.containers[0].resources.requests: Invalid value: \\\"150Mi\\\": must be less than or equal to memory limit\",\"reason\":\"Invalid\",\"details\":{\"name\":\"crispy-126\",\"group\":\"batch\",\"kind\":\"Job\",\"causes\":[{\"reason\":\"FieldValueInvalid\",\"message\":\"Invalid value: \\\"150Mi\\\": must be less than or equal to memory limit\",\"field\":\"spec.template.spec.containers[0].resources.requests\"}]},\"code\":422}\n\n"}

Preemptible schedulers

On E2 (BCH internal SLURM), there is a partition available for preemptible jobs. pman should have support for interacting with preemptible schedulers, and retrying interrupted jobs.

The same logic could also be applied to Kubernetes, where Kubernetes would want to reschedule a pod under certain circumstances (node down, OOMKilled).

Repo updates

Single source of truth (remove entries in Dockerfile that are duplicated in setup.py)

/tmp/pman reused

/tmp/pman is used both for copying pman (the program) into the docker container (see Dockerfile:33) and as the default path for the database. I removed rm -rf /tmp/pman from the Dockerfile for debugging purposes, and was very confused when pman tried to read /tmp/pman as a database while the pman source files were still there. The path should probably be changed to prevent confusion.

Stuck if Cromwell job run fails

If Cromwell (is misconfigured and) can't parse the WDL, then instead of producing an error, pman will report that the job is stuck on notstarted.

If the SLURM job dies (for instance, because it is cancelled due to the time limit), there is no feedback.

Log retrieval

Log retrieval is problematic for two reasons:

Single Stream instead of stdout/stderr

In CUBE, pfcon, and pman, there is no distinction between stdout and stderr. Either the two output streams are joined, or one is disregarded.

Buffering full logs instead of streaming

def get_job_logs(self, job: J) -> str:

Internally, the AbstractManager interface does not support streaming (or chunking/pagination) of large logs. Plugin instances which produce a lot of output will cause pman to hang and time out when attempting to retrieve the logs.
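One possible shape for a streaming interface, sketched with the Docker SDK for Python (an illustration of the idea, not an interface that exists in pman today):

from typing import Iterator

def stream_job_logs(container) -> Iterator[bytes]:
    # docker-py can yield log chunks lazily instead of buffering the whole log in memory
    yield from container.logs(stdout=True, stderr=True, stream=True, follow=False)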

Also, the AbstractManager interface is changed: get_job_logs returns typing.AnyStr, which more accurately describes the data type of a container's logs. The encoding is handled in pman/services.py instead, an improvement over a previous hotfix b41cefb.

openshiftmgr.py is not fixed because I think it should be rewritten from the ground up.

Overhaul of command specification

Currently pman's API accepts an object which looks like this:

{
    "cmd_args": "string",
    "cmd_path_flags": "string",
    "auid": "string",
    "number_of_workers": 1,
    "cpu_limit": 1000,
    "memory_limit": 200,
    "gpu_limit": 0,
    "image": "fnndsc/pl-bulk-rename:0.1.1",
    "selfexec": "bulkrename",
    "selfpath": "string",
    "execshell": "string",
    "type": "ds",
    "jid": "crispy-176"
}

A much simpler and more correct data type, which would encompass cmd_args, selfpath, selfexec, and execshell, would be a single value command that is a list of strings. This is how everything else does it (Python subprocess, Dockerfile, Kubernetes container spec, ...).

I propose CUBE, pfcon, and pman should be changed so that:

  1. CUBE assembles selfpath, selfexec, execshell, and any arguments as one list of strings called command
  2. CUBE sends command to pfcon
  3. pfcon sends command to pman
  4. the AbstractManager of pman sends command to the runtime API (which, in every known case such as docker and kubernetes, accepts a list of strings)

This proposal would solve #202
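As a sketch of the idea (not an agreed-upon schema), the assembled list could be handed straight to the runtime, with no string re-parsing anywhere; the image and arguments below are taken from examples elsewhere on this page:

import docker

command = ['/usr/local/bin/bulkrename',
           '--filter', '.*_81920\\.obj$',
           '--expression', '^(\\d+)/(.*)/(.*)(\\._81920\\.obj)$',
           '/share/incoming', '/share/outgoing']
# both the Docker SDK and the Kubernetes container spec accept the argv list verbatim
docker.from_env().containers.run('fnndsc/pl-bulk-rename:0.1.1', command, remove=True)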

New PR

New PR for latest changes and rebase.

Pman is unable to handle concurrent Med2Img Jobs

I was testing pfcon's handling of concurrent jobs with FS and DS plugins. I used a Python script that executed pfurl requests for running Med2Img jobs on multiple DICOM images. However, pman was not able to process all the jobs and returned an error; when I ran the jobs with a large time gap (30-60 sec) between each job, pman was able to successfully complete all of them.

These are the steps that I went through:

[First Terminal]

  1. git clone git@github.com:FNNDSC/pfcon.git
  2. cd pfcon
  3. ./unmake.sh ; sudo rm -fr FS; rm -fr FS;./make.sh

[Second Terminal (in the pfcon directory)]

  1. git clone https://github.com/FNNDSC/SAG-anon
  2. export DICOMDIR=$(pwd)/SAG-anon
  3. docker pull fnndsc/pl-med2img
  4. ./swiftCtl.sh -A push -E dcm -D $DICOMDIR -P chris/uploads/DICOM/dataset1

After pushing the DICOM files to swift, I ran a python script that executed FS and DS Plugins on pfcon.

  1. git clone git@github.com:arnavnidumolu/ChRIS-E2E.git

  2. cd ChRIS-E2E/scale-testing/

  3. I set up my configuration options in config.cfg --> nano config.cfg
    (Edit CHRIS_PATH)

  4. Lastly, I executed the python script --> python test_pfcon.py

The python script uses these two bash scripts to run FS and DS Plugins:

  1. FS Plugin Script
  2. DS Plugin (Med2Img) Script

Analysis

The FS Plugin job ran successfully and returned a valid response. Additionally, the first two DS plugin jobs were successful but the next few DS Plugin jobs did not return a "finishedSuccessfully" response.

I used docker-compose -f docker-compose_dev.yml logs -f pman_service in the pfcon directory to view the pman container logs. Within the logs, it displayed this error message:

pman_service_1   | Traceback (most recent call last):
pman_service_1   |   File "/usr/lib/python3.8/threading.py", line 932, in _bootstrap_inner
pman_service_1   |     self.run()
pman_service_1   |   File "/usr/local/lib/python3.8/dist-packages/pman/pman.py", line 567, in run
pman_service_1   |     self.within.DB_fileIO(cmd = 'save')
pman_service_1   |   File "/usr/local/lib/python3.8/dist-packages/pman/pman.py", line 314, in DB_fileIO
pman_service_1   |     if self.str_fileio   == 'json':     saveToDiskAsJSON(tree_DB)
pman_service_1   |   File "/usr/local/lib/python3.8/dist-packages/pman/pman.py", line 274, in saveToDiskAsJSON
pman_service_1   |     tree_DB.tree_save(
pman_service_1   |   File "/usr/local/lib/python3.8/dist-packages/pfmisc/C_snode.py", line 1326, in tree_save
pman_service_1   |     self.treeExplore(**kwargs)
pman_service_1   |   File "/usr/local/lib/python3.8/dist-packages/pfmisc/C_snode.py", line 1424, in treeExplore
pman_service_1   |     ret = f(str_startPath, **kwargs)
pman_service_1   |   File "/usr/local/lib/python3.8/dist-packages/pfmisc/C_snode.py", line 1140, in node_save
pman_service_1   |     str_pathDiskOrig    = os.getcwd()
pman_service_1   | FileNotFoundError: [Errno 2] No such file or directory

After a few seconds passed, Pman returned another error message:

pman_service_1   | Traceback (most recent call last):
pman_service_1   |   File "/usr/lib/python3.8/threading.py", line 932, in _bootstrap_inner
pman_service_1   |     self.run()
pman_service_1   |   File "/usr/local/lib/python3.8/dist-packages/pman/pman.py", line 667, in run
pman_service_1   |     resultFromProcessing    = self.process(request)
pman_service_1   |   File "/usr/local/lib/python3.8/dist-packages/pman/pman.py", line 2165, in process
pman_service_1   |     self.processPOST(   request = d_request,
pman_service_1   |   File "/usr/local/lib/python3.8/dist-packages/pman/pman.py", line 2240, in processPOST
pman_service_1   |     d_done              = eval("self.t_%s_process(request = d_request)" % payload_verb)
pman_service_1   |   File "<string>", line 1, in <module>
pman_service_1   |   File "/usr/local/lib/python3.8/dist-packages/pman/pman.py", line 1071, in t_status_process
pman_service_1   |     self.dp.qprint("------- In status process ------------")
pman_service_1   |   File "/usr/local/lib/python3.8/dist-packages/pfmisc/debug.py", line 131, in qprint
pman_service_1   |     stack = inspect.stack()
pman_service_1   |   File "/usr/lib/python3.8/inspect.py", line 1514, in stack
pman_service_1   |     return getouterframes(sys._getframe(1), context)
pman_service_1   |   File "/usr/lib/python3.8/inspect.py", line 1491, in getouterframes
pman_service_1   |     frameinfo = (frame,) + getframeinfo(frame, context)
pman_service_1   |   File "/usr/lib/python3.8/inspect.py", line 1461, in getframeinfo
pman_service_1   |     filename = getsourcefile(frame) or getfile(frame)
pman_service_1   |   File "/usr/lib/python3.8/inspect.py", line 708, in getsourcefile
pman_service_1   |     if getattr(getmodule(object, filename), '__loader__', None) is not None:
pman_service_1   |   File "/usr/lib/python3.8/inspect.py", line 737, in getmodule
pman_service_1   |     file = getabsfile(object, _filename)
pman_service_1   |   File "/usr/lib/python3.8/inspect.py", line 721, in getabsfile
pman_service_1   |     return os.path.normcase(os.path.abspath(_filename))
pman_service_1   |   File "/usr/lib/python3.8/posixpath.py", line 379, in abspath
pman_service_1   |     cwd = os.getcwd()
pman_service_1   | FileNotFoundError: [Errno 2] No such file or directory

Even though Pman returned an error message, I confirmed that it actually ran the job. To view the converted jpg files from the DS Plugin, I utilized the FS directory that was created in the pfcon directory.

ls FS/remote/key-16/outgoing/   # 16 refers to the Job ID; this returns:

sample16-slice001.jpg  sample16-slice040.jpg  sample16-slice079.jpg  sample16-slice118.jpg  sample16-slice157.jpg ...

Conclusion

Since the DICOM files were successfully converted and stored in the FS directory, the job should have returned a finishedSuccessfully response. However, pman displayed an error message after running the DS plugin and did not return a successful response when the pfurl status command was executed. pman ran the first two jobs successfully and returned a valid response, but it was unable to handle the next few jobs. It was only able to run all the jobs successfully when there was a time gap between job executions, allowing it to finish current jobs before moving on to the next ones.

Kubernetes OOM job restarts

When a job exceeds its memory limit, Kubernetes restarts it, which is pointless since it is just going to run out of memory again. Moreover, the outputdir persists across the restart, which means the restart does not begin with a clean state.
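One possible mitigation, sketched with the Kubernetes Python client: create the Job so that it is never retried (names and image below are illustrative):

from kubernetes import client as k8s

job = k8s.V1Job(
    metadata=k8s.V1ObjectMeta(name='chris-jid-9'),
    spec=k8s.V1JobSpec(
        backoff_limit=0,                 # do not recreate the pod after a failure
        template=k8s.V1PodTemplateSpec(
            spec=k8s.V1PodSpec(
                restart_policy='Never',  # do not restart the failed container in place
                containers=[k8s.V1Container(name='job', image='fnndsc/pl-simpledsapp')],
            )
        ),
    ),
)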

Annotation of containers created by pman

pman should annotate the containers it creates (on Kubernetes or Docker Swarm) so that the container engine can be queried for information about currently existing containers created by ChRIS. Such a feature would enable telemetry about container runtime information, e.g. "what is the peak memory usage of ChRIS plugins?"

How do we support SLURM time limit?

Time limit is a very influential aspect of a queued job's priority to a SLURM scheduler, so pman should be able to send time limits to Cromwell.

Simplest solution: let CROMWELL_TIMELIMIT be an environment variable for pman and send it off in the WDL every time. Multiple pman instances and multiple pfcon instances can be created to define multiple compute environments with different time limits.

Proposed spec solution: add to the ChRIS plugin spec that plugin developers must provide an estimate of the time needed, and make timelimit part of resources_dict.

watch.py watches all the pods in the namespace

As of now, watch.py watches all the pods in the namespace. We need to limit it to the job that was created. If not addressed, this will cause problems when we have multiple pods per job.

@umohnani8 - Can you address this? Basically you can use this line
https://github.com/FNNDSC/pman/pull/63/files#diff-24f9c3766ff284d17b1307c1000e8ff3R320

instead of https://github.com/FNNDSC/pman/blob/master/openshift/pman-swift-publisher/watch.py#L26 for the pod created as part of the job. Also, you need to iterate over the list of pods, I think.

cc @danmcp
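A sketch of the suggested narrowing, watching only the pods that belong to a single job via a label selector (namespace and job name are hypothetical):

from kubernetes import client, config, watch

config.load_incluster_config()  # or config.load_kube_config() outside the cluster
v1 = client.CoreV1Api()
w = watch.Watch()
# Kubernetes Jobs label their pods with job-name=<job>, so filter on that
for event in w.stream(v1.list_namespaced_pod, namespace='myproject',
                      label_selector='job-name=chris-jid-9'):
    pod = event['object']
    print(event['type'], pod.metadata.name, pod.status.phase)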

Dead code

The entire t_fileiosetup_process function is unreachable (and if it were reached, it would crash):

server = ThreadedHTTPServer((d_args['ip'], int(d_args['port'])), StoreHandler)

Classes ThreadedHTTPServer and StoreHandler are probably copy-pasted from pfioh; they are undefined here.

one-liner proof:

$ docker run --rm --entrypoint /usr/local/bin/python fnndsc/pman -c "$(printf 'from pman import *\npman\nStoreHandler')"  
Traceback (most recent call last):
  File "<string>", line 3, in <module>
NameError: name 'StoreHandler' is not defined

AbstractManager interface only allows for one mount.

def schedule_job(self, image, command, name, resources_dict, mountdir=None):

pman/pman/resources.py

Lines 54 to 55 in 54e2a43

self.str_app_container_inputdir = '/share/incoming'
self.str_app_container_outputdir = '/share/outgoing'

It would be better if multiple mounts were supported, along with mount options such as mounting a volume read-only vs. read-write. DS plugins are preferably run like this:

docker run --rm -v /data/in:/incoming:ro -v /data/out:/outgoing:rw fnndsc/pl-whatever whatever /incoming /outgoing
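With the Docker SDK for Python, per-mount modes are expressed like this (paths are illustrative):

import docker

docker.from_env().containers.run(
    'fnndsc/pl-whatever', ['whatever', '/incoming', '/outgoing'],
    volumes={
        '/data/in': {'bind': '/incoming', 'mode': 'ro'},
        '/data/out': {'bind': '/outgoing', 'mode': 'rw'},
    },
    remove=True,
)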

byte_str undefined when Exception occurs

pman/pman/pman.py

Lines 1855 to 1867 in 92c5c45

try:
    byte_str = client.containers.run('%s' % str_managerImage,
                                     str_cmdManager,
                                     volumes = {'/var/run/docker.sock': {'bind': '/var/run/docker.sock', 'mode': 'rw'}},
                                     remove = True)
except Exception as e:
    # An exception here most likely occurs due to a serviceName collision.
    # Solution is to stop the service and retry.
    str_e = '%s' % e
    print(str_e)
d_meta['cmdMgr'] = '%s %s' % (str_managerImage, str_cmdManager)
d_meta['cmdMrg_byte_str'] = str(byte_str, 'utf-8')
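A minimal sketch of one possible fix: give byte_str a default before the try block so the later read cannot raise NameError:

byte_str = b''
try:
    byte_str = client.containers.run('%s' % str_managerImage,
                                     str_cmdManager,
                                     volumes={'/var/run/docker.sock': {'bind': '/var/run/docker.sock', 'mode': 'rw'}},
                                     remove=True)
except Exception as e:
    print('%s' % e)
d_meta['cmdMgr'] = '%s %s' % (str_managerImage, str_cmdManager)
d_meta['cmdMrg_byte_str'] = str(byte_str, 'utf-8')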

OOM error propagation

If a Kubernetes pod fails because of OOM, pman should convey an error message that distinguishes this failure mode.
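With the Kubernetes Python client, the OOM case can be told apart by inspecting the terminated container state (a sketch; the pod name and namespace are hypothetical):

from kubernetes import client, config

config.load_incluster_config()
v1 = client.CoreV1Api()
pod = v1.read_namespaced_pod(name='chris-jid-9-abcde', namespace='default')
state = pod.status.container_statuses[0].state
if state.terminated and state.terminated.reason == 'OOMKilled':
    print('job failed: out of memory')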

Invalid requests are not handled

curl http://localhost:5010 --data '{}'
Request = ...
Exception in thread Thread-15:
Traceback (most recent call last):
  File "/usr/lib/python3.8/threading.py", line 932, in _bootstrap_inner
    self.run()
  File "/usr/local/lib/python3.8/dist-packages/pman/pman.py", line 667, in run
    resultFromProcessing    = self.process(request)
  File "/usr/local/lib/python3.8/dist-packages/pman/pman.py", line 2149, in process
    d_request           = d_payload['payload']
KeyError: 'payload'

No HTTP response is sent; the connection stays open forever (or until a client/network timeout).
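A sketch of defensive handling (hypothetical code, not the legacy pman source): validate the body and answer with an error instead of letting the worker thread die:

import json

def parse_request(body: str):
    try:
        return json.loads(body)['payload'], None
    except (json.JSONDecodeError, KeyError) as e:
        # respond with HTTP 400 instead of raising inside the listener thread
        return None, {'status': 400, 'message': 'invalid request body: %s' % e}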

Multi-node parallelism with number_of_workers

number_of_workers can be a way to support embarrassingly parallel jobs on multi-node compute environments.

How can a process identify which replica it is? This is necessary so that the workload can be divided, e.g. in plugin code:

if WORKER_NUMBER == 1:
    process('1.png')
elif WORKER_NUMBER == 2:
    process('2.png')
...

The equivalent concept in SLURM is a job array.

https://slurm.schedmd.com/job_array.html

e.g.

sbatch --array=1-4 job.sh

Four instances of job.sh will be executed, possibly on different compute nodes, and each instance will have the environment variable SLURM_ARRAY_TASK_ID set to 1, 2, 3, or 4.

pman should do something similar.
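In plugin code, the replica index could then be read from an environment variable, analogous to SLURM_ARRAY_TASK_ID (the variable names here are hypothetical, following the example above):

import os

worker = int(os.environ.get('WORKER_NUMBER', '0'))
workers = int(os.environ.get('NUMBER_OF_WORKERS', '1'))
inputs = sorted(os.listdir('/share/incoming'))
# a simple round-robin split of the workload across replicas
for name in inputs[worker::workers]:
    print('worker %d processes %s' % (worker, name))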
