nebula-orchestrator / worker
The worker node manager container which manages nebula nodes
Home Page: https://nebula-orchestrator.github.io/
License: GNU General Public License v3.0
There should be unit tests to ensure that nothing fails.
The framework for automatic unit tests is in place, but the worker tests are so far run manually.
Manually exiting the worker-manager should stop the running containers. It's important to note that this should only happen if a user manually stops the Nebula worker-manager (Ctrl-D) and not if the container crashes/exits for any other reason (even docker stop/kill <container-name>), as in production use the worker-manager is expected to come back online, reconnect to Rabbit, pull the latest config from Mongo & download the previously used image before rolling the containers, in order to minimize impact on the application containers.
It would be amazing to have a wildcard APP_NAME; for instance APP_NAME=site_*, so that any new apps created with the site_ prefix are installed on the workers (see the sketch below).
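A minimal sketch of the matching logic, assuming the worker can fetch the full app list from the manager; fnmatch gives shell-style wildcard semantics:

from fnmatch import fnmatch

# hypothetical sketch: resolve a wildcard APP_NAME against the manager's app list
def matching_apps(app_name_pattern, all_apps):
    return [app for app in all_apps if fnmatch(app, app_name_pattern)]

print(matching_apps("site_*", ["site_web", "site_api", "billing"]))
# prints ['site_web', 'site_api']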
If a valid Nebula app is somehow missing its fanout exchange, the exchange should be recreated at the connect-to-Rabbit step, after the database is revalidated to confirm that the app is a preexisting Nebula app. This protects against boot crash loops in case the exchange is somehow deleted while the Nebula app still has all of its config valid in the Mongo backend DB.
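A rough sketch of the idea, assuming pika is used for the Rabbit connection; exchange_declare is idempotent, so declaring it after the app is revalidated recreates a missing exchange and is a no-op otherwise:

import pika

# hypothetical sketch: recreate the app's fanout exchange at the connect-to-rabbit
# step, after the app was revalidated against the Mongo backend DB
connection = pika.BlockingConnection(pika.ConnectionParameters(host="rabbit_host"))
channel = connection.channel()
channel.exchange_declare(exchange="app_name_fanout", exchange_type="fanout",
                         durable=True)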
With Python 2.x nearing its EOL, Nebula should migrate to Python 3.x (with the current minor version target being the latest released version).
Nebula is currently Python 2.7.x based.
support for mounts will help create new options for the worker-manager
The same image should be used by both ARMv8 & x64 architectures (and possibly other ARM types as CI/CD for those permits).
Currently x64 has its own image, ARMv8 has its own image, "latest" points to x64 alone & no Docker manifest file exists.
After #20 is complete it might be a good idea to add a default "nebula" user network that's basically a bridge network, so users can route traffic between containers on the same server via container names. The network should be checked for existence at worker-manager boot and created if missing, so it can later be used by the apps like any other user network (see the sketch below).
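A minimal sketch of the boot-time check, assuming the docker SDK for Python:

import docker

# hypothetical sketch: ensure the default "nebula" bridge network exists at boot
client = docker.from_env()
if not client.networks.list(names=["nebula"]):
    client.networks.create("nebula", driver="bridge")
    print("created a bridge type network named nebula")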
The worker should work even without any config file existing if it has all the needed params given as envvars.
Currently this is worked around by having an empty config file in the root of the repo, but a conf.json file still needs to exist or the start will fail.
We need to check if conf.json exists before reading from it, and if it doesn't exist set the auth_file var to an empty dict (see the sketch below).
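A minimal sketch of that check, assuming conf.json sits in the working directory:

import json
import os

# hypothetical sketch: only read conf.json if it actually exists
if os.path.isfile("conf.json"):
    with open("conf.json") as conf_file:
        auth_file = json.load(conf_file)
else:
    auth_file = {}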
Having both api-manager and worker-manager is confusing; the worker-manager should be renamed "worker" to better state what it actually does.
This should be reflected in the documentation, the codebase, the git repo, the Docker containers, and CI/CD (Docker Hub & Shippable).
Relates to nebula-orchestrator/manager#21
Hello,
I have configured a Nebula worker on a Raspberry Pi. I am using AWS ECR as a registry to store the images. AWS ECR dynamically updates the auth password every 12 hours, and I can't update this password on the worker every time, so I have configured the AWS credential helper, which automatically updates the auth password every 12 hours on the edge device.
Whenever I push an update, the worker pulls the new image from AWS ECR. This works perfectly when I add REGISTRY_AUTH_USER and REGISTRY_AUTH_PASSWORD manually every 12 hours; the worker is then able to pull the update from the AWS ECR registry. But now that I have configured the AWS ECR credential helper, the Nebula worker is unable to pull the image. To test that my AWS ECR credential helper is working properly I tried the command docker pull <my_registry_url>/<image_name> and it worked. Note: I also tried this command after 12 hours, when my auth password became invalid, and it still worked.
I have added the worker docker-compose.yml and the worker logs for reference:
docker-compose.yml:
version: '3'
services:
  worker:
    container_name: worker
    build:
      context: .
      dockerfile: Dockerfile
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
    restart: unless-stopped
    hostname: worker
    environment:
      REGISTRY_HOST: <my_registry_url>
      MAX_RESTART_WAIT_IN_SECONDS: 0
      NEBULA_MANAGER_AUTH_USER: nebula
      NEBULA_MANAGER_AUTH_PASSWORD: nebula
      NEBULA_MANAGER_HOST: <my_manager_url>
      NEBULA_MANAGER_PORT: 80
      NEBULA_MANAGER_PROTOCOL: http
      NEBULA_MANAGER_CHECK_IN_TIME: 30
      DEVICE_GROUP: test
      KAFKA_BOOTSTRAP_SERVERS: <my_manager_url>:9092
      KAFKA_TOPIC: nebula-reports
worker logs:
Creating network "nebula_worker_default" with the default driver
Creating worker ... done
Attaching to worker
worker | reading config variables
worker | /usr/local/lib/python3.7/site-packages/parse_it/file/file_reader.py:55: UserWarning: config_folder_location does not exist, only envvars & cli args will be used
worker | warnings.warn("config_folder_location does not exist, only envvars & cli args will be used")
worker | reading config variables
worker | created a bridge type network named nebula
worker | no registry user pass combo defined, skipping registry login
worker | checking nebula manager connection
worker | nebula manager connection ok
worker | stopping all preexisting nebula managed app containers in order to ensure a clean slate on boot
worker | initial start of <my_image> app
worker | pulling image <my_registry_url>/<my_image>:latest
worker | <my_registry_url>/<my_image>
worker | 500 Server Error: Internal Server Error ("Get https://<my_registry_url>/v2/<my_image>/manifests/latest: no basic auth credentials")
worker | problem pulling image <my_registry_url>/<my_image>:latest
I built the worker image using the make docker command with the flag TARGET_GOARCH=arm.
My ~/.docker/config.json is as follows:
{
  "credHelpers": { "<my_registry_url>": "ecr-login" }
}
Might require a bit of an overhaul to have Rabbit also create a non-fanout exchange per app, but it should be possible to have a single container run a one-time "exec" command and return the result to the user through Nebula.
As a lot of IoT devices are ARM based it could be wise to have an ARM version of the worker-manager to allow managing them as well.
Nebula should integrate with Dockerfile-based healthchecks so that if a container is reported as unhealthy it is restarted; this should default to on but have a per-app flag that allows disabling the feature.
It should be noted that if/when this becomes included in the Docker engine itself it should really be used through that, either via Nebula passing the Docker engine the needed settings or (hopefully) that becoming the Docker engine default, meaning nothing will need to be done on the Nebula side.
Currently Nebula/Docker engine only restarts crashed containers & ignores the Docker engine health check results.
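A minimal sketch of the check, assuming the docker SDK for Python; the engine already tracks the health state, so the worker would only act on it:

import docker

# hypothetical sketch: restart any container the Docker engine reports as unhealthy
client = docker.from_env()
for container in client.containers.list():
    health = container.attrs.get("State", {}).get("Health", {}).get("Status")
    if health == "unhealthy":
        container.restart()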
In addition to being able to describe APP_NAME with a list of apps Nebula manages on the server, an optional param of APP_PODS (instead of and/or in addition to APP_NAME) could help ease management. Each APP_POD is basically a group of apps; on the worker-manager side there will need to be support for reading the APP_PODS at startup from Mongo, opening a Rabbit APP_PODS queue per pod for that server instance, listening to any changes to apps in the relevant pods, and updating the apps to match.
Currently the registry auth is coded into the worker-manager via its config file or envvar; there should also be support for the standard Docker auth file located at <home_folder>/.docker/config.json (see the sketch below).
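A minimal sketch of reading that file, assuming the common "auths" layout where credentials are stored as base64 "user:password" (the registry URL key below is a placeholder):

import base64
import json
import os

# hypothetical sketch: fall back to ~/.docker/config.json when no
# REGISTRY_AUTH_USER / REGISTRY_AUTH_PASSWORD are configured
config_path = os.path.expanduser("~/.docker/config.json")
if os.path.isfile(config_path):
    with open(config_path) as config_file:
        docker_config = json.load(config_file)
    auth = docker_config.get("auths", {}).get("<my_registry_url>", {}).get("auth")
    if auth is not None:
        registry_user, registry_password = base64.b64decode(auth).decode().split(":", 1)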
The following params should be optional/have a default value (a sketch follows the list):
RABBIT_HEARTBEAT - default to 3600
RABBIT_VHOST - default to nebula
REGISTRY_HOST - default to Docker Hub
REGISTRY_AUTH_USER - should be optional for those who use only public images with no login - requires a code change to skip the registry login step if not set
REGISTRY_AUTH_PASSWORD - should be optional for those who use only public images with no login - requires a code change to skip the registry login step if not set
max_restart_wait_in_seconds - default to 0
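A minimal sketch of those defaults with plain envvar reads (the Docker Hub default URL is an assumption):

import os

# hypothetical sketch: optional params with defaults, skipping the registry
# login step when no credentials are set
rabbit_heartbeat = int(os.environ.get("RABBIT_HEARTBEAT", 3600))
rabbit_vhost = os.environ.get("RABBIT_VHOST", "nebula")
registry_host = os.environ.get("REGISTRY_HOST", "https://index.docker.io/v1/")
registry_user = os.environ.get("REGISTRY_AUTH_USER")
registry_password = os.environ.get("REGISTRY_AUTH_PASSWORD")
max_restart_wait_in_seconds = int(os.environ.get("MAX_RESTART_WAIT_IN_SECONDS", 0))

if registry_user is None or registry_password is None:
    print("no registry user pass combo defined, skipping registry login")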
Currently Nebula only supports one Docker registry auth; not yet sure how, but support for multiple authenticated registries might be needed in some cases.
Currently all apps use the container's default CMD command; there should be an ability to optionally change that to something else.
Hello,
I have configured the Nebula worker on the Raspberry Pi.
I am using an Ubuntu 18.04 VPS on which I have the following containers:
- Nebula Manager
- Mongo
- Nebula Reporter
- Kafka
- Zookeeper
The worker sends the current state to a Kafka cluster after every sync with the manager. The reporter component pulls from Kafka and populates the state data into the backend DB. The manager can then query the new state data from the backend DB to let the admin know the state of the managed devices.
When the Nebula worker downloads and updates the application while reporting the state using Kafka, I get the following error:
Recreating 3cc087462a4c_worker ... done
Attaching to worker
worker | reading config variables
worker | reading config variables
worker | /usr/local/lib/python3.7/site-packages/parse_it/file/file_reader.py:55: UserWarning: config_folder_location does not exist, only envvars & cli args will be used
worker | warnings.warn("config_folder_location does not exist, only envvars & cli args will be used")
worker | logging in to registry
worker | {'IdentityToken': '', 'Status': 'Login Succeeded'}
worker | checking nebula manager connection
worker | nebula manager connection ok
worker | stopping all preexisting nebula managed app containers in order to ensure a clean slate on boot
worker | stopping container e02f34d03c880a47cc33cb51b5e84578f7e387f305e618843a9c8e229ccd93cb
worker | removing container e02f34d03c880a47cc33cb51b5e84578f7e387f305e618843a9c8e229ccd93cb
worker | initial start of example app
worker | pulling image <my_registry_url>/flask:latest
worker | <my_registry_url>/flask
worker | {
worker | "status": "Pulling from flask",
worker | "id": "latest"
worker | }
worker | {
worker | "status": "Digest: sha256:6f51939e6d3dff3fdfebdeb639ddad00c3671d5f0b241666c9e140d1bfa7883c"
worker | }
worker | {
worker | "status": "Status: Image is up to date for <my_registry_url>/flask:latest"
worker | }
worker | creating container example-1
worker | successfully created container example-1
worker | starting container example-1
worker | completed initial start of example app
worker | starting work container health checking thread
worker | creating reporting kafka connection object
worker | failed creating reporting kafka connection object - exiting
worker | NoBrokersAvailable
reading config variables
creating reporting kafka connection object
NoBrokersAvailable
failed creating reporting kafka connection object - exiting
reading config variables
creating reporting kafka connection object
NoBrokersAvailable
failed creating reporting kafka connection object - exiting
reading config variables
creating reporting kafka connection object
NoBrokersAvailable
failed creating reporting kafka connection object - exiting
reading config variables
creating reporting kafka connection object
NoBrokersAvailable
failed creating reporting kafka connection object - exiting
reading config variables
creating reporting kafka connection object
NoBrokersAvailable
failed creating reporting kafka connection object - exiting
reading config variables
creating reporting kafka connection object
opened MongoDB connection
starting to digest messages from kafka
Note: as the Kafka logs are too big I haven't added them, but if you need them for debugging I can attach the log file.
Configured the worker on Raspberry Pi using the docker-compose.yml and custom Docker build mentioned in the Specifications section.
Configured Manager, Reporter, Mongo, Kafka, and Zookeeper on Ubuntu 18.04 using the docker-compose.yml mentioned in the Specifications section.
Configured a private Docker registry for maintaining the update releases and images.
On the worker side, as I am using a Raspberry Pi, I had to build the image on the Pi and start the container. To achieve this I did the following steps.
Dir structure:
- Nebula worker
  - Dockerfile
  - docker-compose.yml
  - worker/ (directory where all the source code is)
Dockerfile:
# it's official so I'm using it + alpine so damn small
FROM python:3.7.2-alpine3.9

# copy the codebase
COPY . /worker

# install required packages - requires build-base due to psutil GCC compiler requirements
RUN apk add --no-cache build-base python3-dev linux-headers
RUN pip install -r /worker/worker/requirements.txt

# set python to be unbuffered
ENV PYTHONUNBUFFERED=1

# run the worker-manager
WORKDIR /worker
CMD [ "python", "worker/worker.py" ]
docker-compose.yml for the worker:
version: '3'
services:
  worker:
    container_name: worker
    build:
      context: .
      dockerfile: Dockerfile
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
    restart: unless-stopped
    hostname: worker
    environment:
      REGISTRY_HOST: <my_registry_url>
      REGISTRY_AUTH_USER: <my_registry_user>
      REGISTRY_AUTH_PASSWORD: <my_registry_password>
      MAX_RESTART_WAIT_IN_SECONDS: 0
      NEBULA_MANAGER_AUTH_USER: nebula
      NEBULA_MANAGER_AUTH_PASSWORD: nebula
      NEBULA_MANAGER_HOST: <my_vps_url>
      NEBULA_MANAGER_PORT: 80
      NEBULA_MANAGER_PROTOCOL: http
      NEBULA_MANAGER_CHECK_IN_TIME: 5
      DEVICE_GROUP: example
      KAFKA_BOOTSTRAP_SERVERS: <my_vps_url>:9092
      KAFKA_TOPIC: nebula-reports
docker-compose.yml for the VPS:
version: '3'
services:
  mongo:
    container_name: mongo
    hostname: mongo
    image: mongo:4.0.1
    ports:
      - "27017:27017"
    restart: unless-stopped
    environment:
      MONGO_INITDB_ROOT_USERNAME: nebula
      MONGO_INITDB_ROOT_PASSWORD: nebula
  manager:
    container_name: manager
    hostname: manager
    depends_on:
      - mongo
    image: nebulaorchestrator/manager
    ports:
      - "80:80"
    restart: unless-stopped
    environment:
      MONGO_URL: mongodb://nebula:nebula@mongo:27017/nebula?authSource=admin
      SCHEMA_NAME: nebula
      BASIC_AUTH_PASSWORD: nebula
      BASIC_AUTH_USER: nebula
      AUTH_TOKEN: nebula
  zookeeper:
    container_name: zookeeper
    hostname: zookeeper
    image: zookeeper:3.4.13
    ports:
      - 2181:2181
    restart: unless-stopped
    environment:
      ZOO_MY_ID: 1
  kafka:
    container_name: kafka
    hostname: kafka
    image: confluentinc/cp-kafka:5.1.2
    ports:
      - 9092:9092
    restart: unless-stopped
    depends_on:
      - zookeeper
    environment:
      KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://kafka:9092
      KAFKA_BROKER_ID: 1
      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1
  reporter:
    container_name: reporter
    hostname: reporter
    depends_on:
      - mongo
      - kafka
    image: nebulaorchestrator/reporter
    restart: unless-stopped
    environment:
      MONGO_URL: mongodb://nebula:nebula@mongo:27017/nebula?authSource=admin
      SCHEMA_NAME: nebula
      BASIC_AUTH_PASSWORD: nebula
      BASIC_AUTH_USER: nebula
      KAFKA_BOOTSTRAP_SERVERS: kafka:9092
      KAFKA_TOPIC: nebula-reports
Expected/Wanted Behaviour
The version of the worker should be auto-set to match the branches.
The $TRAVIS_BRANCH envvar will likely be part of the solution, but as this should be a change in the codebase itself (rather than just assigning the variable value) a more complex solution than just os.getenv("TRAVIS_BRANCH") will be needed.
Once there is a version as part of the codebase it should be added to the reports that are sent to the optional reporting system.
Possible solutions include:
Related to #41
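One hedged sketch of the direction: bake the branch name into the codebase at CI time so the worker can later include it in reports (the worker/version.py path is an assumption):

import os

# hypothetical sketch: run during the Travis build to write the version into
# the codebase, defaulting when built outside CI
with open("worker/version.py", "w") as version_file:
    version_file.write('VERSION = "{}"\n'.format(os.getenv("TRAVIS_BRANCH", "dev")))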
Actual Behaviour
Version is manually set before deployments.
I have configured the Nebula worker on the Raspberry Pi, and Mongo, Nebula Manager, Reporter, Kafka and Zookeeper on the VPS (which is an Ubuntu 18.04 machine).
I want the status of whether the remote device updated or the update failed, using an API call.
I am referring to the Nebula documentation https://nebula.readthedocs.io/en/latest/api/general/ and have tried the "List a filtered paginated view of the optional reports system" section from it.
I get the following information when I try the API http://<my_vps_url>/api/v2/reports?page_size=1:
{
"data": [
{
"_id": {
"$oid": "5d20785900bb37cdd5352c5c"
},
"memory_usage": {
"total": 926,
"used": 159,
"free": 91,
"available": 680
},
"root_disk_usage": {
"total": 14890,
"used": 2140,
"free": 12115
},
"cpu_usage": {
"cores": 4,
"used_percent": 0.6
},
"cron_jobs_containers": [],
"apps_containers": [
{
"read": "0001-01-01T00:00:00Z",
"preread": "0001-01-01T00:00:00Z",
"pids_stats": {},
"blkio_stats": {
"io_service_bytes_recursive": null,
"io_serviced_recursive": null,
"io_queue_recursive": null,
"io_service_time_recursive": null,
"io_wait_time_recursive": null,
"io_merged_recursive": null,
"io_time_recursive": null,
"sectors_recursive": null
},
"num_procs": 0,
"storage_stats": {},
"cpu_stats": {
"cpu_usage": {
"total_usage": 0,
"usage_in_kernelmode": 0,
"usage_in_usermode": 0
},
"throttling_data": {
"periods": 0,
"throttled_periods": 0,
"throttled_time": 0
}
},
"precpu_stats": {
"cpu_usage": {
"total_usage": 0,
"usage_in_kernelmode": 0,
"usage_in_usermode": 0
},
"throttling_data": {
"periods": 0,
"throttled_periods": 0,
"throttled_time": 0
}
},
"memory_stats": {},
"name": "/example-1",
"id": "dafc6f075726d61a6b2bc3feffe0cecb738bd43d04eca89c6f3fa72dd9d50193"
}
],
"current_device_group_config": {
"status_code": 200,
"reply": {
"apps": [
{
"app_id": 1,
"app_name": "example",
"starting_ports": [
8080
],
"containers_per": {
"server": 1
},
"env_vars": {},
"docker_image": "<my_registry_url>/flask",
"running": true,
"networks": [
"nebula"
],
"volumes": [
"/tmp:/tmp/1",
"/var/tmp/:/var/tmp/1:ro"
],
"devices": [],
"privileged": false,
"rolling_restart": false
}
],
"apps_list": [
"example"
],
"prune_id": 1,
"cron_jobs": [],
"cron_jobs_list": [],
"device_group_id": 1
}
},
"device_group": "example",
"report_creation_time": 1562409049,
"hostname": "worker",
"report_insert_date": {
"$date": 1562409049716
}
}
],
"last_id": {
"$oid": "5d20785900bb37cdd5352c5c"
}
}
I am unable to find which key from the above API response can tell me if the device updated or failed, or whether there is another API for this purpose (I am unable to find any other API for it).
I also checked the database and got the following results:
# mongo
> use nebula
switched to db nebula
> show collections
nebula_apps
nebula_cron_jobs
nebula_device_groups
nebula_reports
nebula_user_groups
nebula_users
I checked the nebula_reports collection and got the same output as with the above API call.
What am I doing wrong here?
Hello,
I have a Raspberry Pi (this is my edge device, where I have configured a worker) and a server (where I have a Docker registry, Nebula Manager, and MongoDB).
The image should be downloaded on the edge device from the remote registry once a new image becomes available, and a container should then start.
I am facing an issue when starting the container on the edge device:
creating container example-1
successfully created container example-1
starting container example-1
Exception in thread Thread-1:
Traceback (most recent call last):
File "/usr/local/lib/python3.7/site-packages/docker/api/client.py", line 261, in _raise_for_status
response.raise_for_status()
File "/usr/local/lib/python3.7/site-packages/requests/models.py", line 940, in raise_for_status
raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 404 Client Error: Not Found for url: http+docker://localhost/v1.39/containers/example-1/start
completed initial start of example app
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/worker/worker/functions/docker_engine/docker_engine.py", line 168, in start_container
return self.cli.start(container_name)
File "/usr/local/lib/python3.7/site-packages/docker/utils/decorators.py", line 19, in wrapped
return f(self, resource_id, *args, **kwargs)
File "/usr/local/lib/python3.7/site-packages/docker/api/container.py", line 1093, in start
self._raise_for_status(res)
File "/usr/local/lib/python3.7/site-packages/docker/api/client.py", line 263, in _raise_for_status
raise create_api_error_from_http_exception(e)
File "/usr/local/lib/python3.7/site-packages/docker/errors.py", line 31, in create_api_error_from_http_exception
raise cls(e, response=response, explanation=explanation)
docker.errors.NotFound: 404 Client Error: Not Found ("network nebula not found")
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/lib/python3.7/threading.py", line 917, in _bootstrap_inner
self.run()
File "/usr/local/lib/python3.7/threading.py", line 865, in run
self._target(*self._args, **self._kwargs)
File "/worker/worker/functions/docker_engine/docker_engine.py", line 271, in run_container
self.start_container(container_name)
File "/worker/worker/functions/docker_engine/docker_engine.py", line 169, in start_container
except "APIError" as e:
TypeError: catching classes that do not inherit from BaseException is not allowed
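The final TypeError comes from the worker's start_container catching the string "APIError" instead of the exception class, which also masks the real failure ("network nebula not found"). A hedged sketch of what the corrected handler might look like, assuming the low-level docker-py APIClient:

import docker
import docker.errors

client = docker.APIClient()

def start_container(container_name):
    # catch the real exception class - except "APIError" raises TypeError
    try:
        return client.start(container_name)
    except docker.errors.APIError as error:
        print("problem starting container " + container_name)
        print(error)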
Raspberry Pi ARMv7 arch.
docker-compose for Manager and MongoDB:
version: '3'
services:
  mongo:
    container_name: mongo
    hostname: mongo
    image: mongo:4.0.1
    ports:
      - "27017:27017"
    restart: unless-stopped
    environment:
      MONGO_INITDB_ROOT_USERNAME: <my-password>
      MONGO_INITDB_ROOT_PASSWORD: <my-password>
  manager:
    container_name: manager
    hostname: manager
    depends_on:
      - mongo
    image: nebulaorchestrator/manager
    ports:
      - "80:80"
    restart: unless-stopped
    environment:
      MONGO_URL: mongodb://nebula:nebula@mongo:27017/nebula?authSource=admin
      SCHEMA_NAME: nebula
      BASIC_AUTH_PASSWORD: <my-password>
      BASIC_AUTH_USER: <my-password>
      AUTH_TOKEN: <my-password>
For the Raspberry Pi, I have cloned the worker repo from GitHub and I am using it to build the worker.
My dir structure:
- Nebula worker
  - Dockerfile
  - docker-compose.yml
  - worker/ (directory where all the source code is)
Dockerfile:
# it's official so I'm using it + alpine so damn small
FROM python:3.7.2-alpine3.9

# copy the codebase
COPY . /worker

# install required packages - requires build-base due to psutil GCC compiler requirements
RUN apk add --no-cache build-base python3-dev linux-headers
RUN pip install -r /worker/worker/requirements.txt

# set python to be unbuffered
ENV PYTHONUNBUFFERED=1

# run the worker-manager
WORKDIR /worker
CMD [ "python", "worker/worker.py" ]
docker-compose.yml:
version: '3'
services:
  worker:
    container_name: worker
    image: nebulaorchestrator/worker-arm64v8:latest
    #build:
    #  context: .
    #  dockerfile: Dockerfile
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
    restart: unless-stopped
    hostname: worker
    environment:
      REGISTRY_HOST: <My_Registry_URL>
      REGISTRY_AUTH_USER: <Registry_User>
      REGISTRY_AUTH_PASSWORD: <Registry_Password>
      MAX_RESTART_WAIT_IN_SECONDS: 0
      NEBULA_MANAGER_AUTH_USER: <my-password>
      NEBULA_MANAGER_AUTH_PASSWORD: <my-password>
      NEBULA_MANAGER_HOST: <Manager-URL>
      NEBULA_MANAGER_PORT: 80
      NEBULA_MANAGER_PROTOCOL: http
      NEBULA_MANAGER_CHECK_IN_TIME: 5
      DEVICE_GROUP: example
      #KAFKA_BOOTSTRAP_SERVERS: kafka:9092
      #KAFKA_TOPIC: nebula-reports
Note: to check that my configurations are right, I configured Nebula Manager, MongoDB and the Nebula worker on my local machine and tested that everything works as per the expected behaviour. It works properly in that case, but on the Raspberry Pi I am facing the above-mentioned issue.
Similar to nebula-orchestrator/manager#16
RabbitMQ connections should be closed whenever their usage ends and their attached channel is closed.
Currently the channel is closed but the connection is kept until it times out.
The rabbit_login function should also return the "rabbit_connection" & not just the "rabbit_connection_channel", and whenever the channel is closed explicitly it needs to close the connection as well (see the sketch below).
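A minimal sketch of that shape, assuming pika (parameter names follow current pika releases):

import pika

# hypothetical sketch: return the connection alongside the channel so callers
# can close both when done
def rabbit_login(host, vhost, user, password, heartbeat=3600):
    credentials = pika.PlainCredentials(user, password)
    parameters = pika.ConnectionParameters(host=host, virtual_host=vhost,
                                           credentials=credentials,
                                           heartbeat=heartbeat)
    rabbit_connection = pika.BlockingConnection(parameters)
    return rabbit_connection, rabbit_connection.channel()

# caller side - close the connection, not just the channel
connection, channel = rabbit_login("rabbit_host", "nebula", "user", "pass")
channel.close()
connection.close()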
After Travis CI runs the unit tests successfully it should build the Docker image & push it to Docker Hub.
If the branch is master the image tag should be "latest", otherwise it should be the same as the branch name (can look at the TRAVIS_BRANCH envvar in Travis CI to get the name of the branch being built).
Docker Hub should not build the images anymore, as it ignores unit test failures; the Docker Hub build tag should also be removed from the README.md file.
Currently Docker Hub builds the images while Travis CI runs the unit tests, each ignoring the results of the other.
As the registry login is kept for as long as the docker_socket connection is alive, there is really no need to keep re-logging in to the registry every time the worker-manager accesses it; it should happen only once, after the docker_socket is created. This will simplify things, avoid unneeded API calls to the registry, and allow removal of a lot of registry user/pass/host variables from a bunch of modules.
Should make everything much less ugly & easier to maintain.
drone.io is CI/CD with Docker in mind, which meshes well with Nebula; the ARM step should first be moved to build on it &, depending on the results, maybe the x64 build as well (as Travis-CI is also a great tool).
Builds should work/fail on their own merit & not on the build system's idiosyncrasies; the current Shippable-based system has an open bug which affects multiple builds on & off, without any acknowledgement from the Shippable support team, much less work on resolving it.
The rolling restart function is currently just a placeholder; this really needs fixing so users will have that option as well as a hard restart of all containers.
Upon boot of the worker-manager, MongoDB should provide the data for all the apps said worker is configured to manage, and the worker should then disconnect (see the sketch below).
Currently the worker-manager uses a different thread per app to connect to MongoDB, which results in multiple simultaneous connections to Mongo (one per app the worker manages) rather than just one.
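A minimal sketch of the single boot-time connection, assuming pymongo and the nebula_apps collection shown earlier:

from pymongo import MongoClient

# hypothetical sketch: one connection fetches every managed app's config at
# boot, then disconnects
client = MongoClient("mongodb://nebula:nebula@mongo:27017/nebula?authSource=admin")
managed_apps = ["app1", "app2"]  # whatever APP_NAME resolves to
app_configs = {app["app_name"]: app for app in
               client["nebula"]["nebula_apps"].find({"app_name": {"$in": managed_apps}})}
client.close()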
As mentioned in #19, having user network options might be a good idea, allowing inner-pod communication between containers via container-hostname DNS resolution.
There are 2 options to go about it and I'm still undecided which will be better:
It's really a question of customizability vs sane defaults. Thoughts?
Allow the worker container to update itself to a newer version deployed to remote devices.
Currently it is not possible to update the worker container.
Upon starting, a new worker should request the newest app config via RabbitMQ; this should be done via a queue that all the api-managers listen to and reply on in a new thread.
Currently, upon starting, a new worker connects directly to MongoDB & gets the current app config from it; this is not ideal, as it requires a read-only MongoDB connection reachable from every worker for the initial sync.
Currently the requirements.txt is a mess; each repo should only include the requirements it actually needs to function, not garbage that was either once needed but no longer is, or that is needed by another repo in the Nebula project but not this one.
Because security is important.
The container should be on the host network, but no network is set.
I used the following config:
{
  "starting_ports": [],
  "containers_per": {"server": 1},
  "env_vars": {"ENV": "dev"},
  "docker_image": "mine/sensu-client",
  "running": true,
  "networks": ["host"],
  "privileged": true,
  "devices": [],
  "volumes": []
}
But inside the container, ifconfig just shows the lo NIC.
Not automatically removing images & re-pulling them might have some uses in cases where you want to reuse locally built images. When this is added there should also be a way to force/order GC of older unused images (one of the original reasons why Nebula currently deletes all images so aggressively); otherwise I can imagine IoT devices getting filled with old images quickly.
All prereqs in requirements.txt need to be up to date.
Currently some packages are outdated.
Some apps (like log aggregation) might require running as privileged containers; support for that will help.
Relating to nebula-orchestrator/manager#2, the worker should be changed to allow connecting to the manager with a Bearer UUID token as another option in addition to using a basic auth user/pass (see the sketch below).
Currently the worker connects to the manager using basic auth user/pass as the only option.
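A minimal sketch of the dual-auth request path, assuming requests is used for manager calls:

import requests

# hypothetical sketch: prefer a Bearer token when configured, otherwise fall
# back to the existing basic auth user/pass
def manager_request(url, token=None, user=None, password=None):
    if token is not None:
        return requests.get(url, headers={"Authorization": "Bearer " + token})
    return requests.get(url, auth=(user, password))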
How is the networking handled when the containers are distributed across hosts/regions?
Say you have an app consisting of two microservices that need to communicate; how is this handled? Like subnet or IP assignment...
This issue is opened with ref #63.
I want the complete device update history, storing only those reports which contain the status of updates (i.e. fail or success). This data could be purged after 11 or 12 months if required.
The reason for this mechanism is to avoid a large volume of data accumulating in MongoDB.
To elaborate on the above two points: I want the current behavior, where I get the device state (for example whether the container is running, RAM, CPU, etc.) continuously according to NEBULA_MANAGER_CHECK_IN_TIME, which I can purge after some time (for example six months); but I also want an update report (telling me when the device was updated and which release it has) only when the end device updated successfully or the update failed. I want to retain the data for the second behavior comparatively longer than for the first.
Currently the worker continuously sends data to the reporter as per the time defined in NEBULA_MANAGER_CHECK_IN_TIME.
A large volume of data accumulates, which makes it difficult to maintain the data for 11 to 12 months. It also becomes difficult to keep track of the updates for multiple devices with such a large volume of data.
Expected: create new branch -> trigger Travis build -> new version of branch deployed.
Actual: create new branch -> [skip travis] included so it doesn't trigger a Travis build -> new version of branch not deployed -> enter the new branch -> make a push in the new branch -> Travis now runs.
required for some of the newer features of Docker
The Dockerfile should have all the required pip module dependencies version-locked; this avoids containers built down the line failing due to breaking changes in updated dependencies.
Adding support for devices being used from inside containers will allow simpler usability for externally mounted devices (such as USB devices), which would help in easing IoT implementations.
MariaDB Galera replication is a great fit for Nebula's huge reads/low writes; it should be added as another option alongside MongoDB.
I can see some cases where inheriting envvars from the worker-manager host node would be a good idea; that would allow devices to have some customizability that might be useful for distributed systems that are still managed centrally (same IoT sensor everywhere, but a tag is manually set on each sensor with its location name, etc.). Thinking it could be any combination of the following:
Not sure yet how to handle multiple apps each getting different envvars; anyone got any ideas in that regard?
Support for Docker storage & network plugins will allow using Docker to its full capabilities.
Following nebula-orchestrator/manager#29, the worker will need to be changed to support the new cron_jobs option; this will require a few changes.
(Thinking of using https://pypi.org/project/croniter/ for parsing cron expressions to datetimes; from there the logic is very simple, see the sketch below.)
The current workaround is to either have a cron service managed as a Nebula app that in turn starts containers based on its cron definitions, or to have every task that needs to run on a schedule be its own Nebula app with internal logic that waits for the right time to run.
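A minimal sketch of the croniter usage:

from datetime import datetime
from croniter import croniter

# hypothetical sketch: compute how long until a cron_job should next run
cron_expression = "*/5 * * * *"  # every 5 minutes
next_run = croniter(cron_expression, datetime.now()).get_next(datetime)
seconds_until_run = (next_run - datetime.now()).total_seconds()
print("next run of cron_job in {:.0f} seconds".format(seconds_until_run))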