
Comments (6)

issue-label-bot avatar issue-label-bot commented on May 23, 2024 1

Issue-Label Bot is automatically applying the label bug to this issue, with a confidence of 0.58. Please mark this comment with 👍 or 👎 to give our bot feedback!

Links: app homepage, dashboard and code for this bot.

from worker.

naorlivne avatar naorlivne commented on May 23, 2024

Hi @Sharvin26

I first want to confirm I understand the issue correctly.

  • Your DB & manager work correctly
  • You have created an example device group & app
  • When you run a worker on a Linux x64 host, everything works
  • When you run a worker on a Raspberry Pi, the worker itself starts, but you get the above error when it tries to run the example app as part of the example device group it's connected to
  • You're using a customized worker that you're building yourself on the Raspberry Pi because (I assume) it's an ARMv7

If I got anything wrong, let me know. Otherwise, I think it's safe to assume the issue lies in the ARMv7 implementation (as you mention, it works on an x64 Linux host with the same config), so a few possible causes come to mind:

  1. Silly question, but are you running docker-compose without root permissions? Please try running it both with sudo and as the root user, and/or grant the user you're running it as permissions to the Docker engine.

  2. docker.errors.NotFound: 404 Client Error: Not Found ("network nebula not found") in the logs leads me to believe that the nebula network was either never created on the worker or was deleted at some point. Both cases are very weird, as the Nebula worker ensures that the default network is always created as part of its boot process. Can you provide the logs of the worker boot as well, rather than just the part where the issue occurs?

  3. Can you run docker network ls on the worker Pi and share the results? I want to see whether it created the nebula network and just has issues connecting to it, or whether it failed to create it at all.

  4. Can you try running the worker with docker run (no docker-compose) to test? If it works that way, we know the issue is in how it interacts with docker-compose. A Google search for requests.exceptions.HTTPError: 404 Client Error: Not Found for url: http+docker://localhost/v1.39/containers/example-1/start brought up a lot of past tickets that revolved around docker-compose & root permissions, so it's worth checking to confirm. docker-compose creating its own networks might also be somehow related.

P.S. In the docker-compose.yml you use an image named nebulaorchestrator/worker-arm64v8:latest. I think you meant it to be arm64v7?


Sharvin26 avatar Sharvin26 commented on May 23, 2024

Hello @naorlivne

Thanks for the response.

  • Your DB & manager work correctly

Yes, the DB & manager are working correctly.

  • You have created an example device group & app

Yes, I have created an example device group and app.

  • When you run a worker on a Linux x64 host, everything works

When running on an x64 host, everything works correctly.

  • When you run a worker on a Raspberry Pi, the worker itself starts, but you get the above error when it tries to run the example app as part of the example device group it's connected to

The Nebula worker has no issue starting on the Raspberry Pi. But when a new image is pushed to the Docker registry and the manager is notified of it, the worker pulls the image from the registry, and when it starts the container I get the above-mentioned issue.

  • You're using a customized worker that you're building yourself on the Raspberry Pi because (I assume) it's an ARMv7

Yes, I have cloned the source from this repository and then made the changes to the directory structure that I mentioned in my first comment.

Note: The docker-compose file I posted for the worker in my first comment has one mistake, which I have corrected here. I am not pulling the image from Docker Hub; I am building the image by cloning the source code from this repo. When pulling the image from Docker Hub I get this error:

Pulling worker (nebulaorchestrator/worker:arm64v7)... 
ERROR: manifest for nebulaorchestrator/worker:arm64v7 not found

and for arm64v8 I get this error:

Pulling worker (nebulaorchestrator/worker:arm64v8)...
arm64v8: Pulling from nebulaorchestrator/worker
--- Downloading and Extracting ---
Digest: sha256:0f37da08ec05f420a3cc286bef716f98e99442e392e171bd4bdb2848161240da
Status: Downloaded newer image for nebulaorchestrator/worker:arm64v8
Creating worker ... done
Attaching to worker
worker    | standard_init_linux.go:211: exec user process caused "exec format error"

I tried both options and neither works. When I searched for standard_init_linux.go:211, I found that this is an architecture-related issue, so I am building the image on the Raspberry Pi by cloning this repo.
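An "exec format error" usually means the binaries in the image were built for a different CPU architecture than the host. As a rough sketch (not part of the thread), the host architecture can be checked from Python and mapped to the architecture prefixes commonly used on Docker Hub. The mapping table and the docker_arch helper are assumptions made for illustration, and nothing here confirms which tags the nebulaorchestrator/worker image actually publishes.

```python
import platform

# Hypothetical mapping from platform.machine() values to the architecture
# prefixes commonly used on Docker Hub. Extend it for other hosts as needed.
MACHINE_TO_DOCKER_ARCH = {
    "x86_64": "amd64",
    "amd64": "amd64",
    "aarch64": "arm64v8",
    "arm64": "arm64v8",
    "armv7l": "arm32v7",
    "armv6l": "arm32v6",
}


def docker_arch(machine=None):
    """Return the Docker architecture prefix for a machine string, or None."""
    machine = (machine or platform.machine()).lower()
    return MACHINE_TO_DOCKER_ARCH.get(machine)


if __name__ == "__main__":
    # Print the local machine string and the prefix it maps to (if any).
    print(platform.machine(), "->", docker_arch())
```

A Raspberry Pi running a 32-bit OS typically reports armv7l, in which case an arm64v8 image would not be expected to run on it.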

version: '3'
services:
  worker:
    container_name: worker
    #image: nebulaorchestrator/worker-arm64v8:latest
    build:
      context: .
      dockerfile: Dockerfile
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
    restart: unless-stopped
    hostname: worker
    environment:
      REGISTRY_HOST: < My_Registry_URL >
      REGISTRY_AUTH_USER: < Registry_User >
      REGISTRY_AUTH_PASSWORD:  < Registry_Password >
      MAX_RESTART_WAIT_IN_SECONDS: 0
      NEBULA_MANAGER_AUTH_USER: <my-password>
      NEBULA_MANAGER_AUTH_PASSWORD: <my-password>
      NEBULA_MANAGER_HOST: < Manager-URL >
      NEBULA_MANAGER_PORT: 80
      NEBULA_MANAGER_PROTOCOL: http
      NEBULA_MANAGER_CHECK_IN_TIME: 5
      DEVICE_GROUP: example
      #KAFKA_BOOTSTRAP_SERVERS: kafka:9092
      #KAFKA_TOPIC: nebula-reports

Results for the steps you advised me to perform in the above comment:

  1. Yes, I am running docker with sudo and as the root user.

  2. Boot log of the Nebula worker:

=> docker-compose up
Creating worker ... done
Attaching to worker
worker    | reading config variables
worker    | /usr/local/lib/python3.7/site-packages/parse_it/file/file_reader.py:55: UserWarning: config_folder_location does not exist, only envvars & cli args will be used
worker    |   warnings.warn("config_folder_location does not exist, only envvars & cli args will be used")
worker    | reading config variables
worker    | logging in to registry
worker    | {'IdentityToken': '', 'Status': 'Login Succeeded'}
worker    | checking nebula manager connection
worker    | nebula manager connection ok
worker    | stopping all preexisting nebula managed app containers in order to ensure a clean slate on boot
worker    | initial start of example app
worker    | pulling image <my_registry_url>/ubuntu:latest # Note: my real registry URL is printed here; I have replaced it with my_registry_url as an example
worker    | <my_registry_url>/ubuntu 
worker    | {
worker    |     "status": "Pulling from ubuntu",
worker    |     "id": "latest"
worker    | }
worker    | {
worker    |     "status": "Digest: sha256:8ee703cfd6d7d4d2c69971989bd4d20221ff7f0e7fa459c4de14e814394757b0"
worker    | }
worker    | {
worker    |     "status": "Status: Image is up to date for <my_registry_url>/ubuntu:latest"
worker    | }
worker    | creating container example-1
worker    | successfully created container example-1
worker    | starting container example-1
worker    | Exception in thread Thread-1:
worker    | Traceback (most recent call last):
worker    |   File "/usr/local/lib/python3.7/site-packages/docker/api/client.py", line 261, in _raise_for_status
worker    |     response.raise_for_status()
worker    |   File "/usr/local/lib/python3.7/site-packages/requests/models.py", line 940, in raise_for_status
worker    |     raise HTTPError(http_error_msg, response=self)
worker    | requests.exceptions.HTTPError: 404 Client Error: Not Found for url: http+docker://localhost/v1.39/containers/example-1/start
worker    |
worker    | During handling of the above exception, another exception occurred:
worker    |
worker    | Traceback (most recent call last):
worker    |   File "/worker/worker/functions/docker_engine/docker_engine.py", line 168, in start_container
worker    |     return self.cli.start(container_name)
worker    |   File "/usr/local/lib/python3.7/site-packages/docker/utils/decorators.py", line 19, in wrapped
worker    |     return f(self, resource_id, *args, **kwargs)
worker    |   File "/usr/local/lib/python3.7/site-packages/docker/api/container.py", line 1093, in start
worker    |     self._raise_for_status(res)
worker    |   File "/usr/local/lib/python3.7/site-packages/docker/api/client.py", line 263, in _raise_for_status
worker    |     raise create_api_error_from_http_exception(e)
worker    |   File "/usr/local/lib/python3.7/site-packages/docker/errors.py", line 31, in create_api_error_from_http_exception
worker    |     raise cls(e, response=response, explanation=explanation)
worker    | docker.errors.NotFound: 404 Client Error: Not Found ("network nebula not found")
worker    |
worker    | During handling of the above exception, another exception occurred:
worker    |
worker    | Traceback (most recent call last):
worker    |   File "/usr/local/lib/python3.7/threading.py", line 917, in _bootstrap_inner
worker    |     self.run()
worker    |   File "/usr/local/lib/python3.7/threading.py", line 865, in run
worker    |     self._target(*self._args, **self._kwargs)
worker    |   File "/worker/worker/functions/docker_engine/docker_engine.py", line 271, in run_container
worker    |     self.start_container(container_name)
worker    |   File "/worker/worker/functions/docker_engine/docker_engine.py", line 169, in start_container
worker    |     except "APIError" as e:
worker    | TypeError: catching classes that do not inherit from BaseException is not allowed
worker    |
worker    | completed initial start of example app
worker    | starting work container health checking thread
worker    | starting device_group example /info check loop, configured to check for changes every 5 seconds
  3. Results for the docker network ls command:
=> docker network ls
NETWORK ID          NAME                    DRIVER              SCOPE
f259bbd96621        bridge                  bridge              local
2d1d68f8ba8f        host                    host                local
408f63c676f6        nebula_default          bridge              local
1138713daa73        nebula_worker_default   bridge              local
354b0e702495        none                    null                local
  4. Running the image with the docker run command:
=> docker build -t nebula-worker .
=> docker run --restart=always -e DEVICE_GROUP="example" -e REGISTRY_HOST="<my_registry_url>" -e REGISTRY_AUTH_USER="<my_registry_user>" -e REGISTRY_AUTH_PASSWORD="<my_registry_password>" -e NEBULA_MANAGER_AUTH_USER="<nebula_user>" -e NEBULA_MANAGER_AUTH_PASSWORD="<nebula_password>" -e NEBULA_MANAGER_HOST="<my_nebula_url>" --name nebula-worker -v /var/run/docker.sock:/var/run/docker.sock nebula-worker

reading config variables
/usr/local/lib/python3.7/site-packages/parse_it/file/file_reader.py:55: UserWarning: config_folder_location does not exist, only envvars & cli args will be used
  warnings.warn("config_folder_location does not exist, only envvars & cli args will be used")
reading config variables
logging in to registry
{'IdentityToken': '', 'Status': 'Login Succeeded'}
checking nebula manager connection
nebula manager connection ok
stopping all preexisting nebula managed app containers in order to ensure a clean slate on boot
initial start of example app
pulling image <my_registry_url>/ubuntu:latest
<my_registry_url>/ubuntu
{
    "status": "Pulling from ubuntu",
    "id": "latest"
}
{
    "status": "Pulling fs layer",
    "progressDetail": {},
    "id": "890bdf70a444"
}
{
    "status": "Pull complete",
    "progressDetail": {},
    "id": "42962dab4cbd"
}
--- Downloading and Extracting the Image
{
    "status": "Digest: sha256:8ee703cfd6d7d4d2c69971989bd4d20221ff7f0e7fa459c4de14e814394757b0"
}
{
    "status": "Status: Downloaded newer image for <my_registry_url>/ubuntu:latest"
}
creating container example-1
successfully created container example-1
starting container example-1
completed initial start of example app
starting work container health checking thread
Exception in thread Thread-1:
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/docker/api/client.py", line 261, in _raise_for_status
    response.raise_for_status()
  File "/usr/local/lib/python3.7/site-packages/requests/models.py", line 940, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 404 Client Error: Not Found for url: http+docker://localhost/v1.39/containers/example-1/start

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/worker/worker/functions/docker_engine/docker_engine.py", line 168, in start_container
    return self.cli.start(container_name)
  File "/usr/local/lib/python3.7/site-packages/docker/utils/decorators.py", line 19, in wrapped
    return f(self, resource_id, *args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/docker/api/container.py", line 1093, in start
    self._raise_for_status(res)
  File "/usr/local/lib/python3.7/site-packages/docker/api/client.py", line 263, in _raise_for_status
    raise create_api_error_from_http_exception(e)
  File "/usr/local/lib/python3.7/site-packages/docker/errors.py", line 31, in create_api_error_from_http_exception
    raise cls(e, response=response, explanation=explanation)
docker.errors.NotFound: 404 Client Error: Not Found ("network nebula not found")

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.7/threading.py", line 917, in _bootstrap_inner
    self.run()
  File "/usr/local/lib/python3.7/threading.py", line 865, in run
    self._target(*self._args, **self._kwargs)
  File "/worker/worker/functions/docker_engine/docker_engine.py", line 271, in run_container
    self.start_container(container_name)
  File "/worker/worker/functions/docker_engine/docker_engine.py", line 169, in start_container
    except "APIError" as e:
TypeError: catching classes that do not inherit from BaseException is not allowed
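The final TypeError in both logs comes from the line `except "APIError" as e:` shown in the traceback: Python only allows exception classes (or tuples of them) in an except clause, so a string there fails as soon as an exception has to be matched. The sketch below reproduces that behaviour and shows the class-based form. The APIError and NotFound classes are local stand-ins for docker.errors.APIError and docker.errors.NotFound so the example runs without Docker, and the body of the handler is hypothetical, since the thread does not show what the worker does after catching the error.

```python
class APIError(Exception):
    """Stand-in for docker.errors.APIError so the sketch runs without Docker."""


class NotFound(APIError):
    """Stand-in for docker.errors.NotFound, which subclasses APIError."""


def start_container_buggy(start, container_name):
    try:
        return start(container_name)
    except "APIError" as e:  # a string, not a class: raises TypeError when matched
        return f"failed to start {container_name}: {e}"


def start_container(start, container_name):
    try:
        return start(container_name)
    except APIError as e:  # the class itself; also catches subclasses like NotFound
        # Hypothetical handling: report the failure instead of crashing the thread.
        return f"failed to start {container_name}: {e}"
```

With the class-based clause, the original "network nebula not found" error reaches the handler instead of being masked by a TypeError.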

Result for docker version and docker-compose version =>

# docker --version
Docker version 18.09.0, build 4d60db4
# docker-compose --version
docker-compose version 1.24.1, build 4667896

Note: Both docker and docker-compose were installed using the method described in the Docker documentation.


naorlivne avatar naorlivne commented on May 23, 2024

Good news, I've found the root cause.

Nebula has a default network named nebula. When the worker boots, it checks whether this network exists and creates it if not. At least that is how it should work, but apparently there's a bug in the network check code: it returns true even if the nebula network doesn't exist, as long as another network's name starts with "nebula" (basically a nebula* wildcard).

Because your docker-compose run has created a network named nebula_worker_default, the check (wrongly) reports that the nebula network exists, so the worker doesn't try to create it. Then, when it's time to actually use it (by running a container attached to it), it fails.
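To make the failure mode concrete, here is a minimal sketch of a wildcard-style check next to an exact-name check, run against the network names from the docker network ls output earlier in the thread. Both function names are hypothetical and this is not the worker's actual code; it only illustrates the kind of mismatch described above.

```python
# Network names taken from the `docker network ls` output posted in the thread.
NETWORK_NAMES = ["bridge", "host", "nebula_default", "nebula_worker_default", "none"]


def network_exists_prefix(names, wanted):
    """Wildcard-style check: true if any network name starts with `wanted`."""
    return any(name.startswith(wanted) for name in names)


def network_exists_exact(names, wanted):
    """Exact check: true only if a network is named exactly `wanted`."""
    return wanted in names


if __name__ == "__main__":
    print(network_exists_prefix(NETWORK_NAMES, "nebula"))  # True (false positive)
    print(network_exists_exact(NETWORK_NAMES, "nebula"))   # False
```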

I'll push a fix to the worker master branch in the next few hours (& by extension to the next numbered version), but if you don't feel like waiting, just create a bridge network named nebula on your Pi until then.
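For anyone who prefers to script the workaround, the following is a rough sketch using the Docker SDK for Python. It assumes a client object such as the one returned by docker.from_env(); the ensure_network helper is a hypothetical name, not something from the Nebula codebase. Because the SDK's name filter can return partial matches, the result is narrowed to an exact name before deciding whether to create the network.

```python
def ensure_network(client, name="nebula", driver="bridge"):
    """Return the network called exactly `name`, creating it if it is missing."""
    # networks.list(names=[...]) may also return networks whose names merely
    # contain `name`, so keep only exact matches.
    matches = [net for net in client.networks.list(names=[name]) if net.name == name]
    if matches:
        return matches[0]
    return client.networks.create(name, driver=driver)


# Usage on the Pi (requires the `docker` package and access to the Docker socket):
#   import docker
#   ensure_network(docker.from_env())
```

Calling it a second time is a no-op, since the exact-name match is found and returned.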


naorlivne avatar naorlivne commented on May 23, 2024

Fix pushed to master. Can you do the following:

  1. Pull latest codebase
  2. Rebuild your image
  3. Remove the manually created nebula network on your Pi (I want to confirm Nebula is able to create it on its own)
  4. Try rerunning the docker-compose based worker on your Pi
  5. Confirm everything works & close this ticket

As for the optional reporter system, it was added in 2.2.0, and the documentation you're looking at is for a rather old version (1.5.0), so that's why you can't find anything on it. Please look at https://nebula.readthedocs.io/en/latest/ for the latest version of the documentation to read more about it.

If you have any more issues with the optional reporting or need a hand with it, please open another ticket about it; I'm trying to keep things orderly.


Sharvin26 avatar Sharvin26 commented on May 23, 2024

Thanks, it's working now.

